First, here are some common answers and their problems:
1. A tree-type algorithm:
1st round: Copy the file from 1st server to 2nd server.
2nd round: Copy the file from 1st server to 3rd server AND from 2nd server to 4th server at the same time.
3rd round: 1->5, 2->6, 3->7, 4->8
And so on.
So log2(N) rounds of file transfer can sync the file on N systems. Assuming t is the time to transfer the 5G file over the network, it will take log2(N) * t.
It is nice from an algorithmic point of view. But in a real system, if the 5G file cannot fit in the system cache, it will be re-read from disk in every round, so about log2(N) times on the 1st system, which can be slow. (Ref. Numbers Everyone Should Know)
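As a rough illustration of the rounds described above, here is a small Python sketch that lists who sends to whom in each round. The function name `tree_schedule` is made up for this example, and servers are simply numbered 1..N as in the text.

```python
def tree_schedule(n):
    """Return the transfer rounds as lists of (source, target) server numbers."""
    rounds = []
    have = 1  # servers 1..have already hold the file
    while have < n:
        # Every server that holds the file sends it to one server that does not.
        transfers = [(src, src + have)
                     for src in range(1, have + 1)
                     if src + have <= n]
        rounds.append(transfers)
        have = min(2 * have, n)
    return rounds


if __name__ == "__main__":
    for number, transfers in enumerate(tree_schedule(8), start=1):
        print(f"round {number}: {transfers}")
```

Note that server 1 appears as a source in every round, which is why it has to read the whole file again each time.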
2. "Install BT and let BT do it" solution
It can be a simple answer for a lazy guy, or for a very complex real environment. But BT actually does many things that are unnecessary for this simple task.
My solution: A pipeline streaming solution
Stream the file from the 1st to the 2nd system (e.g. using nc), and at the same time stream it from the 2nd system to the 3rd system, and so on.
In theory, it will take N*(initial transfer delay) + t. (see the graph below)
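To get a feeling for the two formulas, here is a tiny Python model. It only evaluates log2(N) * t and N * (initial transfer delay) + t as written in the text; the function names and the sample numbers are assumptions for illustration, not measurements.

```python
import math


def tree_time(n, t):
    """Tree algorithm: one full transfer time t per round, ceil(log2(n)) rounds."""
    return math.ceil(math.log2(n)) * t


def pipeline_time(n, t, delay):
    """Streaming pipeline: n start-up delays plus a single full transfer time t."""
    return n * delay + t


if __name__ == "__main__":
    # Hypothetical values: 64 servers, 50 s per full transfer, 0.1 s start-up delay per hop.
    n, t, delay = 64, 50.0, 0.1
    print("tree    :", tree_time(n, t), "s")
    print("pipeline:", pipeline_time(n, t, delay), "s")
```

In this model the tree cost grows with log2(N) full transfers, while the pipeline pays for the full transfer only once plus a small per-hop delay.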
And the biggest benefit is that the part of the file the system is working on will most likely still be in the system cache while it is being written to disk and read back for forwarding, which will greatly increase the efficiency.
Here is a graph to illustrate. The x-axis is time. The upper part is the tree algorithm and the lower part is the streaming solution:
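What each middle node does could be sketched in Python like this. It is only a sketch under some assumptions: the name `relay` and the 64 KB chunk size are made up, and it forwards each chunk straight from memory (similar to putting tee between two nc processes) instead of reading it back from disk. With real sockets, the file-like objects could come from `sock.makefile("rb")` and `sock.makefile("wb")`.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not tuned


def relay(upstream, local_copy, downstream=None, chunk_size=CHUNK_SIZE):
    """Copy upstream to local_copy and, chunk by chunk, on to downstream.

    All arguments are binary file-like objects. The last node in the chain
    passes downstream=None. Returns the number of bytes relayed.
    """
    total = 0
    while True:
        chunk = upstream.read(chunk_size)
        if not chunk:
            break  # upstream closed: the whole file has arrived
        local_copy.write(chunk)        # keep our own copy
        if downstream is not None:
            downstream.write(chunk)    # pass it on while it is still in memory
        total += len(chunk)
    return total


if __name__ == "__main__":
    # In-memory stand-ins for the incoming stream, the local file and the next node.
    source = io.BytesIO(b"x" * 200_000)
    local_file, next_node = io.BytesIO(), io.BytesIO()
    print(relay(source, local_file, next_node), "bytes relayed")
```

Because a chunk is forwarded right after it arrives, the node never has to wait for the whole file before the next node starts receiving.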
Moreover, if the program can use zero-copy (a good reference from IBM: Efficient data transfer through zero copy), it may be even faster.
But one major problem with this solution is that if one node is broken, the transfer to every node after it will be affected.
But frankly I haven't tested it in a real environment yet. I would like to know how it would perform there.
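As one possible way to try zero-copy, Python's `socket.sendfile()` uses the `os.sendfile` system call where the platform provides it and falls back to plain `send()` otherwise. The sketch below only covers sending an already stored file over a connected socket; the function name `send_file_zero_copy` is made up, and whether it really helps in the pipeline is untested.

```python
import os
import socket
import tempfile


def send_file_zero_copy(sock, path):
    """Send the file at `path` over a connected, blocking stream socket.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:  # socket.sendfile needs a binary-mode file
        return sock.sendfile(f)


if __name__ == "__main__":
    # Local demo: a connected socket pair stands in for two servers.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"a" * 4096)
    sender, receiver = socket.socketpair()
    sent = send_file_zero_copy(sender, tmp.name)
    sender.close()
    received = 0
    while True:
        part = receiver.recv(65536)
        if not part:
            break
        received += len(part)
    receiver.close()
    os.unlink(tmp.name)
    print(sent, "bytes sent,", received, "bytes received")
```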