Tuesday, February 1, 2011

How to copy a 5G file to a large number of servers in the shortest time?

One day my ex-colleague asked me this question. It seems like a common interview question. I don't know the best answer (if one exists), but I would like to share my solution and my viewpoint.

First, let me share some common answers and their problems:

1. A tree-type algorithm:

First round: copy the file from the 1st server to the 2nd server.
2nd round: copy from the 1st server to the 3rd server AND from the 2nd server to the 4th server at the same time.
3rd round: 1->5, 2->6, 3->7, 4->8.
And so on.

So ceil(log2(N)) rounds of file transfer can sync the file on N systems. If t is the time to transfer the 5G file over the network once, it will take about ceil(log2(N)) * t.

It is nice from an algorithmic point of view. But in a real system, if the 5G file cannot fit in the page cache, it has to be re-read from disk on every sending server in each round (the 1st server alone re-reads it about log2(N) times), which can be slow. (Ref. Numbers Everyone Should Know)
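As a rough illustration, here is a minimal sketch of the round-based copy (untested; it assumes passwordless ssh/scp between all the hosts, and the HOSTS names and FILE path are made up for the example):

#!/bin/bash
# HOSTS[0] already has the file; each round doubles the number of copies.
HOSTS=(host1 host2 host3 host4 host5 host6 host7 host8)
FILE=/data/bigfile
have=1
while [ "$have" -lt "${#HOSTS[@]}" ]; do
    for ((i = 0; i < have && have + i < ${#HOSTS[@]}; i++)); do
        # every host that has the file sends it to one host that does not
        ssh "${HOSTS[i]}" "scp $FILE ${HOSTS[have + i]}:$FILE" &
    done
    wait    # one round ends when all the parallel copies finish
    have=$((2 * have))
done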

2. "Install BT and let BT do it" solution

It can be a simple answer for a lazy guy and a very complex real environment. But actually BT would do many thing that is unnecessary for this simple task.

My solution: A pipeline streaming solution

Stream the file from the 1st system to the 2nd (e.g. using nc), stream it from the 2nd system to the 3rd at the same time, and so on down the chain.
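A minimal sketch with nc, assuming port 9999 is free on every host and the host names are made up; start the listeners from the last system backward (some nc versions need "-l -p 9999" instead of "-l 9999", and some need "-q 0" to exit on EOF):

On the last system: receive and write to disk.
# nc -l 9999 > bigfile

On each middle system: receive, keep a local copy with tee, and forward to the next host.
# nc -l 9999 | tee bigfile | nc next-host 9999

On the 1st system (which already has the file): start the stream.
# nc 2nd-host 9999 < bigfile

tee is what makes every middle system a consumer and a producer at the same time, so the whole chain moves at roughly the speed of its slowest link.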

In theory, it will take (N - 1) * (per-hop startup delay) + t. (See the graph below.)
And the biggest benefit is that the part of the file a system is operating on will most likely still be in the page cache when it is written to and read back from disk, which greatly increases the efficiency.

Here is a graph to illustrate. The x-axis is time; the upper part is the tree algorithm and the lower part is the streaming solution:

[graph not preserved]
Moreover, if the program can utilize zero-copy (a good reference from IBM: "Efficient data transfer through zero copy"), it may be even faster.

One major problem of this solution is that if any node in the chain fails, the transfer to every node after it is affected.

But frankly, I haven't tested it in a real environment yet. I would like to know how it performs in practice.

Thursday, January 13, 2011

pipe, split, rotate

Recently I had to use strace to monitor a long-running daemon. strace generates a lot of output, but it has no built-in way to split that output into multiple files. At first, I had a few ideas to solve the problem:

1. "split" command can split the output to files but it has a pre-set limit. For example, default it names the output file with suffix "aa", "ab"...., "zz" and then it will stop. You can increase suffix length but it will eventually stop.

2. The normal "logrotate" method would not work because strace will not accept SIGHUP (the way httpd does) to close and reopen its log file.

3. Use a cron job to "stop strace, rotate the file, and restart strace" - but this loses the trace of whatever happens between the stop and the restart.
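For reference, idea 1 would look something like this sketch (the PID 1234 is made up, and the 2>&1 is needed because strace writes its trace to stderr):

# strace -f -t -p 1234 2>&1 | split -b 100M - trace.

This fills trace.aa, trace.ab and so on, 100MB each, until the suffix pool runs out.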

I then tried to write my own program to read the input, write it to a file, and rotate. But after I finished, I searched the web and found an existing program that solves my problem: "rotatelogs". It comes with the httpd package.
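My own program is not worth reproducing here, but the idea was roughly this minimal sketch (it appends each line to a file named after the current hour; spawning date for every line makes it slow):

#!/bin/bash
# append each input line to a file named after the current hour
while IFS= read -r line; do
    printf '%s\n' "$line" >> "trace.$(date +%Y%m%d%H)"
done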

For example, with rotatelogs:

# strace -f -t -p <PID> 2>&1 | rotatelogs output_log 86400


It will write the output to output_log and rotate it every 86400 seconds (24 hours). You can also specify the size of each output file instead of a time, as shown below.
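For instance (again with a made-up PID), rotatelogs accepts a size with an M suffix in place of the rotation time:

# strace -f -t -p 1234 2>&1 | rotatelogs output_log 100M

This rotates whenever the current log file reaches about 100MB.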