Rsync performance for tar files
How well does rsync handle TAR files?
Tar files are basically a bunch of files appended together:
[header] [file1] [header] [file2] ...
And rsync is supposed to be smart about just transferring the parts of files that have changed. If I append to a tar file of course it'll be efficient:
dd if=/dev/urandom bs=1M count=100 > /tmp/big1 dd if=/dev/urandom bs=1M count=100 > /tmp/big2 tar -rvf a.tar /tmp/big1 rsync -v a.tar localhost:/tmp/out.tar → sent 104,893,539 bytes received 35 bytes 209,787,148.00 bytes/sec → total size is 104,867,840 speedup is 1.00 tar -rvf a.tar /tmp/big2 rsync -v a.tar localhost:/tmp/out.tar → sent 104,934,486 bytes received 71,726 bytes 42,002,484.80 bytes/sec → total size is 209,725,440 speedup is 2.00
But what if I add the new file to the beginning?
/bin/rm /tmp/out.tar tar -rvf 1.tar /tmp/big1 rsync -v 1.tar localhost:/tmp/out.tar → total size is 104,867,840 speedup is 1.00 tar -rvf 2-1.tar /tmp/big2 /tmp/big1 rsync -v 2-1.tar localhost:/tmp/out.tar → total size is 209,725,440 speedup is 2.00
Looks good. What if I change the order?
tar -rvf 1-2.tar /tmp/big1 /tmp/big2 rsync -v 1-2.tar localhost:/tmp/out.tar → total size is 209,725,440 speedup is 1,130.39
Even that is fine. Looks like rsync does a pretty good job of finding existing chunks.
~~ dev, 2020-04-20 16:21