Let’s say I test network speed by copying two files using scp to a remote server.
$ ls -l file*
-rw-rw-r-- 1 russell russell 104857600 Mar 21 11:53 file1
-rw-rw-r-- 1 russell russell 104857600 Mar 21 11:53 file2
$ scp -C file1 server:
file1 100% 100MB 11.1MB/s 00:09
$ scp -C file2 server:
file2 100% 100MB 100.0MB/s 00:01
file1 was transferred at 11.1MB/s in about 9 seconds. file2, a file of the exact same size, was transferred in a fraction of the time! Why?
Well, I’m using the -C option to scp that enables compression. Perhaps file2 is much more amenable to compression than file1. We can test using gzip.
$ gzip -c file1 | wc -c
$ gzip -c file2 | wc -c
Sure enough, file1 can’t be compressed at all — gzip actually makes file1 slightly bigger. On the other hand, gzip compresses file2 by a factor of a thousand, shrinking 100MB down to about 100KB.
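You can reproduce the same comparison at a smaller scale so it runs in well under a second; this is just an illustration, and the file names here are placeholders:

```shell
# Create 1MB of random bytes and 1MB of zero bytes.
dd if=/dev/urandom of=rand.bin bs=1M count=1 2>/dev/null
dd if=/dev/zero of=zero.bin bs=1M count=1 2>/dev/null

# Compress each to stdout and count the resulting bytes.
rand_gz=$(gzip -c rand.bin | wc -c)
zero_gz=$(gzip -c zero.bin | wc -c)

echo "random 1MB gzips to $rand_gz bytes"   # slightly larger than 1048576
echo "zeros  1MB gzips to $zero_gz bytes"   # roughly a kilobyte
```

Random bytes have no redundancy for gzip to exploit, so all it can add is header and block overhead; a run of zeros is almost pure redundancy.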
Moral of the story
For most performance testing of this sort, use random data that can’t be compressed. Don’t get caught unawares by compression optimizations that make your test run faster than it should. Note that the -C option to scp could have been set implicitly in my ssh config file, in which case I wouldn’t have had that clue on my command line.
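One way to check for an implicit setting: `ssh -G` (OpenSSH 6.8 and later) prints the fully resolved client configuration for a host without connecting. The host name `server` below is just the placeholder from the example above:

```shell
# Print the effective ssh configuration for "server" (placeholder host name)
# and pull out the compression setting; no connection is made.
ssh -G server | grep -i '^compression'
```

If this prints `compression yes`, scp will compress even without -C on the command line.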
The exception to this rule is if you want to test real world performance instead of an idealized performance metric. For instance, if you want to know “how fast is my network?”, use random data. If you want to know “how fast is my network at scp transfers with compression enabled of English language text documents?”, use text documents as your sample data.
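Real-world data usually sits between the two extremes. If you want repeatable “somewhat compressible” filler rather than genuine documents, one trick (an illustration, not something from the scp test above) is to base64-encode random bytes: the output uses only 64 ASCII symbols, about 6 bits of entropy per byte, so gzip can shave off roughly a quarter:

```shell
# ~1MB of ASCII that compresses partially: base64 expands 3 raw bytes to 4,
# so 786432 random bytes yield about 1MB of text (plus line breaks).
head -c 786432 /dev/urandom | base64 > sample.txt

orig=$(wc -c < sample.txt)
gz=$(gzip -c sample.txt | wc -c)
echo "original: $orig bytes, gzipped: $gz bytes"
```

The gzipped size lands around three quarters of the original — compressible, but nothing like the thousand-fold shrink of all-zero data.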
How I created the files
For random data, /dev/urandom (a never ending stream of random bytes) is your friend. For completely nonrandom data, /dev/zero (a never ending stream of zero bytes) does the trick.
dd if=/dev/urandom bs=1M count=100 > file1
dd if=/dev/zero bs=1M count=100 > file2
I am lazy and always forget the options to dd. You can also use head -c, which is slower but works:
head -c 104857600 /dev/urandom > file1
head -c 104857600 /dev/zero > file2