Using Random Data in Linux Performance Testing

March 22, 2013


Russell Silva

A Mystery

Let’s say I test network speed by copying two files using scp to a remote server.

$ ls -l file*
-rw-rw-r-- 1 russell russell 104857600 Mar 21 11:53 file1
-rw-rw-r-- 1 russell russell 104857600 Mar 21 11:53 file2
$ scp -C file1 server:
file1   100%  100MB  11.1MB/s   00:09
$ scp -C file2 server:
file2   100%  100MB 100.0MB/s   00:01

file1 was transferred at 11.1MB/s in about 9 seconds. file2, a file of the exact same size, was transferred in a fraction of the time! Why?

Mystery solved

Well, I’m passing the -C option to scp, which enables compression. Perhaps file2 is much more amenable to compression than file1. We can test that using gzip.

$ gzip -c file1 | wc -c
104874184
$ gzip -c file2 | wc -c
101803

Sure enough, file1 can’t be compressed at all; gzip actually makes file1 slightly bigger. On the other hand, gzip compresses file2 by a factor of about a thousand, shrinking 100MB down to roughly 100KB.
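To make the comparison easier to repeat, the gzip check can be wrapped in a small shell function. This is only a sketch: compressed_pct is a made-up helper name, and it assumes gzip, wc, tr, and awk are available on the machine.

```shell
# compressed_pct FILE
# Print FILE's gzip-compressed size as a percentage of its original size.
# (compressed_pct is a hypothetical helper, not part of gzip or scp.)
compressed_pct() {
    # tr strips the leading spaces that some wc implementations print.
    orig=$(wc -c < "$1" | tr -d ' ')
    comp=$(gzip -c "$1" | wc -c | tr -d ' ')
    # An empty file would mean dividing by zero, so bail out early.
    if [ "$orig" -eq 0 ]; then
        echo "n/a"
        return 1
    fi
    awk -v c="$comp" -v o="$orig" 'BEGIN { printf "%.2f\n", c * 100 / o }'
}
```

With the byte counts shown above, compressed_pct file1 would print 100.02 and compressed_pct file2 would print 0.10. A value near 100 means compression will not help the transfer; a value near 0 means the test is mostly measuring the compressor.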

Moral of the story

For most performance testing of this sort, use random data that can’t be compressed. Don’t get caught unaware by compression optimizations that will make your test run faster than it should. Note that the -C option to scp could have been implicitly set in my ssh config file, in which case I wouldn’t have had that clue on my command line.
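One way to check for an implicit setting is to ask ssh for the configuration it would actually use. The sketch below assumes a reasonably recent OpenSSH client (the -G option, which prints the resolved configuration and exits, was added in OpenSSH 6.8); effective_compression is a made-up helper name, and server is the host from the example above.

```shell
# effective_compression [ssh options] HOST
# Print the Compression setting ssh would apply to HOST after reading
# all of its config files. Assumes an OpenSSH client that supports -G.
# (effective_compression is a hypothetical helper name.)
effective_compression() {
    ssh -G "$@" | grep -i '^compression '
}
```

Running effective_compression server should print a single line, either compression yes or compression no. Since scp runs ssh underneath, the same setting normally applies to scp transfers to that host.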

The exception to this rule is if you want to test real world performance instead of an idealized performance metric. For instance, if you want to know “how fast is my network?”, use random data. If you want to know “how fast is my network at scp transfers with compression enabled of English language text documents?”, use text documents as your sample data.

How I created the files

For random data, /dev/urandom (a never-ending stream of random bytes) is your friend. For completely nonrandom data, /dev/zero (a never-ending stream of zero bytes) does the trick.

dd if=/dev/urandom bs=1M count=100 > file1
dd if=/dev/zero bs=1M count=100 > file2

I am lazy and always forget the options to dd. You can also use head -c, which is slower but works:

head -c 104857600 /dev/urandom > file1
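If you create test files often, both commands can go into one small function. This is a sketch rather than a polished tool: make_test_files is a made-up name, and it uses head -c for both files so there are no dd options to remember.

```shell
# make_test_files DIR BYTES
# Write DIR/file1 (random bytes) and DIR/file2 (zero bytes), each BYTES long.
# (make_test_files is a hypothetical helper name.)
make_test_files() {
    dir=$1
    bytes=$2
    head -c "$bytes" /dev/urandom > "$dir/file1"
    head -c "$bytes" /dev/zero > "$dir/file2"
    # Print both sizes so a short write is easy to spot.
    wc -c "$dir/file1" "$dir/file2"
}
```

For example, make_test_files . 104857600 would recreate the two 100MB files used in this post in the current directory.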
