Wednesday, November 10, 2010

More on software RAID 10: Benchmarking the heck out of it

Background
I already touched on RAID in a previous blog post of mine. I have been doing a lot of (software) RAID 10 installs lately, mainly on openSUSE, which I use for workstations and servers. openSUSE seems to have some bugs when installing on RAID 10, at least on Dell Precision T3500 machines, but they can be worked around with some command line magic. Anyway, RAID 10 is a setup of at least four hard drives where your data is both striped and mirrored, which in my mind is a nice balance between data preservation and performance. RAID 10 lets you use half of the total disk space, while the other half is used for mirroring. A four-drive RAID 10 built from two mirrored pairs also has a 66% chance of surviving a 2-drive failure.
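The 66% figure can be checked by counting. The sketch below is only an illustration and assumes four drives arranged as two mirrored pairs, (0,1) and (2,3); other layouts place the copies differently and give different numbers. The function names are made up for this example.

```shell
# Assumption: 4 drives, arranged as two mirrored pairs (0,1) and (2,3).

# Print the index of the mirrored pair a drive belongs to.
mirror_pair() {
    echo $(( $1 / 2 ))
}

# Enumerate every 2-drive failure and count how many the array survives.
survival_percent() {
    total=0
    survived=0
    for a in 0 1 2 3; do
        for b in 0 1 2 3; do
            # Count each unordered pair of failed drives once.
            [ "$b" -gt "$a" ] || continue
            total=$(( total + 1 ))
            # Data is lost only when both failed drives are in the same pair.
            if [ "$(mirror_pair "$a")" -ne "$(mirror_pair "$b")" ]; then
                survived=$(( survived + 1 ))
            fi
        done
    done
    # Integer percentage, rounded down.
    echo $(( 100 * survived / total ))
}

echo "2-drive failures survived: $(survival_percent)%"
```

Of the six possible 2-drive failures, only the two that take out both halves of the same mirror are fatal, which leaves four out of six.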

Before creating my first RAID 10 array for my desktop at work, I read up on RAID 10 performance and the settings that affect it. From reading several blog and forum posts on the issue I came away with the impression that the f2 RAID layout with a chunk size of around 256 KB to 512 KB would be the best performing setup. I now realize that it is not that simple and that the answer varies greatly by usage scenario. After that realization I decided to test different ways of creating a 4-disk RAID 10 and wrote a script that loops through (almost) all ways of creating a software RAID 10 and runs some tests on the array to help me decide on the optimal settings for my usage scenario.

Testing
As I mentioned in the previous paragraph, I wrote a small and primitive shell script that creates the software RAID 10 array automatically, tests it to the extent I wanted and writes down the results for analysis. To give some idea of the scale of this testing: the IOzone tests alone took almost a week to run on my test bench, and I ended up with 1500 data points (1500 different variations of the RAID 10 setup). Some of the variables used in the testing were: filesystem (ext3, ext4 etc.), RAID layout (n2/f2), chunk size (64 KB ... 2 MB), stride and stripe.
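To make the idea of looping over the variables concrete, here is a minimal dry-run sketch. It is not the actual raid_benchmark.sh: the device names, the value lists and the function names are assumptions for illustration, and it only prints the commands instead of running them. It also derives stride and stripe width from the chunk size using the commonly cited rule of thumb (stride = chunk size / filesystem block size, stripe width = stride × data-bearing drives) rather than varying them independently.

```shell
# Hypothetical device names and value lists; adjust to the machine under test.
DEVICES="/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1"
LAYOUTS="n2 f2"
CHUNKS_KIB="64 128 256 512 1024 2048"
FILESYSTEMS="ext3 ext4"
BLOCK_KIB=4      # assumed filesystem block size (4 KiB)
DATA_DISKS=2     # 4 drives holding 2 copies of every chunk

# stride = chunk size expressed in filesystem blocks
stride_for() {
    echo $(( $1 / BLOCK_KIB ))
}

# stripe width = stride times the number of data-bearing drives
stripe_width_for() {
    echo $(( $(stride_for "$1") * DATA_DISKS ))
}

# Print (do not run) the commands for every combination.
print_plan() {
    for layout in $LAYOUTS; do
        for chunk in $CHUNKS_KIB; do
            for fs in $FILESYSTEMS; do
                echo "mdadm --create /dev/md0 --level=10 --raid-devices=4 --layout=$layout --chunk=$chunk $DEVICES"
                echo "mkfs.$fs -E stride=$(stride_for "$chunk"),stripe_width=$(stripe_width_for "$chunk") /dev/md0"
                echo "mdadm --stop /dev/md0"
            done
        done
    done
}

print_plan
```

Printing the plan first makes it easy to review what would be destroyed and created before anything touches a real disk. The benchmark runs would go between the mkfs and the stop step.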

The results are not exactly scientific grade data, but they are sufficient for me to draw some conclusions on the matter. You should not use them as absolute proof of anything; I am the first to admit that my scripting skills and my knowledge of filesystems and RAID settings are limited.

The script runs several different kinds of tests that happen to interest me personally. It would be quite trivial to add more tests, but these are the ones that interested me the most. IOzone is the one I already mentioned; the script also runs the hdparm read timing test, Compilebench and something called Custom. The Custom test basically measures a real life usage scenario here at work: it simulates what my continuous integration servers do day in, day out. It times how long it takes to make a local clone of a Mercurial SCM repository onto the RAID and then build our Qt project according to our build steps (configuration, cleaning, compilation and so on). The Custom step obviously needs to be disabled or modified to meet one's needs.
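A timing step like the Custom test could look roughly like the sketch below. This is an assumption about the general shape, not the code from the real script: the helper name, the results file and the paths in the usage comments are all placeholders.

```shell
RESULTS_FILE="${RESULTS_FILE:-results.txt}"   # hypothetical log file name

# Run a command and append "<label> <elapsed seconds> exit=<status>"
# to the results file. Resolution is whole seconds.
time_step() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label $(( end - start )) exit=$status" >> "$RESULTS_FILE"
    return "$status"
}

# Example usage with placeholder paths (assumptions, not the real repository):
# time_step clone hg clone /srv/repos/project /mnt/raid/project
# time_step build make -C /mnt/raid/project
```

Recording the exit status next to the elapsed time helps to spot runs where a clone or build failed early and therefore looks suspiciously fast.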

The Script
The script is available here: raid_benchmark.sh.
Keep in mind that you use it at your own risk and will need to modify it to some extent to suit your own needs. Most importantly, it only has an implementation for SUSE and Debian based distributions, of which openSUSE is the one I have tested.

Benchmark results will follow at a later date.