Friday 4 January 2013

Tweaking


After playing around with the individual drive and array defaults, I found that performance can be improved substantially with a few easy tweaks.

Useful resources

I have looked long and hard for formulas that can be used to obtain results that make sense, but ultimately it comes down to testing each value and running a benchmark to 'measure' the difference it makes in your environment.

A few settings that can be adjusted are listed below, in the order in which I would apply them during benchmark testing: starting with the settings that had the biggest impact and ending with those that had a smaller impact, according to my findings.

Settings applied to the mdadm RAID array with defaults on my system (Ubuntu 12.04):

Summary of mdadm RAID array settings

Command to apply setting                          | Default value | Tweaked value | Description
blockdev --setra 102400 /dev/md126                | 8192          | 102400        | Read-ahead
echo 5120 > /sys/block/md126/md/stripe_cache_size | 256           | 5120          | Stripe cache size
echo 100000 > /proc/sys/dev/raid/speed_limit_max  | ?             | 100000        | Max resync speed

It is important to keep in mind that the stripe_cache_size will use a portion of RAM. For example, at the largest value I tested (32768), an mdadm RAID array such as mine will use:

stripe_cache_size * block size * number of disks
= 32768 * 4 KB * 4 (active disks)
= 512 MB of RAM

In my case I have 4 GB of RAM and the machine only performs pretty basic functions, so it is of little concern.
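
Before changing anything it is worth recording the current values so they can be restored later. A quick check, assuming the array is /dev/md126 as on my system:

    # Read-ahead on the array, in 512-byte sectors
    blockdev --getra /dev/md126

    # Current stripe cache size, in pages per device
    cat /sys/block/md126/md/stripe_cache_size

    # Current resync speed limit
    cat /proc/sys/dev/raid/speed_limit_max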

Settings applied to each drive with defaults on my system (Ubuntu 12.04):

Summary of individual drive settings

Command to apply setting                       | Default value       | Tweaked value | Description
echo 1 > /sys/block/sdX/device/queue_depth     | 31                  | 1             | NCQ queue depth
echo 64 > /sys/block/sdX/queue/nr_requests     | 128                 | 64            | Number of requests
echo deadline > /sys/block/sdX/queue/scheduler | noop deadline [cfq] | deadline      | I/O scheduler
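
As an aside, the scheduler default in the table looks odd because the kernel lists every available scheduler when the file is read, with the active one in brackets:

    # cat /sys/block/sda/queue/scheduler
    noop deadline [cfq]
    # echo deadline > /sys/block/sda/queue/scheduler
    # cat /sys/block/sda/queue/scheduler
    noop [deadline] cfq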

After hours of testing and a massive spreadsheet, I have values that provide substantial performance gains. Here are some benchmark tests.



Before I apply the values persistently, I will reset them by restarting the machine.




The key benchmark in the iozone test is stride read, so let's compare that: 2657627 before vs 2818608 after. The dd test went from 150 MB/s before to 236 MB/s after.
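
For anyone repeating the dd test, a run along these lines measures sequential write and read throughput (the mount point and file size are my assumptions here; pick a test size larger than RAM so caching does not skew the numbers):

    # Sequential write: fdatasync makes dd flush to disk before reporting the rate
    dd if=/dev/zero of=/mnt/raid/ddtest bs=1M count=8192 conv=fdatasync

    # Sequential read: drop the page cache first so the disks are actually hit
    sync
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/raid/ddtest of=/dev/null bs=1M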

Let's look at the bonnie output. I will spend the most energy on this, as I think it is the most informative benchmark. As expected, the sequential block output is similar to the dd output; the dd tests were actually redundant, since the same results are contained in the bonnie output, but for the sake of thoroughness I did both.
Sequential block input, or read, isn't much improved by the tweaking. This is not ideal, although it is the reality, unfortunately. Reads and writes are a balancing act: a read performance improvement will usually cause a reduction in write performance.
Sequential block rewrite reads data and then writes it back, so it is essentially read and write performance combined. In this case, 103900 with defaults and 136714 with the tweaks in place.
Random seeks measure how many random seeks per second bonnie can perform, in this case 519 before vs 404 after.
A result of +++++ means the test completed so quickly that the error margin is a sizeable percentage of the measurement, and the result is therefore inaccurate.
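
For reference, a bonnie run along these lines produces the output discussed above (the mount point and label are my assumptions; -s sets the test file size in MB and should be at least double RAM, so 8192 on my 4 GB machine):

    # -d test directory, -s file size in MB, -m label for the results row,
    # -u user to run as (bonnie++ refuses to run as root without it)
    bonnie++ -d /mnt/raid -s 8192 -m md126test -u someuser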



Here is a discussion of a script that tweaks an mdadm RAID array automatically. I used it as a reference, although I found that some of the settings mentioned there did not make a great difference:
http://ubuntuforums.org/showthread.php?t=1916607

I found that the best way to tweak was to choose a baseline based on manual tweaking and testing, then run through a number of values for one setting at a time and compare the results. I used a small script to save some time.
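
Something along these lines; a sketch of the idea, assuming stripe_cache_size is the setting being swept, bonnie++ as the benchmark, and the mount point and user as placeholders:

    #!/bin/bash
    # Sweep one setting through a range of values and benchmark each one,
    # leaving the other settings at the chosen baseline. Run as root.
    for size in 256 1024 2048 5120 8192 16384 32768; do
        echo $size > /sys/block/md126/md/stripe_cache_size
        echo "=== stripe_cache_size: $size ==="
        bonnie++ -d /mnt/raid -s 8192 -m "scs-$size" -u someuser
    done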



In order to analyse the output I used simple greps like the ones below. I ran the benchmark under screen and logged all the screen output with the -L option.
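
For example, assuming screen's default log name screenlog.0 and the scs- labels from the sweep sketch above:

    # Which value was being tested at each point
    grep "stripe_cache_size:" screenlog.0

    # bonnie++ ends each run with a comma-separated summary line tagged
    # with the -m label; grab those for pasting into a spreadsheet
    grep "^scs-" screenlog.0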



Importing this into Excel and using conditional formatting makes digging through the numbers easier. It is clear from the numbers below that there is no one-size-fits-all solution; sequential input and output are an example of results that play off against each other.

Another interesting observation is the large impact the /sys/block/sdX/queue/scheduler setting has on sequential block input and output.

Also, it is useful to note that more cache isn't always better.


My choice has been made, and I will use a script to configure the array after a reboot.
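
A sketch of that script, reconstructed from the tables above; the member drives sda through sdd are my assumption, so adjust them for your own system:

    #!/bin/bash
    # Re-apply the chosen mdadm RAID tweaks after a reboot. Run as root.

    # Array-level settings
    blockdev --setra 102400 /dev/md126                 # read-ahead
    echo 5120 > /sys/block/md126/md/stripe_cache_size  # stripe cache
    echo 100000 > /proc/sys/dev/raid/speed_limit_max   # max resync speed

    # Per-drive settings
    for drive in sda sdb sdc sdd; do
        echo 1 > /sys/block/$drive/device/queue_depth  # disable NCQ
        echo 64 > /sys/block/$drive/queue/nr_requests
        echo deadline > /sys/block/$drive/queue/scheduler
    done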



In the next post I plan to make these values persistent.


1 comment:

  1. Nice work. I'm thinking about the performance drop seen when 16384 and 32768 are being used as values for stripe_cache_size. Could it be your system started swapping at that point?
