I fought for days, literally, to get an external SAS enclosure, the Dell MD1000, working with decent performance on RHEL5 x86_64 (PowerEdge 1950 III with two quad-core CPUs and 16GB RAM). It’s a 15-disk enclosure, and this one is stuffed with 15 Hitachi 1TB SATA II drives, each delivering a raw linear read speed of 80MB/s. The SAS connection to the Dell PERC5/E PCIe card is made through a 4x SAS cable, which has a theoretical maximum throughput of 4x3Gbps = 12Gbps.
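As a sanity check, the disks alone can saturate that link: 3Gbps SAS uses 8b/10b encoding, so roughly 10 bits go over the wire per payload byte, which puts each lane around 300MB/s. A quick back-of-the-envelope sketch:

```shell
# 8b/10b encoding: each 3Gbps lane carries roughly 300MB/s of payload.
lanes=4
per_lane_mb=$((3000 / 10))          # ~300 MB/s per lane
link_mb=$((lanes * per_lane_mb))    # ~1200 MB/s for the 4x cable
disks_mb=$((15 * 80))               # 1200 MB/s raw from 15 drives at 80MB/s
echo "$link_mb $disks_mb"           # prints "1200 1200"
```

So the payload ceiling of the cable and the aggregate raw speed of the disks land in the same ballpark, around 1.2GB/s; the enclosure should be able to keep the link busy.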
Nice, huh? Well, a default RAID5 array would drop to 60MB/s for 3 parallel linear reads. Ouch!
It took me some time, but it’s now performing much better. Up to 600MB/s for the exact same test.
Here is a nice read for all the gory details:
Basically, the major catch was the read-ahead at the Linux kernel level. Simply changing it from 256 to 8192 or 16384 sectors for that block device is enough to get the expected performance (those are 512-byte sectors, so the values mean 128kB, 4MB and 8MB respectively). This is done by executing:
/sbin/blockdev --setra 8192 /dev/sdb
This is not to be confused with the PERC’s Read Ahead, which doesn’t improve performance in any of my tests, and is disabled by default.
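For reference, a minimal sketch (assuming the array shows up as /dev/sdb, as in my setup) to check the current value and make the change survive a reboot on RHEL5 via rc.local:

```shell
# Show the current read-ahead, in 512-byte sectors (kernel default: 256).
/sbin/blockdev --getra /dev/sdb

# Raise it to 8192 sectors (4MB).
/sbin/blockdev --setra 8192 /dev/sdb

# The setting is not persistent; re-apply it at boot from rc.local.
echo '/sbin/blockdev --setra 8192 /dev/sdb' >> /etc/rc.d/rc.local
```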
Apart from this simple change, a few other settings are quite useful to make sure you get the best possible performance:
- Make sure WriteBack is enabled on the PERC’s volume (this is the default)
- Choose a stripe size matching the use of the volume (64kB or 128kB are usually good choices)
- Make sure “Cached” is enabled for the volume (this is the default, but not when creating the volume with MegaCli)
- Align the partition on the block device with the stripes. Example (beware: for parted, kB means 1000B instead of 1024B, so specify sizes in bytes only):
parted /dev/sdb mkpart primary 131072B 100%
- Choose the best filesystem options for your stripe size and disk count. Example for ext3: -E stride=32 (the stripe size divided by the filesystem block size, here 128kB / 4kB). For xfs, the corresponding mkfs.xfs options are su (stripe unit) and sw (stripe width in data disks).
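To tie the filesystem options back to the stripe geometry, here is a small sketch; it assumes a 128kB stripe, 4kB filesystem blocks and the 15-disk RAID5 (so 14 data disks), with /dev/sdb1 being the partition from the parted example above. Adjust all three to your own volume.

```shell
# ext3: stride = stripe size / filesystem block size
stripe=$((128 * 1024))
block=4096
stride=$((stripe / block))
echo "mkfs.ext3 -b $block -E stride=$stride /dev/sdb1"

# xfs: su is the stripe unit, sw the number of data disks
# (a 15-disk RAID5 leaves 14 disks carrying data).
echo "mkfs.xfs -d su=128k,sw=14 /dev/sdb1"
```

The echoes only print the resulting mkfs commands so you can review them before actually formatting the volume.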
I think that’s about it. All of my further tests with hdparm, iozone and dumb copies have shown more than acceptable performance. Under load, the device still manages a sustained speed of over 200MB/s for simultaneous reads and writes. And for heavily fragmented reads, it stays over 400MB/s in real life, with iozone reporting up to 800MB/s for a single non-cached linear read.