Comment 38 for bug 666211

Ben Howard (darkmuggle-deactivatedaccount) wrote :

Concurring with comment #37, I suspect that either you have a slow EBS volume or your heavy I/O means writes can't be committed fast enough. Performance of EBS volumes can vary widely. One thing to remember is that EBS disks _ARE NETWORK ATTACHED STORAGE_, and with that comes all the fun that network attached storage brings.

I think you can do some tuning here and see where you get. Try the following sysctl settings, which force uncommitted disk writes to be flushed sooner rather than later. Feel free to experiment with the values, as this is a delicate balance between performance and safety.
vm.dirty_writeback_centisecs = 300 (the flusher threads wake up every three seconds)
vm.dirty_ratio = 5 (once dirty pages reach 5% of memory, writing processes must flush them themselves)
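
To make the units concrete, here is a small Python sketch that translates the two settings into seconds and an approximate byte limit, and prints them in /etc/sysctl.conf form. It is only an illustration: the class and constant names are made up for this example, and it approximates the dirty limit using total RAM, whereas the kernel actually computes it against available (free plus reclaimable) memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirtySettings:
    """Hypothetical holder for the two vm.* writeback sysctls discussed above."""
    dirty_ratio: int                # percent of memory
    dirty_writeback_centisecs: int  # flusher wake-up interval, in 1/100 s

    def writeback_interval_seconds(self) -> float:
        # The sysctl is expressed in centiseconds (hundredths of a second).
        return self.dirty_writeback_centisecs / 100

    def dirty_limit_bytes(self, memory_bytes: int) -> int:
        # Rough dirty-page ceiling before writers are made to flush.
        # Assumption: uses total RAM; the kernel uses "available" memory.
        return memory_bytes * self.dirty_ratio // 100

    def as_sysctl_conf(self) -> str:
        # "key = value" lines, the format read by sysctl -p.
        return (
            f"vm.dirty_writeback_centisecs = {self.dirty_writeback_centisecs}\n"
            f"vm.dirty_ratio = {self.dirty_ratio}\n"
        )


# Defaults as stated for Maverick (20%, 5 s) versus the suggested values.
MAVERICK_DEFAULTS = DirtySettings(dirty_ratio=20, dirty_writeback_centisecs=500)
SUGGESTED = DirtySettings(dirty_ratio=5, dirty_writeback_centisecs=300)

if __name__ == "__main__":
    gib = 1024 ** 3
    for name, s in (("default", MAVERICK_DEFAULTS), ("suggested", SUGGESTED)):
        limit = s.dirty_limit_bytes(8 * gib)
        print(f"{name}: flusher wakes every {s.writeback_interval_seconds():g}s, "
              f"dirty limit on 8 GiB ~ {limit / gib:.2f} GiB")
    print(SUGGESTED.as_sysctl_conf(), end="")
```

Once you are happy with the values, the printed lines can go into /etc/sysctl.conf and be loaded with sudo sysctl -p, or tried temporarily with sudo sysctl -w vm.dirty_ratio=5.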

My hunch is that you are using at least an m1.large or c1.medium and are saturating the network links used to flush the disk writes while, at the same time, pulling more data onto the disk(s), which prevents the flush from completing. The defaults on Maverick let dirty pages grow to 20% of memory and wake the flusher every 5 seconds. Reducing these values will cost you some performance, but I suspect it will stabilize your system and make sure that everything gets to disk.
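
Here is a back-of-the-envelope way to see why a smaller dirty ratio helps. If new data keeps arriving while the backlog is being flushed over the network, only the difference between the two rates actually drains the backlog. The Python sketch below models just that. The function name is hypothetical, the 40 MiB/s and 30 MiB/s rates are invented placeholders rather than EBS measurements, and the model ignores the kernel's own write throttling.

```python
import math


def seconds_to_drain(dirty_bytes: float, flush_bytes_per_s: float,
                     incoming_bytes_per_s: float) -> float:
    """Time to write out a dirty-page backlog while new writes keep arriving.

    Returns math.inf when data arrives at least as fast as it can be
    flushed, i.e. the backlog never shrinks.
    """
    net = flush_bytes_per_s - incoming_bytes_per_s
    if net <= 0:
        return math.inf
    return dirty_bytes / net


if __name__ == "__main__":
    mib = 1024 ** 2
    m1_large_ram = 7.5 * 1024 * mib  # m1.large: 7.5 GiB of RAM
    flush_rate = 40 * mib            # HYPOTHETICAL sustained EBS write rate
    incoming = 30 * mib              # HYPOTHETICAL rate of new writes
    for ratio in (20, 5):
        backlog = m1_large_ram * ratio / 100
        t = seconds_to_drain(backlog, flush_rate, incoming)
        print(f"dirty_ratio={ratio}%: {backlog / mib:.0f} MiB backlog, "
              f"~{t:.0f}s to drain")
```

With those made-up rates, the 20% backlog takes four times as long to clear as the 5% one. If incoming writes ever match the flush rate, the backlog never clears at all, which is the saturation scenario described above.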

Another tactic would be to use the ephemeral store as a "temp" directory: push your data to the ephemeral storage and, when it is ready to be kept permanently, commit it to the RAIDed EBS volumes.
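
Here is a minimal Python sketch of that staging pattern. It assumes the ephemeral disk and the RAIDed EBS volume are both mounted as ordinary Linux filesystems, and the function name and example paths are hypothetical. The sketch copies the finished file under a temporary name, fsyncs it, and atomically renames it into place, so a half-copied file never shows up on the EBS side.

```python
import os
import shutil
import tempfile
from pathlib import Path


def commit_to_durable(staged: Path, durable_dir: Path,
                      keep_staged: bool = False) -> Path:
    """Copy a finished file from scratch (ephemeral) storage to durable storage.

    Hypothetical helper: writes to a temp name in durable_dir, fsyncs it,
    then renames it over the final name (atomic within one filesystem).
    """
    durable_dir.mkdir(parents=True, exist_ok=True)
    final = durable_dir / staged.name
    fd, tmp_name = tempfile.mkstemp(dir=durable_dir, prefix=".incoming-")
    try:
        with os.fdopen(fd, "wb") as dst, staged.open("rb") as src:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # push the data out to the EBS-backed disk
        os.replace(tmp_name, final)  # readers see the old file or the new one
        # fsync the directory so the rename itself is durable (Linux/POSIX).
        dir_fd = os.open(durable_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # don't leave partial copies behind
        raise
    if not keep_staged:
        staged.unlink()  # free the scratch space once the durable copy exists
    return final
```

For example, commit_to_durable(Path("/mnt/scratch/results.dat"), Path("/data/results")), where /mnt is where Ubuntu's EC2 images usually mount the first ephemeral disk and /data stands in for wherever your RAID is mounted.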