For some configurations, having whisper write synchronously is better

Bug #710269 reported by Brian Hatfield
Affects: Graphite | Status: Fix Released | Importance: Undecided | Assigned to: Unassigned | Milestone: (none)

Bug Description

For my graphite configuration, I noticed really bursty performance: periods of fast performance, then other periods of almost-total unresponsiveness. We're running graphite on a less-than-amazing disk configuration, so I wanted to eke out as much performance from the current hardware as I could.

It turned out that the bursty performance was pdflush-related: we'd write so much so fast that we'd end up with 500MB+ of dirty pages, which pdflush would then dump out to disk. While pdflush is tunable via the dirty_background_ratio and dirty_ratio parameters, running on a slower disk configuration meant that "background writes" never had spare cycles to run, so the volume of dirty pages never decreased.
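One way to watch for this condition is to sample the kernel's dirty-page counter. The following is a minimal sketch, assuming a Linux host where /proc/meminfo reports a "Dirty:" line in kB; the helper names dirty_kb and read_dirty_kb are made up for illustration and are not part of graphite or whisper.

```python
def dirty_kb(meminfo_text):
    """Return the 'Dirty' value (in kB) from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("Dirty:"):
            # Line looks like: "Dirty:            512000 kB"
            return int(line.split()[1])
    return None


def read_dirty_kb(path="/proc/meminfo"):
    """Read the current dirty-page total from the running kernel (Linux only)."""
    with open(path) as fh:
        return dirty_kb(fh.read())
```

Polling read_dirty_kb() every few seconds while carbon is writing should show whether the dirty total keeps climbing between flushes or stays roughly flat.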

By adding an os.fsync call to whisper.py when it closes the file it's writing, I've completely normalized the performance of graphite on my configuration. I'm not sure how this will affect others running on newer kernels, as we are on the RHEL 5.5 kernel (2.6.18), but adding this to whisper.py (0.9.7 release version) had a drastic, positive effect:

Line 363:
fh.flush()
os.fsync(fh.fileno())

right before the fh.close() call.
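For readers without the 0.9.7 source at hand, the pattern can be sketched in isolation. This is not whisper's actual code: write_and_sync and the scratch file are hypothetical, and only fh.flush() and os.fsync(fh.fileno()) come from the change described above.

```python
import os
import tempfile


def write_and_sync(path, data, offset=0):
    """Write data at the given offset and force it to disk before closing."""
    with open(path, "r+b") as fh:
        fh.seek(offset)
        fh.write(data)
        fh.flush()             # push Python's userspace buffer to the kernel
        os.fsync(fh.fileno())  # ask the kernel to write this file's data to disk
    # Leaving the with-block closes the file, after the sync has completed.


# Usage: create a small scratch file, then overwrite part of it synchronously.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 16)
scratch_path = tmp.name
write_and_sync(scratch_path, b"abcd", offset=4)
```

The trade-off is that each write now blocks until the disk has accepted the data, so individual updates are slower but dirty pages no longer pile up between pdflush runs.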

See the attached graph for the change in performance characteristics. (Note that although cache size isn't included in this graph, graphite is keeping up with writing changes to disk, as one would expect.)

I hope this can be useful - perhaps as an option for whisper?

Brian Hatfield (bmhatfield) wrote :
chrismd (chrismd) wrote :

I've also run into issues where over-zealously making write() calls leads to more "wobbly" performance: you fill a bunch of buffers, then are forced to write synchronously, then many of them get flushed, then you go back to writing to buffers really quickly, rinse & repeat. To address this in my situation, I implemented a setting in carbon.conf called MAX_UPDATES_PER_SECOND, which makes carbon simply avoid doing writes too quickly; the effect is essentially the same. However, your solution has the advantage of not requiring an arbitrarily chosen value. I think this is worth adding as an option, and I'll add it in the coming 0.9.8 release.
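The rate-limiting idea can be illustrated with a small throttle. This is only a sketch of the general technique, not carbon's implementation of MAX_UPDATES_PER_SECOND; the UpdateThrottle class and its parameters are hypothetical.

```python
import time


class UpdateThrottle:
    """Space out operations so that at most max_per_second happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock    # injectable so the throttle can be tested without waiting
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        """Block until the next update is allowed; return the seconds slept."""
        now = self.clock()
        delay = self.next_allowed - now
        if delay > 0:
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
        return max(delay, 0.0)
```

A writer would call throttle.wait() before each update. As noted above, the limit itself has to be picked by hand, which is the drawback compared with syncing on close.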

chrismd (chrismd)
Changed in graphite:
status: New → Fix Committed
chrismd (chrismd)
Changed in graphite:
status: Fix Committed → Fix Released