2012-02-29 04:14:35 |
Francois Mikus |
description |
I found my fresh graphite install abusing my disk pretty heavily. Upon investigation, I found carbon creating new whisper files and then issuing an 800 MB write to each new file. This is repeatable for all metrics not previously seen.
strace output here: https://raw.github.com/gist/1338425/3da1dd9e541cc62f7419c706e5631b6774147624/gistfile1.txt
I tracked this in whisper.py to here (this code is copied from 0.9.9's whisper)
for secondsPerPoint,points in archiveList:
  archiveInfo = struct.pack(archiveInfoFormat, archiveOffsetPointer, secondsPerPoint, points)
  fh.write(archiveInfo)
  archiveOffsetPointer += (points * pointSize)
zeroes = '\x00' * (archiveOffsetPointer - headerSize)
fh.write(zeroes)
As I read it, this code writes a few headers and then pads the rest of the file with zeroes. This zero-fill writes a huge number of bytes to disk and explains the heavy I/O usage I observed.
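For a rough sense of scale, the size of the zero-fill can be estimated from the archive list alone. The sketch below is an illustration, not whisper's own code: the three struct formats are assumed to match whisper's on-disk layout, `whisper_file_size` is a hypothetical helper, and the retention used in the example is made up rather than taken from the affected install.

```python
import struct

# Struct layouts assumed to match whisper's on-disk format.
metadataFormat = "!2LfL"    # aggregation type, max retention, xFilesFactor, archive count
archiveInfoFormat = "!3L"   # offset, secondsPerPoint, points
pointFormat = "!Ld"         # timestamp, value

metadataSize = struct.calcsize(metadataFormat)
archiveInfoSize = struct.calcsize(archiveInfoFormat)
pointSize = struct.calcsize(pointFormat)


def whisper_file_size(archiveList):
    """Return (headerSize, totalSize) in bytes.

    archiveList is a list of (secondsPerPoint, points) tuples.
    Hypothetical helper for illustration only.
    """
    headerSize = metadataSize + archiveInfoSize * len(archiveList)
    dataSize = sum(points * pointSize for _, points in archiveList)
    return headerSize, headerSize + dataSize


if __name__ == "__main__":
    # Hypothetical retention: one point per second, kept for two years.
    archiveList = [(1, 2 * 365 * 24 * 60 * 60)]
    headerSize, totalSize = whisper_file_size(archiveList)
    print("header bytes:   ", headerSize)
    print("zero-fill bytes:", totalSize - headerSize)
```

With these assumed formats each point takes 12 bytes, so a single high-resolution archive with tens of millions of points is enough to produce a zero-fill of several hundred megabytes per new metric.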
I think the 'zeroing' action can be better written like this:
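# seek() takes an absolute offset, and the file position is already at headerSize here,
# so the last byte of the finished file is at archiveOffsetPointer - 1
fh.seek(archiveOffsetPointer - 1)
fh.write("\0")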
The above should achieve the same results as the original code but without incurring huge amounts of disk activity.
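To check that claim, here is a minimal, self-contained sketch comparing the two strategies on a scratch file. Both function names are hypothetical, the header is a stand-in of `H` bytes rather than a real whisper header, and the sketch assumes the finished file should be `totalSize` bytes long (the header followed by the archive data region), so it seeks to the absolute offset of the file's last byte.

```python
import os
import tempfile


def create_zero_filled(path, headerSize, totalSize, chunkSize=16384):
    """Write the data region as explicit zeroes, in chunks to bound memory use."""
    with open(path, "wb") as fh:
        fh.write(b"H" * headerSize)          # stand-in for the real header bytes
        remaining = totalSize - headerSize
        zeroes = b"\x00" * chunkSize
        while remaining > 0:
            n = min(remaining, chunkSize)
            fh.write(zeroes[:n])
            remaining -= n


def create_sparse(path, headerSize, totalSize):
    """Extend the file to totalSize by writing only its final byte."""
    with open(path, "wb") as fh:
        fh.write(b"H" * headerSize)          # stand-in for the real header bytes
        if totalSize > headerSize:
            fh.seek(totalSize - 1)           # absolute offset of the last byte
            fh.write(b"\x00")


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    filled = os.path.join(workdir, "filled.wsp")
    sparse = os.path.join(workdir, "sparse.wsp")
    create_zero_filled(filled, 28, 1000000)
    create_sparse(sparse, 28, 1000000)
    print(os.path.getsize(filled), os.path.getsize(sparse))
```

Both files report the same length and read back the same bytes. The difference is in allocation: on filesystems that support sparse files, the skipped region takes up no disk blocks until it is written, so the up-front I/O goes away, but the space is not reserved and the file's blocks may end up fragmented as points arrive.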
I'm pretty sure this is a problem and am quite happy to write a patch for this. Thoughts? |
|