Comment 9 for bug 885944

Nicholas Leskiw (nleskiw) wrote : Re: [Bug 885944] Re: carbon/whisper performing 800mb writes

This is not a problem. Whisper zero-fills each new file on purpose, so that all of the file's space is allocated up front and the file is as sequential on disk as possible instead of becoming fragmented. A side benefit is that if your storage partition fills up, the space for existing metrics has already been allocated, so Graphite continues to collect data for them.
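
To make the trade-off concrete, here is a minimal sketch (not Whisper's actual code; the function names are made up for illustration) that creates the same file both ways. Both files report the same length and read back identically, but on filesystems that support sparse files the seek-and-write version typically has almost no blocks allocated until data is written later, which is what allows fragmentation and out-of-space failures on existing metrics. It also assumes, as the subtraction of headerSize in the quoted code suggests, that the final offset already includes the header, so the absolute seek target is the total size minus one; subtracting the header size again would leave the file short by that many bytes.

```python
import os
import tempfile


def create_zero_filled(path, header, data_size):
    """Write the header, then explicitly write data_size zero bytes."""
    chunk = b"\x00" * 16384  # write in chunks rather than one huge string
    with open(path, "wb") as fh:
        fh.write(header)
        remaining = data_size
        while remaining > 0:
            n = min(remaining, len(chunk))
            fh.write(chunk[:n])
            remaining -= n


def create_sparse(path, header, data_size):
    """Write the header, then extend the file by writing only its last byte."""
    with open(path, "wb") as fh:
        fh.write(header)
        if data_size > 0:
            total_size = len(header) + data_size
            fh.seek(total_size - 1)  # absolute offset of the final byte
            fh.write(b"\x00")


def allocated_bytes(path):
    """Bytes actually allocated on disk (Unix only; st_blocks is in 512-byte units)."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        filled = os.path.join(tmp, "filled.wsp")
        sparse = os.path.join(tmp, "sparse.wsp")
        create_zero_filled(filled, b"HEADER", 1024 * 1024)
        create_sparse(sparse, b"HEADER", 1024 * 1024)
        for name in (filled, sparse):
            # Allocation figures depend on the filesystem, so none are promised here.
            print(name, os.path.getsize(name), allocated_bytes(name))
```

The allocated sizes printed depend on the filesystem, so treat them as an illustration rather than a guarantee.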

Please do not change this behavior. You can adjust the rate at which new metrics are created in the carbon configuration files by limiting the number of new creates per minute. (Carbon caches data until it is written to disk.)

-Nick

On Feb 28, 2012, at 10:14 PM, Francois Mikus <email address hidden> wrote:

> ** Description changed:
>
> I found my fresh graphite install abusing my disk pretty heavily. Upon
> investigation, I found carbon creating new whisper files and then
> issuing an 800 MB write to that file. This is repeatable for all metrics
> not previously seen.
>
> strace output here:
> https://raw.github.com/gist/1338425/3da1dd9e541cc62f7419c706e5631b6774147624/gistfile1.txt
>
> I tracked this in whisper.py to here (this code is copied from 0.9.9's
> whisper)
>
> - for secondsPerPoint,points in archiveList:
> - archiveInfo = struct.pack(archiveInfoFormat, archiveOffsetPointer, secondsPerPoint, points)
> - fh.write(archiveInfo)
> - archiveOffsetPointer += (points * pointSize)
> + for secondsPerPoint,points in archiveList:
> + archiveInfo = struct.pack(archiveInfoFormat, archiveOffsetPointer, secondsPerPoint, points)
> + fh.write(archiveInfo)
> + archiveOffsetPointer += (points * pointSize)
>
> - zeroes = '\x00' * (archiveOffsetPointer - headerSize)
> - fh.write(zeroes)
> + zeroes = '\x00' * (archiveOffsetPointer - headerSize)
> + fh.write(zeroes)
>
> This code, to me, says to write a few headers and then pad the rest of
> the file with zeroes. This zero-fill operation causes a huge number of
> bytes to be written to disk and explains the heavy I/O usage I observed.
>
> I think the 'zeroing' action can be better written like this:
>
> fh.seek(archiveOffsetPointer - headerSize - 1)
> fh.write("\0")
>
> The above should achieve the same results as the original code but
> without incurring huge amounts of disk activity.
>
> I'm pretty sure this is a problem and am quite happy to write a patch
> for this. Thoughts?
>
> --
> You received this bug notification because you are subscribed to
> Graphite.
> https://bugs.launchpad.net/bugs/885944
>
> Title:
> carbon/whisper performing 800mb writes
>
> Status in Graphite - Enterprise scalable realtime graphing:
> Fix Committed
>
> Bug description:
> I found my fresh graphite install abusing my disk pretty heavily. Upon
> investigation, I found carbon creating new whisper files and then
> issuing an 800 MB write to that file. This is repeatable for all
> metrics not previously seen.
>
> strace output here:
> https://raw.github.com/gist/1338425/3da1dd9e541cc62f7419c706e5631b6774147624/gistfile1.txt
>
> I tracked this in whisper.py to here (this code is copied from 0.9.9's
> whisper)
>
> for secondsPerPoint,points in archiveList:
> archiveInfo = struct.pack(archiveInfoFormat, archiveOffsetPointer, secondsPerPoint, points)
> fh.write(archiveInfo)
> archiveOffsetPointer += (points * pointSize)
>
> zeroes = '\x00' * (archiveOffsetPointer - headerSize)
> fh.write(zeroes)
>
> This code, to me, says to write a few headers and then pad the rest
> of the file with zeroes. This zero-fill operation causes a huge number
> of bytes to be written to disk and explains the heavy I/O usage I
> observed.
>
> I think the 'zeroing' action can be better written like this:
>
> fh.seek(archiveOffsetPointer - headerSize - 1)
> fh.write("\0")
>
> The above should achieve the same results as the original code but
> without incurring huge amounts of disk activity.
>
> I'm pretty sure this is a problem and am quite happy to write a patch
> for this. Thoughts?