Corrupt / empty files causing problems

Bug #874826 reported by Scott Smith on 2011-10-15
This bug affects 2 people
Affects   Status   Importance   Assigned to   Milestone
carbon    New      Undecided    Unassigned

Bug Description

Earlier today my graphite server OOM'ed (ran out of memory; long story). At the same time, carbon was creating a bunch of new metrics. The files for those metrics were left empty, which caused a lot of problems: data never got written to them, the graphite-web metrics tree didn't show *any* stats in that directory, and so on.

I'm not sure what you think the proper behavior should be here; the current behavior may be acceptable to you. I would like to propose two things:

1. [most important] It seems like at least the tree browser should ignore the corrupt files.
2. It would also be nice if carbon recreated a file that was 0 bytes.

This is obviously an edge case, but handling it could prevent a lot of confusion.
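
To make the first proposal concrete, here is a minimal sketch of a directory listing that skips files too small to be valid. It is not graphite-web's actual code: the function name `usable_metric_files` and the `MIN_WHISPER_BYTES` threshold are hypothetical, and the 28-byte figure rests on the assumption that a whisper file begins with a 16-byte metadata block followed by at least one 12-byte archive entry.

```python
import os

# Assumption: a whisper file starts with a 16-byte metadata block followed by
# at least one 12-byte archive entry, so anything shorter cannot be valid.
MIN_WHISPER_BYTES = 28


def usable_metric_files(directory, min_bytes=MIN_WHISPER_BYTES):
    """Return the sorted .wsp file names in `directory` that are at least `min_bytes` long."""
    usable = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.endswith(".wsp") or not os.path.isfile(path):
            continue
        try:
            size = os.path.getsize(path)
        except OSError:
            # Unreadable entry: skip it rather than fail the whole listing.
            continue
        if size >= min_bytes:
            usable.append(name)
    return usable
```

With a filter like this, one empty or truncated file would be left out of the listing instead of hiding every other metric in the directory.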

Darrell Bishop (darrellb) wrote:

I had something very similar affect me. From my (internal) bug report (which we haven't investigated yet):

"...one device's CPU whisper directory had 2 okay files, one partially-written file (maybe with a bad header as well?), and several zero-byte files. While in this state, metrics collection seemed to not be working. Metrics were coming in (possibly getting stored to the okay files, but I can't remember), but the problem didn't correct itself even after I removed the partial file and restarted daemons all over the place. Completely removing both nodes' whisper directories allowed things to work again once all the whisper files got created again."

In hindsight, maybe it was the zero-byte files causing the problem? It doesn't sound like I tried removing those.

Definitely rm any empty files, and check directory owners and permissions.

I've seen that cause problems. Check the logs for 'corrupt whisper file' errors.

-Nick

In reply to Darrell Bishop's comment of Mar 5, 2012, 11:24 AM.
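
The advice to remove empty files and check owners and permissions can be approached with a read-only scan first. The sketch below is only an illustration: `find_empty_whisper_files` is a hypothetical helper, not part of carbon or whisper, and it assumes whisper files carry the `.wsp` extension. It reports zero-byte files along with their owner uid and mode string, and deletes nothing.

```python
import os
import stat


def find_empty_whisper_files(root):
    """Yield (path, uid, mode_string) for every zero-byte .wsp file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_size == 0:
                # stat.filemode renders the mode like `ls -l`, e.g. "-rw-r--r--".
                yield path, info.st_uid, stat.filemode(info.st_mode)
```

A possible way to run it, assuming the whisper data lives under `/opt/graphite/storage/whisper` (adjust to your install):

```python
for path, uid, mode in find_empty_whisper_files("/opt/graphite/storage/whisper"):
    print(path, uid, mode)
```

Reviewing the output before running `rm` keeps the cleanup deliberate, and the uid and mode columns make mismatched ownership easy to spot.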

Sidnei da Silva (sidnei) on 2012-05-08
affects: graphite → carbon