Multiple same-second events are discarded

Bug #987176 reported by Jens Rantil on 2012-04-23
This bug affects 3 people
Affects   Status  Importance  Assigned to  Milestone
Graphite  New     Undecided   Unassigned
whisper   New     Undecided   Unassigned

Bug Description

This bug report came out of a question I posed a week ago here: https://answers.launchpad.net/graphite/+question/193502

I have seen numerous examples where people use Whisper/Carbon to count the number of events over time, for example aggregating the number of new customers per day. There is a problem with this: updates in Whisper are always plain setters rather than increments. This means that two events that fall within the same interval at the finest resolution will never be counted as 2, but rather as 1. Obviously the expected outcome is 2, since two events actually happened.

Sure, your Whisper database could be updated with a finer resolution. But the smallest resolution handled by Whisper is one second, and two events in the same second will always result in only one event being counted. This is why I consider this a bug: you can't reliably count events with Carbon/Whisper. It is a concurrency bug.
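
The overwrite behaviour described above can be illustrated with a toy model. This is only a sketch and not Whisper's actual code: a plain dict stands in for a one-second-resolution archive, and `set_point` and `add_point` are hypothetical helper names.

```python
# Toy model: a dict stands in for a one-second-resolution archive.
# set_point/add_point are hypothetical names, not part of Whisper.

def set_point(archive, timestamp, value):
    """Setter semantics: a later write replaces the earlier one."""
    archive[timestamp] = value


def add_point(archive, timestamp, value):
    """Increment semantics: writes to the same slot accumulate."""
    archive[timestamp] = archive.get(timestamp, 0) + value


events = [1334621057, 1334621057]  # two events in the same second

overwritten = {}
for ts in events:
    set_point(overwritten, ts, 1)

accumulated = {}
for ts in events:
    add_point(accumulated, ts, 1)

print(overwritten[1334621057])  # 1
print(accumulated[1334621057])  # 2
```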

There are two ways this can be solved:
 1. Do aggregation of events within the same second in the event-generating application.
 2. Allow Whisper and Carbon to receive incremental updates.
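
Alternative 1 could look roughly like the sketch below, assuming the application can buffer its event timestamps before sending. `aggregate_events` is a hypothetical helper; it only builds "metric value timestamp" lines and leaves the actual sending to Carbon out.

```python
from collections import Counter


def aggregate_events(metric, timestamps):
    """Collapse events that share a second into one line per second."""
    counts = Counter(int(ts) for ts in timestamps)
    return ["%s %d %d" % (metric, counts[ts], ts) for ts in sorted(counts)]


# Two events in second ...057 and one in second ...058.
lines = aggregate_events("my.data", [1334621057, 1334621057, 1334621058])
for line in lines:
    print(line)
```

The drawback is that every event-generating application has to carry this buffering logic itself.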

While I understand you want to keep Graphite simple, I propose the latter (alternative 2, that is), sending the syntax "my.data +1 1334621057" to Carbon, and also the following to Whisper:

    $ whisper-update.py test.wsp 1334621057:+1
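
To show how little parsing the proposed syntax would need, here is a sketch of a line parser. It is an illustration of the proposal only: `parse_line` is a hypothetical function, and neither Carbon nor whisper-update.py is claimed to accept the "+" prefix today.

```python
def parse_line(line):
    """Parse 'metric value timestamp'; a leading '+' marks an increment.

    Hypothetical parser for the syntax proposed in this report.
    """
    metric, raw_value, raw_timestamp = line.split()
    is_increment = raw_value.startswith("+")
    return metric, float(raw_value), int(raw_timestamp), is_increment


print(parse_line("my.data +1 1334621057"))
print(parse_line("my.data 1 1334621057"))
```

A value without the "+" prefix keeps today's setter meaning, so existing senders would be unaffected.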

Depending on how scalable things need to be, would you consider some other format that is more easily parsed? Really, that's up to you.

Note that I do not propose to add "my.data -1 1334621057" or similar - things like that can be transformed afterwards in graphite webapp.

I understand that there will be a performance penalty when writing an increment to Whisper, since a read needs to be done for each write. However, this only applies to the timestamp being incremented. Apart from that, aggregationMethod=sum will do its job. Do note, however, that change 2 also introduces optimization possibilities for the aggregating proxy: multiple increments for the same second can be aggregated into one before writing to disk. Therefore, I don't consider this to break Graphite's scalability in any way.
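
The proxy-side optimization mentioned above might be sketched as follows. This is an assumption-laden illustration: `IncrementBuffer` is a hypothetical class, and a dict stands in for the on-disk database, so each read-modify-write cycle is just a dict lookup and store.

```python
from collections import defaultdict


class IncrementBuffer:
    """Coalesce increments in memory before touching storage (hypothetical)."""

    def __init__(self):
        self.pending = defaultdict(float)

    def add(self, metric, timestamp, delta):
        self.pending[(metric, timestamp)] += delta

    def flush(self, store):
        """Apply pending deltas; return the number of read-modify-write cycles."""
        cycles = 0
        for key, delta in self.pending.items():
            store[key] = store.get(key, 0.0) + delta  # one read, one write
            cycles += 1
        self.pending.clear()
        return cycles


store = {}  # stands in for the database on disk
buffer = IncrementBuffer()
for _ in range(3):
    buffer.add("my.data", 1334621057, 1)

cycles = buffer.flush(store)
print(cycles, store[("my.data", 1334621057)])  # 1 3.0
```

Three increments for the same second cost a single read and a single write here, which is the saving the paragraph above refers to.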

Also, this change will yield two additional improvements to Whisper:
 * importing data into Whisper becomes way easier. Today (see the referred question), importing data into Whisper requires manual aggregation to get the numbers right.
 * counting/aggregating events older than the minimum resolution will be easier than it is today.

Jens Rantil (jens-rantil) wrote :

The more I've been thinking about this, the more I realize this bug could actually be more serious than first anticipated. It wouldn't surprise me if there were many users who simply make a direct call/push to carbon for every event.

Jens Rantil (jens-rantil) wrote :

I just saw that you are working on a new DB engine, ceres (http://graphite.wikidot.com/roadmap). Does ceres address these issues? Maybe it's wiser to work on it there.

Jens Rantil (jens-rantil) wrote :

As a workaround for this bug, https://github.com/etsy/statsd could be used.

Jens Rantil (jens-rantil) wrote :

> * importing data into Whisper becomes way easier. Today (see the referred question), importing data into Whisper requires manual aggregation to get the numbers right.

The related branch that I just added at least solves the issue of importing aggregated values.

Nicholas Leskiw (nleskiw) wrote :

What happens if I have a 60 second interval (e.g. 1min:1day retention rate) and I send two updates in the same minute like this:

my.data +1 1337029420
my.data +1 1337029430

(both fall into the minute of 2012-05-14 16:03 CST)

Will that result in a value of 2 or 1 for that minute?

It will result in a 2 for that minute.
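
As a worked example of why both updates would be summed, the sketch below maps each timestamp to the start of its 60-second slot by subtracting the remainder, then applies the proposed increment semantics. `bucket_start` is a hypothetical helper and the dict stands in for the archive.

```python
def bucket_start(timestamp, seconds_per_point):
    """Round a timestamp down to the start of its interval."""
    return timestamp - (timestamp % seconds_per_point)


updates = [("my.data", 1, 1337029420), ("my.data", 1, 1337029430)]

totals = {}
for metric, delta, ts in updates:
    slot = bucket_start(ts, 60)
    totals[slot] = totals.get(slot, 0) + delta  # proposed increment semantics

print(totals)  # {1337029380: 2}
```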


Evan Kyle (evankyle) wrote :

Would be nice to have the ability to use the other aggregation types as well.

key value timestamp [avg|last(default)|sum|max|min]
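
One possible reading of this suggestion, sketched under the assumption that the method is an optional fourth field defaulting to "last". `parse_with_method`, `AGGREGATORS` and `combine` are hypothetical names, not part of Carbon.

```python
# Reducers for the aggregation types named above (hypothetical mapping).
AGGREGATORS = {
    "avg": lambda values: sum(values) / len(values),
    "last": lambda values: values[-1],
    "sum": sum,
    "max": max,
    "min": min,
}


def parse_with_method(line):
    """Parse 'key value timestamp [method]'; method defaults to 'last'."""
    fields = line.split()
    key, value, timestamp = fields[0], float(fields[1]), int(fields[2])
    method = fields[3] if len(fields) > 3 else "last"
    if method not in AGGREGATORS:
        raise ValueError("unknown aggregation method: %s" % method)
    return key, value, timestamp, method


def combine(values, method):
    """Reduce all values received for one slot with the chosen method."""
    return AGGREGATORS[method](values)


print(parse_with_method("my.data 5 1337029420 sum"))
print(combine([1.0, 3.0], "avg"))  # 2.0
```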
