Multiple same-second events are discarded
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Graphite | New | Undecided | Unassigned |
whisper | New | Undecided | Unassigned |
Bug Description
This bug report came out of a question I posed a week ago here: https:/
I have seen numerous examples of people using Whisper/Carbon to count the number of events over time, for example aggregating the number of new customers per day. There is a problem with this: updates in Whisper are always plain sets rather than increments. This means that two events that happen within the same minimum-resolution interval will never be counted as 2, but rather as 1. The expected outcome is obviously 2, since two events actually happened.
Sure, your Whisper database could be updated with a finer resolution. But the smallest resolution Whisper handles is one second, so two events within the same second will always be counted as only one event. This is why I consider it a bug: you cannot reliably count events with Carbon/Whisper. It is a concurrency bug.
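To make the difference concrete, here is a minimal in-memory sketch (not Whisper's actual code) that compares "last write wins" storage with increment storage for events that each carry the value 1. The helper names `bucket`, `store_with_set` and `store_with_increment` are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def bucket(timestamp: int, resolution: int = 1) -> int:
    """Round a timestamp down to the start of its storage slot."""
    return timestamp - (timestamp % resolution)


def store_with_set(events, resolution=1):
    """Last write wins: models the setter behaviour described above."""
    series = {}
    for ts in events:
        series[bucket(ts, resolution)] = 1  # each event overwrites its slot
    return series


def store_with_increment(events, resolution=1):
    """Each event adds 1 to its slot: the behaviour a counter needs."""
    series = defaultdict(int)
    for ts in events:
        series[bucket(ts, resolution)] += 1
    return dict(series)


events = [1334621057, 1334621057, 1334621058]  # two events in the same second
print(store_with_set(events))        # {1334621057: 1, 1334621058: 1}
print(store_with_increment(events))  # {1334621057: 2, 1334621058: 1}
```

With setter semantics the two events at 1334621057 collapse into a single 1, which is exactly the undercount this report describes.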
There are two ways this can be solved:
1. Do aggregation of events within the same second in the event-generating application.
2. Allow Whisper and Carbon to receive incremental updates.
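Alternative 1 could look roughly like the sketch below: the application counts its own events per bucket and emits one line per bucket in Carbon's plaintext format (`<metric path> <value> <timestamp>`). The function name `aggregate_events` and its `resolution` parameter are hypothetical, and actually sending the lines to Carbon over a socket is left out. The approach only works if each bucket is flushed exactly once. A second flush for the same bucket would again overwrite the first.

```python
from collections import Counter
from typing import Iterable, List


def aggregate_events(metric: str, timestamps: Iterable[float], resolution: int = 1) -> List[str]:
    """Count events per time bucket and format one Carbon plaintext line per bucket."""
    counts = Counter()
    for ts in timestamps:
        seconds = int(ts)                            # drop sub-second precision
        counts[seconds - seconds % resolution] += 1  # align to the bucket start
    # Plaintext protocol line: "<metric path> <value> <timestamp>"
    return [f"{metric} {count} {start}" for start, count in sorted(counts.items())]


lines = aggregate_events("my.data", [1334621057.2, 1334621057.9, 1334621058.1])
for line in lines:
    print(line)
# my.data 2 1334621057
# my.data 1 1334621058
```

The `resolution` argument should match the finest archive of the target Whisper file, so that the client-side buckets line up with Whisper's slots.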
While I understand you want to keep Graphite simple, I propose the latter (alternative 2), using the syntax "my.data +1 1334621057" for Carbon and the following for Whisper:
$ whisper-update.py test.wsp 1334621057:+1
Depending on how scalable this needs to be, would you consider some other format that is easier to parse? That is really up to you.
Note that I do not propose adding "my.data -1 1334621057" or similar; things like that can be transformed afterwards in the Graphite webapp.
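As a sketch of how the proposed "+1" syntax might be distinguished from an ordinary set, the hypothetical parser below treats a value with a leading `+` as an increment and anything else, including "-1", as a plain set, in line with the note above. This is not an existing Carbon feature. `parse_line` and `apply_line` are illustrative names, and a dict stands in for the Whisper file.

```python
from typing import Dict, Tuple


def parse_line(line: str) -> Tuple[str, float, int, bool]:
    """Split a "metric value timestamp" line; a leading '+' marks an increment."""
    metric, raw_value, raw_ts = line.split()
    is_increment = raw_value.startswith("+")
    return metric, float(raw_value), int(raw_ts), is_increment  # float("+1") == 1.0


def apply_line(store: Dict[Tuple[str, int], float], line: str) -> None:
    """Apply one parsed line to an in-memory stand-in for the datastore."""
    metric, value, ts, is_increment = parse_line(line)
    key = (metric, ts)
    if is_increment:
        store[key] = store.get(key, 0.0) + value  # read, add, write back
    else:
        store[key] = value                        # today's setter semantics


store: Dict[Tuple[str, int], float] = {}
apply_line(store, "my.data +1 1334621057")
apply_line(store, "my.data +1 1334621057")
print(store[("my.data", 1334621057)])  # 2.0
```

Using a prefix on the value keeps the line at three whitespace-separated fields, so existing parsing only needs one extra character check.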
I understand that there will be a performance penalty when writing an increment to Whisper, since each write requires a read first. However, this only applies to the timestamp being incremented. Apart from that, aggregationMeth
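The extra read can be illustrated with a toy fixed-size round-robin archive, loosely modelled (as an assumption) on a single Whisper archive in which each slot stores an interval timestamp together with its value. A set is a blind write. An increment must first read the slot, check that the stored interval is the current one rather than a stale value from an earlier pass around the ring, and only then write back. The class and method names are hypothetical. Propagation to coarser archives is not modelled.

```python
from typing import List, Optional, Tuple


class RoundRobinArchive:
    """Toy fixed-size archive: slot = (interval_start, value). Not Whisper's code."""

    def __init__(self, step: int, points: int):
        self.step = step
        self.slots: List[Tuple[int, float]] = [(0, 0.0)] * points

    def _locate(self, timestamp: int) -> Tuple[int, int]:
        interval = timestamp - (timestamp % self.step)
        return interval, (interval // self.step) % len(self.slots)

    def set(self, timestamp: int, value: float) -> None:
        interval, idx = self._locate(timestamp)
        self.slots[idx] = (interval, value)  # blind write, no read needed

    def increment(self, timestamp: int, delta: float = 1.0) -> None:
        interval, idx = self._locate(timestamp)
        stored_interval, stored_value = self.slots[idx]  # the extra read
        # A slot holding an older interval is stale and counts as zero.
        base = stored_value if stored_interval == interval else 0.0
        self.slots[idx] = (interval, base + delta)

    def get(self, timestamp: int) -> Optional[float]:
        interval, idx = self._locate(timestamp)
        stored_interval, stored_value = self.slots[idx]
        return stored_value if stored_interval == interval else None


archive = RoundRobinArchive(step=1, points=60)
archive.increment(1334621057)
archive.increment(1334621057)
print(archive.get(1334621057))  # 2.0
```

Because the increment is a read-modify-write, concurrent writers to the same file would need to be serialized for the counts to stay correct.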
Also, this change would yield two additional improvements to Whisper:
* Importing data into Whisper becomes waaay easier. Today (see the referenced question), importing data into Whisper requires manual aggregation to get the numbers right.
* counting/
Related branches
- Michael Leinartas: Approve
- Diff: 133 lines (+61/-10), 4 files modified:
  - bin/whisper-resize.py (+58/-6)
  - bin/whisper-update.py (+0/-1)
  - setup.py (+1/-1)
  - whisper.py (+2/-2)
The more I think about this, the more I realize this bug could be more serious than I first anticipated. It wouldn't surprise me if many users simply make a direct call/push to Carbon for every event.