The last received value for a datapoint should be written into whisper file
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
carbon | Fix Committed | Medium | Unassigned | |
Bug Description
I have multiple clients sending metrics to one carbon-aggregator, which does some simple aggregation and pushes the metrics to a single carbon-cache. All client metrics are generated each minute and carry the same timestamp value, but metrics from different servers do not reach carbon-aggregator simultaneously; there can be a delay of several seconds. For example, the 'server1.event' metric for 08:00:00 is received by carbon-aggregator at 08:00:02, while the 'server2.event' metric reaches carbon-aggregator at 08:00:25. Carbon-aggregator generates a 'total' aggregated metric with a 60-second aggregation interval:
total.events (60) = sum server*.event
While debugging the metric flow, I see that, as carbon-aggregator receives metrics from clients, it sends the same aggregated metric to carbon-cache several times. For example:
23/12/2011 13:39:22 :: total.events 1324643940.0 48136.0
23/12/2011 13:40:31 :: total.events 1324643940.0 251980.0
Obviously, the last sent value (251980 in this case) is the correct one.
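To illustrate why the same timestamp is emitted twice with a growing value, here is a minimal sketch of a sum buffer that is flushed before all sources have reported. `SumBuffer` and `bucket` are hypothetical names, not carbon-aggregator's actual code, and the per-server values are invented so that the totals match the log lines above.

```python
from collections import defaultdict

INTERVAL = 60  # seconds, matching the 60-second rule above


def bucket(timestamp, interval=INTERVAL):
    """Round a timestamp down to the start of its aggregation interval."""
    return timestamp - (timestamp % interval)


class SumBuffer:
    """Hypothetical stand-in for an aggregation buffer (not carbon's code)."""

    def __init__(self):
        # interval start -> {source metric name: latest value}
        self.values = defaultdict(dict)

    def add(self, source, timestamp, value):
        self.values[bucket(timestamp)][source] = value

    def flush(self):
        """Emit the current sum for every interval seen so far."""
        return [(ts, sum(per_source.values()))
                for ts, per_source in sorted(self.values.items())]


buf = SumBuffer()
emitted = []

# server1 reports first; a flush at this point only sees its value.
buf.add("server1.event", 1324643940, 48136.0)   # illustrative value
emitted += buf.flush()

# server2 arrives later with the same timestamp; the next flush
# re-emits the same interval with the larger, complete sum.
buf.add("server2.event", 1324643940, 203844.0)  # illustrative value
emitted += buf.flush()

for ts, total in emitted:
    print("total.events", float(ts), total)
```

Under these assumptions the receiver gets two datapoints for timestamp 1324643940, and only the second one reflects all sources.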
When carbon-cache gets several values for the same metric and timestamp, it seems to store all of them internally. When the Graphite web interface fetches those metrics from the carbon-cache RAM cache, the correct (last) value is displayed. However, when carbon-cache writes those metrics to the whisper file, the first received value is written (48136 in this case), which is incorrect. As a result, "fresh" metrics (that have not been dumped to whisper files yet) are graphed correctly, while older values (that are fetched from whisper files rather than from carbon-cache) are incorrect.
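The expected "last value wins" behavior can be sketched as a small de-duplication step applied to cached datapoints before they are written. This is only an illustration of the intended semantics: `last_value_wins` is a hypothetical helper, and it is neither the attached patch nor part of whisper.py.

```python
def last_value_wins(datapoints):
    """Collapse (timestamp, value) pairs so that, for each timestamp,
    only the most recently received value is kept.

    `datapoints` is assumed to be in arrival order.
    """
    latest = {}
    for timestamp, value in datapoints:
        latest[timestamp] = value  # a later arrival overwrites an earlier one
    return sorted(latest.items())


# The two datapoints from the log above, in the order they were received.
received = [
    (1324643940, 48136.0),
    (1324643940, 251980.0),
]

to_write = last_value_wins(received)
print(to_write)
```

With the two datapoints from the log, only the later value (251980.0) remains to be written for timestamp 1324643940.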
I was able to fix this by a simple 1-line patch to whisper.py (see attached).
Thanks,
Anton.
affects: graphite → carbon
Changed in carbon:
milestone: 0.9.10 → none
Indeed, the documented behavior is that the last point sent should win. Thanks for the good find.