# Efficient implementation of cumulative counter

Bug #1061817 reported by Julien Danjou on 2012-10-04
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Won't Fix
High
Unassigned

### Bug Description

Cumulative counters can have values going back to 0, like:
0, 10, 20, 30, 0, 20, 30

which translates to:
0, 10, 20, 30, 30, 50, 60

Requesting a sum on this looks inefficient at first sight because we need to through every event to check that none of the value went back to 0.

We need to find an efficient way to manipulate this counters in every backend, when at least getting max() for example.

Julien Danjou (jdanjou) on 2012-10-04
 Changed in ceilometer: status: New → Confirmed importance: Undecided → High
 Julien Danjou (jdanjou) wrote on 2012-10-04: #1

I've submitted the problem to a panel of SQL experts and they provided 2 implementations of a MAX() function for such a counter.

Simple
=====
This is probably the most portable approach, but it requires a double scan of the table, so it's not that efficient. It has been tested on MySQL.

SELECT SUM(IF(COALESCE((SELECT counter_volume FROM meter M2 WHERE M2.timestamp>M1.timestamp
ORDER BY M2.TIMESTAMP ASC LIMIT 1),0)<,counter_volume,0)) FROM meter M1 ORDER BY timestamp
ASC

Windowed
========
This uses window function and is more efficient, however I'm not sure it's available everywhere. It has been tested on PostgreSQL.

with t(tops) as (
select case when lead(counter_volume) over w < counter_volume then counter_volume
when lead(counter_volume) over w is null then counter_volume
else null
end as max
from meter
window w as (order by timestamp)
)
select sum(tops) from t;

(t returns the list of the higher values above all, and then we sum them to get max)

I think this is a good proof that this is definitely doable. For such counter, we likely want MAX() - FIRST() (to get the actual number of bytes sent via a NIC, for example ) and this is doable in an SQL query.

 Doug Hellmann (doug-hellmann) wrote on 2012-10-05: #2

 Doug Hellmann (doug-hellmann) wrote on 2012-10-05: #3

We are taking all of our measurements from outside of the VM by asking libvirt (and eventually other hypervisors). Do the counters for disk I/O and bandwidth reset if a VM reboots, or only if the hypervisor itself reboots? If the latter, I think we're safe because either the instance is considered destroyed when the hypervisor reboots or we could add a "sequence id" to the event stream to help us identify the reboot condition.

 Julien Danjou (jdanjou) wrote on 2012-10-05: #4

Hum, wow adding a sequence ID would help?

 Doug Hellmann (doug-hellmann) wrote on 2012-10-05: #5

If the agent creates a unique sequence id (like a uuid) each time it starts, that gives us a value that is the same for all events for a given run of the agent. We can then use it to group the query results. The result for a meter would be the total of the max volumes for that meter grouped by resource id and sequence id.

This doesn't work if the agent restarts but the counter doesn't reset, though, so I'm not sure it really helps after all.

 John Tran (jtran) wrote on 2012-11-05: #6

Would there need to be a solution implemented for mongodb too?

 Jay Pipes (jaypipes) wrote on 2012-11-05: #7

Are you sure the MySQL solution above actually works?

mysql> desc meter;
+---------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
| value | int(10) unsigned | NO | | NULL | |
| created | datetime | NO | | NULL | |
+---------+------------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

mysql> select * from meter;
+----+--------+-------+---------------------+
| id | name | value | created |
+----+--------+-------+---------------------+
| 1 | meter1 | 0 | 2012-11-05 15:18:17 |
| 2 | meter1 | 10 | 2012-11-05 15:18:21 |
| 4 | meter1 | 20 | 2012-11-05 15:18:28 |
| 5 | meter1 | 30 | 2012-11-05 15:18:31 |
| 6 | meter1 | 0 | 2012-11-05 15:18:35 |
| 7 | meter1 | 20 | 2012-11-05 15:18:39 |
| 8 | meter1 | 30 | 2012-11-05 15:18:45 |
+----+--------+-------+---------------------+
7 rows in set (0.00 sec)

mysql> SELECT SUM(IF(COALESCE((SELECT value FROM meter M2 WHERE M2.created>M1.created ORDER BY M2.created ASC LIMIT 1),0), value,0)) FROM meter M1 ORDER BY created ASC;
+------------------------------------------------------------------------------------------------------------------------+
| SUM(IF(COALESCE((SELECT value FROM meter M2 WHERE M2.created>M1.created ORDER BY M2.created ASC LIMIT 1),0), value,0)) |
+------------------------------------------------------------------------------------------------------------------------+
| 50 |
+------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

 Jay Pipes (jaypipes) wrote on 2012-11-05: #8

To be honest, it is probably going to be much more efficient to handle cumulative counters at the time of collection than at aggregation.

In other words, if it is known that the meter is cumulative and resettable, then logic should be put into the trigger to account for new zero values....

 John Tran (jtran) wrote on 2012-11-05: #9

Jay, thanks for the input I was thinking the same thing the only caveat is the amount of sql overhead of initiating a API call for every meter record.

 Julien Danjou (jdanjou) wrote on 2012-11-06: #10

That raises again the issue on how to deal with that when the pollster restart and can't have a clue the counter has been reset.
I really don't think it would solve the problem at the right place.

Julien Danjou (jdanjou) on 2012-12-28

Is there a reason that ceilometer resets a meter at all? It seems weird that a cumulative meter can be reset unless it is done so manually.

I'm encountering this issue when suspending an instance and tracking cpu time. Would the reset originate in ceilometer or nova? Does the reset even have a purpose or is it accidental?

 Harri Hämäläinen (hhamalai) wrote on 2014-02-13: #12

The reason is that semantics of Telemetry's "Cumulative counter" promise an increassing sequence of metered values, but there's no implementation to keep that promise when the actual data source resets the metered value to zero. E.g. libvirt will reset the CPU seconds to zero when the machine reboots. Telemetry really should provide this implementation as breaking this promise of cumulative counters will result obscure errors in other applications using the data provided by Telemetry.

 Changed in ceilometer: assignee: nobody → Anirudh Vedantam (anirudh-vedantam)
 information type: Public → Public Security information type: Public Security → Private Security
Julien Danjou (jdanjou) on 2014-03-21
 information type: Private Security → Public
ZhaoHangbo (497492840-9) on 2014-12-09
 information type: Public → Public Security information type: Public Security → Private Security
Jeremy Stanley (fungi) on 2014-12-09
 information type: Private Security → Public
gordon chung (chungg) on 2015-09-02
 Changed in ceilometer: assignee: Anirudh Vedantam (anirudh-vedantam) → nobody status: Confirmed → Triaged

 Changed in ceilometer: status: Triaged → Incomplete assignee: nobody → 董浩 (photodh)

 Changed in ceilometer: assignee: 董浩 (photodh) → nobody
Julien Danjou (jdanjou) on 2015-11-06
 Changed in ceilometer: status: Incomplete → Triaged
 Julien Danjou (jdanjou) wrote on 2016-08-16: #14

This won't be fixed in Ceilometer, and should be handled by Gnocchi instead.

 Changed in ceilometer: status: Triaged → Won't Fix