Efficient implementation of cumulative counter

Bug #1061817 reported by Julien Danjou on 2012-10-04
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Won't Fix

Bug Description

Cumulative counters can have values going back to 0, like:
0, 10, 20, 30, 0, 20, 30

which translates to:
0, 10, 20, 30, 30, 50, 60

Requesting a sum on this looks inefficient at first sight because we need to through every event to check that none of the value went back to 0.

We need to find an efficient way to manipulate this counters in every backend, when at least getting max() for example.

Julien Danjou (jdanjou) on 2012-10-04
Changed in ceilometer:
status: New → Confirmed
importance: Undecided → High
Julien Danjou (jdanjou) wrote :

I've submitted the problem to a panel of SQL experts and they provided 2 implementations of a MAX() function for such a counter.

This is probably the most portable approach, but it requires a double scan of the table, so it's not that efficient. It has been tested on MySQL.

SELECT SUM(IF(COALESCE((SELECT counter_volume FROM meter M2 WHERE M2.timestamp>M1.timestamp
      ORDER BY M2.TIMESTAMP ASC LIMIT 1),0)<,counter_volume,0)) FROM meter M1 ORDER BY timestamp

This uses window function and is more efficient, however I'm not sure it's available everywhere. It has been tested on PostgreSQL.

with t(tops) as (
  select case when lead(counter_volume) over w < counter_volume then counter_volume
               when lead(counter_volume) over w is null then counter_volume
               else null
           end as max
     from meter
   window w as (order by timestamp)
select sum(tops) from t;

(t returns the list of the higher values above all, and then we sum them to get max)

I think this is a good proof that this is definitely doable. For such counter, we likely want MAX() - FIRST() (to get the actual number of bytes sent via a NIC, for example ) and this is doable in an SQL query.

Doug Hellmann (doug-hellmann) wrote :

Additional info about finding counter values in ranges at http://tapoueh.org/blog/2012/10/05-reset-counter.html

Doug Hellmann (doug-hellmann) wrote :

We are taking all of our measurements from outside of the VM by asking libvirt (and eventually other hypervisors). Do the counters for disk I/O and bandwidth reset if a VM reboots, or only if the hypervisor itself reboots? If the latter, I think we're safe because either the instance is considered destroyed when the hypervisor reboots or we could add a "sequence id" to the event stream to help us identify the reboot condition.

Julien Danjou (jdanjou) wrote :

Hum, wow adding a sequence ID would help?

Doug Hellmann (doug-hellmann) wrote :

If the agent creates a unique sequence id (like a uuid) each time it starts, that gives us a value that is the same for all events for a given run of the agent. We can then use it to group the query results. The result for a meter would be the total of the max volumes for that meter grouped by resource id and sequence id.

This doesn't work if the agent restarts but the counter doesn't reset, though, so I'm not sure it really helps after all.

John Tran (jtran) wrote :

Would there need to be a solution implemented for mongodb too?

Jay Pipes (jaypipes) wrote :

Are you sure the MySQL solution above actually works?

mysql> desc meter;
| Field | Type | Null | Key | Default | Extra |
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
| value | int(10) unsigned | NO | | NULL | |
| created | datetime | NO | | NULL | |
4 rows in set (0.00 sec)

mysql> select * from meter;
| id | name | value | created |
| 1 | meter1 | 0 | 2012-11-05 15:18:17 |
| 2 | meter1 | 10 | 2012-11-05 15:18:21 |
| 4 | meter1 | 20 | 2012-11-05 15:18:28 |
| 5 | meter1 | 30 | 2012-11-05 15:18:31 |
| 6 | meter1 | 0 | 2012-11-05 15:18:35 |
| 7 | meter1 | 20 | 2012-11-05 15:18:39 |
| 8 | meter1 | 30 | 2012-11-05 15:18:45 |
7 rows in set (0.00 sec)

mysql> SELECT SUM(IF(COALESCE((SELECT value FROM meter M2 WHERE M2.created>M1.created ORDER BY M2.created ASC LIMIT 1),0), value,0)) FROM meter M1 ORDER BY created ASC;
| SUM(IF(COALESCE((SELECT value FROM meter M2 WHERE M2.created>M1.created ORDER BY M2.created ASC LIMIT 1),0), value,0)) |
| 50 |
1 row in set (0.00 sec)

Jay Pipes (jaypipes) wrote :

To be honest, it is probably going to be much more efficient to handle cumulative counters at the time of collection than at aggregation.

In other words, if it is known that the meter is cumulative and resettable, then logic should be put into the trigger to account for new zero values....

John Tran (jtran) wrote :

Jay, thanks for the input I was thinking the same thing the only caveat is the amount of sql overhead of initiating a API call for every meter record.

Julien Danjou (jdanjou) wrote :

That raises again the issue on how to deal with that when the pollster restart and can't have a clue the counter has been reset.
I really don't think it would solve the problem at the right place.

Julien Danjou (jdanjou) on 2012-12-28
tags: added: effort-xl
Adrian Turjak (adriant-y) wrote :

Is there a reason that ceilometer resets a meter at all? It seems weird that a cumulative meter can be reset unless it is done so manually.

I'm encountering this issue when suspending an instance and tracking cpu time. Would the reset originate in ceilometer or nova? Does the reset even have a purpose or is it accidental?

Harri Hämäläinen (hhamalai) wrote :

The reason is that semantics of Telemetry's "Cumulative counter" promise an increassing sequence of metered values, but there's no implementation to keep that promise when the actual data source resets the metered value to zero. E.g. libvirt will reset the CPU seconds to zero when the machine reboots. Telemetry really should provide this implementation as breaking this promise of cumulative counters will result obscure errors in other applications using the data provided by Telemetry.

Changed in ceilometer:
assignee: nobody → Anirudh Vedantam (anirudh-vedantam)
information type: Public → Public Security
information type: Public Security → Private Security
Julien Danjou (jdanjou) on 2014-03-21
information type: Private Security → Public
ZhaoHangbo (497492840-9) on 2014-12-09
information type: Public → Public Security
information type: Public Security → Private Security
Jeremy Stanley (fungi) on 2014-12-09
information type: Private Security → Public
gordon chung (chungg) on 2015-09-02
Changed in ceilometer:
assignee: Anirudh Vedantam (anirudh-vedantam) → nobody
status: Confirmed → Triaged
董浩 (photodh) on 2015-11-06
Changed in ceilometer:
status: Triaged → Incomplete
assignee: nobody → 董浩 (photodh)
董浩 (photodh) on 2015-11-06
Changed in ceilometer:
assignee: 董浩 (photodh) → nobody
Julien Danjou (jdanjou) on 2015-11-06
Changed in ceilometer:
status: Incomplete → Triaged
Julien Danjou (jdanjou) wrote :

This won't be fixed in Ceilometer, and should be handled by Gnocchi instead.

Changed in ceilometer:
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers