gnocchi-metricd cannot process new metrics on stable/2.0

Bug #1548909 reported by Nicolas Vila
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Gnocchi
Fix Released
Low
Julien Danjou
2.0
Fix Committed
Low
Julien Danjou

Bug Description

Gnocchi-metricd fails to process new measures for metrics due to an overflowerror:

2016-02-23 16:56:20.336 27783 DEBUG gnocchi.storage._carbonara [-] Retrieve measuresfor 02a45193-04ff-423a-9b88-778a7cfdec0f/std/86400.0 in 0.01s _add_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:210
2016-02-23 16:56:20.331 27785 DEBUG gnocchi.storage._carbonara [-] Store measures for 28618f18-9e84-4ef7-ac76-c1120d7a14d7/count/300.0 in 0.02s _add_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:219
2016-02-23 16:56:20.344 27785 DEBUG gnocchi.storage._carbonara [-] Expire measures for 28618f18-9e84-4ef7-ac76-c1120d7a14d7/count/300.0 in 0.00s _add_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:230
2016-02-23 16:56:20.345 27785 DEBUG gnocchi.storage._carbonara [-] Store measures for 28618f18-9e84-4ef7-ac76-c1120d7a14d7/count/1440.0 in 0.01s _add_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:219
2016-02-23 16:56:20.347 27785 DEBUG gnocchi.storage._carbonara [-] Expire measures for 28618f18-9e84-4ef7-ac76-c1120d7a14d7/count/1440.0 in 0.00s _add_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:230
2016-02-23 16:56:20.350 27783 DEBUG gnocchi.storage._carbonara [-] Retrieve measuresfor 02a45193-04ff-423a-9b88-778a7cfdec0f/std/604800.0 in 0.02s _add_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:210
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara [-] Error processing new measures
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara Traceback (most recent call last):
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py", line 405, in process_measures
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara ignore_too_old_timestamps=True)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/carbonara.py", line 223, in set_values
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara before_truncate_callback(self)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py", line 398, in _map_add_measures
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara for aggregation in agg_methods
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py", line 502, in _map_in_thread
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara return list(executor.map(lambda args: method(*args), list_of_args))
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 581, in result_iterator
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara yield future.result()
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 398, in result
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara return self.__get_result()
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/thread.py", line 55, in run
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara result = self.fn(*self.args, **self.kwargs)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py", line 502, in <lambda>
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara return list(executor.map(lambda args: method(*args), list_of_args))
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py", line 227, in _add_measures
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara oldest_point_to_keep)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py", line 263, in _delete_metric_measures_before
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara timestamp, granularity)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/carbonara.py", line 315, in get_split_key
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara cls.get_split_key_datetime(timestamp, sampling))
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/carbonara.py", line 303, in get_split_key_datetime
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara timestamp, freq=sampling * cls.POINTS_PER_SPLIT * 10e8)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "/usr/local/lib/python2.7/dist-packages/gnocchi/carbonara.py", line 145, in _round_timestamp
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara (pandas.Timestamp(ts).value // freq) * freq)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "pandas/tslib.pyx", line 299, in pandas.tslib.Timestamp.__new__ (pandas/tslib.c:8973)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "pandas/tslib.pyx", line 1184, in pandas.tslib.convert_to_tsobject (pandas/tslib.c:22829)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara File "pandas/tslib.pyx", line 3283, in pandas.tslib.cast_from_unit (pandas/tslib.c:56150)
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara OverflowError: Python int too large to convert to C long
2016-02-23 16:56:20.363 27785 ERROR gnocchi.storage._carbonara
2016-02-23 16:56:20.365 27783 DEBUG gnocchi.storage._carbonara [-] Store measures for 02a45193-04ff-423a-9b88-778a7cfdec0f/std/86400.0 in 0.01s _add_measures /usr/local/lib/python2.7/dist-packages/gnocchi/storage/_carbonara.py:219

The exact same host, using the same configuration and backend can process new measures if I deploy stable/1.3.

Revision history for this message
gordon chung (chungg) wrote :

just some notes:

this happens on an storage backend that hasn't been upgraded. there is a random entry of

DatetimeIndex(['2016-02-23 19:36:00', '2016-02-23 20:00:00'], dtype='datetime64[ns]', freq=None)

that is passed into _round_timestamp as it's timestamp.

gordon chung (chungg)
Changed in gnocchi:
status: New → Confirmed
Revision history for this message
Nicolas Vila (nvlan) wrote :

After running gnocchi-upgrade without errors, the API is working as expected, however gnocchi-metricd is still unable to process new measures: http://paste.openstack.org/show/488260/

Gnocchi version on both API and metricd is 2.0.1.dev1.

Revision history for this message
Nicolas Vila (nvlan) wrote :

I did a fresh install, with a new database, a new ceph pool, no prior measures and no upgrade involved. We have the same behavior, with the same OverflowError.

Revision history for this message
gordon chung (chungg) wrote :

seems related to this patch https://github.com/openstack/gnocchi/commit/7d67957e4503c3086d8fd113a10d22187eb5187a

will debug a bit more... but seems like expiry is broken.

Revision history for this message
gordon chung (chungg) wrote :

i'm not sure why you have a timeserie passed in. from what i can tell, this shouldn't be possible.

regardless, i'm also curious how your new db/pool is handling a timeseries that has data from feb 19th to feb 25th. how did it get a week's worth of data if it's new?

Revision history for this message
Nicolas Vila (nvlan) wrote :

Hello Gordon,

The logs that show data from feb 19 to feb 25 is with gnocchi 2.0.1.dev1 and the existing pool, if you need I can attach the log from gnocchi with the empty pool as well. Please advice.

Thanks, regards.

Revision history for this message
Nicolas Vila (nvlan) wrote :

Hello Gordon,

Here's a fresh log with a new pool using 2.0.1: https://gist.github.com/nvlan/2059ad499a8eff6b2d54

Revision history for this message
gordon chung (chungg) wrote :

when we split, we run a groupby which sends index to round_timestamp rather than the individual timestamp

https://github.com/openstack/gnocchi/blob/master/gnocchi/carbonara.py#L318

probably affects

https://github.com/openstack/gnocchi/blob/master/gnocchi/carbonara.py#L431

Changed in gnocchi:
status: Confirmed → Triaged
importance: Undecided → High
Revision history for this message
gordon chung (chungg) wrote :

hmm... i just tried this and it actually works as it should. groupby should take the function and apply it to each element of index. for some reason, when i was looking on your server, it was applying it against the index of series rather than each index.

maybe add

LOG.debug('type: %s, index: %s', type(self.ts), self.ts.index.values)

before https://github.com/openstack/gnocchi/blob/master/gnocchi/carbonara.py#L318

Revision history for this message
Nicolas Vila (nvlan) wrote :

Hello Gordon,

Here's the log with the added index logging: https://gist.github.com/nvlan/5e3e7f184fde2480508c

Thanks, regards.

Revision history for this message
gordon chung (chungg) wrote :
Download full text (5.1 KiB)

note to self (and others). this error is caused by the following policies:

+---------+-------------+-----------------------------------------------------------------------------+------------------------------------------------+
| name | back_window | definition | aggregation_methods |
+---------+-------------+-----------------------------------------------------------------------------+------------------------------------------------+
| default | 0 | - points: 288, granularity: 0:05:00, timespan: 1 day, 0:00:00 | std, count, 95pct, min, max, sum, median, mean |
| | | - points: 420, granularity: 0:24:00, timespan: 7 days, 0:00:00 | |
| | | - points: 240, granularity: 2:48:00, timespan: 28 days, 0:00:00 | |
| | | - points: 720, granularity: 12:00:00, timespan: 360 days, 0:00:00 | |
| | | - points: 720, granularity: 1:00:00, timespan: 30 days, 0:00:00 | |
| | | - points: 360, granularity: 1 day, 0:00:00, timespan: 360 days, 0:00:00 | |
| | | - points: 102, granularity: 7 days, 0:00:00, timespan: 714 days, 0:00:00 | |
| | | - points: 60, granularity: 30 days, 0:00:00, timespan: 1800 days, 0:00:00 | |
| | | - points: 60, granularity: 90 days, 0:00:00, timespan: 5400 days, 0:00:00 | |
| | | - points: 60, granularity: 180 days, 0:00:00, timespan: 10800 days, 0:00:00 | |
| | | - points: 60, granularity: 360 days, 0:00:00, timespan: 21600 days, 0:00:00 | |
+---------+-------------+-----------------------------------------------------------------------------+------------------------------------------------+

it's probably the 60 year timestamp+---------+-------------+-----------------------------------------------------------------------------+------------------------------------------------+
| name | back_window | definition | aggregation_methods |
+---------+-------------+-----------------------------------------------------------------------------+------------------------------------------------+
| default | 0 | - points: 288, granularity: 0:05:00, timespan: 1 day, 0:00:00 | std, count, 95pct, min, max, sum, median, mean |
| | | - points: 420, granularity: 0:24:00, timespan: 7 days, 0:00:00 | |
| | | - ...

Read more...

Changed in gnocchi:
importance: High → Low
Revision history for this message
gordon chung (chungg) wrote :

i think any policy that goes back farther than epoch time will cause issues (my hypothesis). we may want to cut of policies based on that. or 5 years. i'm guessing 5 year of data will cause mayhem regardless.

Revision history for this message
Nicolas Vila (nvlan) wrote :

Hello Gordon,

We're using this new archive policy:

root@rack01-nnsm01:~# gnocchi archive-policy list
+---------+-------------+---------------------------------------------------------------------------+------------------------------------------------+
| name | back_window | definition | aggregation_methods |
+---------+-------------+---------------------------------------------------------------------------+------------------------------------------------+
| default | 0 | - points: 288, granularity: 0:05:00, timespan: 1 day, 0:00:00 | std, count, 95pct, min, max, sum, median, mean |
| | | - points: 420, granularity: 0:24:00, timespan: 7 days, 0:00:00 | |
| | | - points: 240, granularity: 2:48:00, timespan: 28 days, 0:00:00 | |
| | | - points: 720, granularity: 12:00:00, timespan: 360 days, 0:00:00 | |
| | | - points: 720, granularity: 1:00:00, timespan: 30 days, 0:00:00 | |
| | | - points: 360, granularity: 1 day, 0:00:00, timespan: 360 days, 0:00:00 | |
| | | - points: 102, granularity: 7 days, 0:00:00, timespan: 714 days, 0:00:00 | |
| | | - points: 60, granularity: 30 days, 0:00:00, timespan: 1800 days, 0:00:00 | |
| | | - points: 60, granularity: 90 days, 0:00:00, timespan: 5400 days, 0:00:00 | |
+---------+-------------+---------------------------------------------------------------------------+------------------------------------------------+

With it, metricd is able to process measures without showing the overflow error.

Thanks!
Kind regards.

Revision history for this message
Julien Danjou (jdanjou) wrote :

Yeah the root of the problem is that you can't have points before epoch. We store unsigned integer from epoch for timestamps, so that won't work. I'm going to add some sanity check on that.

Changed in gnocchi:
assignee: nobody → Julien Danjou (jdanjou)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to gnocchi (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/289398

Changed in gnocchi:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to gnocchi (master)

Fix proposed to branch: master
Review: https://review.openstack.org/289399

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on gnocchi (master)

Change abandoned by Julien Danjou (<email address hidden>) on branch: master
Review: https://review.openstack.org/289398

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to gnocchi (master)

Reviewed: https://review.openstack.org/289399
Committed: https://git.openstack.org/cgit/openstack/gnocchi/commit/?id=ac2e09eb48fc9e1883f5db1c3ba4cc3c9ef8b0bd
Submitter: Jenkins
Branch: master

commit ac2e09eb48fc9e1883f5db1c3ba4cc3c9ef8b0bd
Author: Julien Danjou <email address hidden>
Date: Mon Mar 7 15:37:00 2016 +0100

    rest: make sure all timestamps are after Epoch

    Change-Id: I95e8b0a1a86fe886a13a7ae96eb70e7a7343e78f
    Closes-Bug: #1548909

Changed in gnocchi:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to gnocchi (stable/2.0)

Fix proposed to branch: stable/2.0
Review: https://review.openstack.org/290601

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to gnocchi (stable/2.0)

Reviewed: https://review.openstack.org/290601
Committed: https://git.openstack.org/cgit/openstack/gnocchi/commit/?id=cb956e295e6b6e9b30e4a5d150232f38285a88f3
Submitter: Jenkins
Branch: stable/2.0

commit cb956e295e6b6e9b30e4a5d150232f38285a88f3
Author: Julien Danjou <email address hidden>
Date: Mon Mar 7 15:37:00 2016 +0100

    rest: make sure all timestamps are after Epoch

    Change-Id: I95e8b0a1a86fe886a13a7ae96eb70e7a7343e78f
    Closes-Bug: #1548909
    (cherry picked from commit ac2e09eb48fc9e1883f5db1c3ba4cc3c9ef8b0bd)

Julien Danjou (jdanjou)
Changed in gnocchi:
milestone: none → 2.1.0
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.