data corruption errors when getting unaggregated

Bug #1676519 reported by gordon chung on 2017-03-27
Affects: gnocchi
Status: Fix Released
Assigned to: gordon chung

Bug Description

i am sending individual timestamp+value posts across 20K metrics over an hour with minute granularity.
if i stop and restart a metricd agent, i get quite a few data corruption errors when it comes back up. this only seems to happen after a restart. still debugging why.

2017-03-27 13:10:52.890 21207 ERROR [-] Data corruption detected for 56106987-85b6-4db3-8224-86872d6e62f0 unaggregated timeserie

gordon chung (chungg) wrote :

seems related to the new lz4 release. prior to 0.9.0, compressing and decompressing an empty value was valid. for example, lz4.loads(lz4.dumps(b'')) works in 0.8.2 but raises an error in 0.9.0.

in theory, it actually is an error for the unaggregated object to be empty... i imagine it happens because metricd is killed while it is doing the initial aggregation and saving the unaggregated measures.
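The failure mode described above can be sketched without lz4 at all. This is only an illustration: `zlib` from the standard library stands in for lz4 (its round-trip behaviour for `b''` differs from lz4 0.9.0), and `decode_unaggregated` and `is_corrupt` are hypothetical names, not gnocchi functions. The point is that an object which was created but never written holds zero bytes, and zero bytes is not a valid compressed stream.

```python
import zlib


def decode_unaggregated(blob: bytes) -> bytes:
    """Decompress a stored blob; raises zlib.error on an invalid stream."""
    return zlib.decompress(blob)


def is_corrupt(blob: bytes) -> bool:
    """Report whether a stored blob fails to decode."""
    try:
        decode_unaggregated(blob)
    except zlib.error:
        return True
    return False


# An object that exists in storage but was never filled in (assumed state
# after the agent is killed mid-aggregation).
never_written = b""
# An object that was written normally.
written = zlib.compress(b"1490620252,12.5")

print(is_corrupt(never_written))
print(is_corrupt(written))
```

A strict decoder treats the blank object as corrupt even though nothing was actually lost, which matches the symptom seen only after a restart.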

Fix proposed to branch: master

Changed in gnocchi:
assignee: nobody → gordon chung (chungg)
status: New → In Progress

Submitter: Jenkins
Branch: master

commit 4ac9d53383b0db8fd07f8073df63d61334e22cd6
Author: gord chung <email address hidden>
Date: Mon Mar 27 19:54:57 2017 +0000

    don't raise error if unaggregated empty

    new lz4 library doesn't like handling empty binary. if we kill
    agent during computation of aggregates, the unaggregated object
    might have been created (in ceph/redis case) but it may not have saved
    unaggregated measures leaving the object blank.

    this patch returns None and lets the workflow proceed as if the metric
    were new when the object is empty. in the scenario above, the original
    raw measures will not have been cleared from unprocessed, so they will
    still be processed again.

    also fixes a redis issue where passing in None makes redis actually
    store 'None'.

    Change-Id: I358e50ccadff721348630688c47544db6553e96b
    Closes-Bug: #1676519
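The two behaviours in the commit message can be sketched roughly as follows. This is not the gnocchi patch itself: `FakeStringStore`, `create_object` and `load_unaggregated` are hypothetical names, the store is an in-memory stand-in that assumes non-bytes values are coerced with `str()` (the behaviour the commit message attributes to the redis path), and `zlib` again stands in for lz4.

```python
import zlib
from typing import Optional


class FakeStringStore:
    """In-memory stand-in for a key/value store (assumption: values that are
    not bytes are coerced with str(), so None is stored as b'None')."""

    def __init__(self) -> None:
        self.data = {}

    def set(self, key: str, value) -> None:
        self.data[key] = value if isinstance(value, bytes) else str(value).encode()

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)


def create_object(store: FakeStringStore, key: str, data: Optional[bytes] = None) -> None:
    """Create the object, writing empty bytes rather than None."""
    store.set(key, b"" if data is None else data)


def load_unaggregated(store: FakeStringStore, key: str) -> Optional[bytes]:
    """Return the decoded measures, or None if the object is missing or blank."""
    blob = store.get(key)
    if not blob:
        # Blank object: treat the metric as new instead of raising.
        return None
    return zlib.decompress(blob)


store = FakeStringStore()
create_object(store, "metric-a")                       # created, never filled
create_object(store, "metric-b", zlib.compress(b"42"))  # written normally
print(load_unaggregated(store, "metric-a"))
print(load_unaggregated(store, "metric-b"))
```

Returning None is only safe under the condition the commit message states: the raw measures are still in the unprocessed queue, so the next run recomputes the aggregates from them.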

Changed in gnocchi:
status: In Progress → Fix Released