release package counter's ttl does not work

Bug #1154356 reported by Brian Murray
This bug affects 1 person
Affects: Daisy
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

We have some data from 20130117 and 20130118 in the Counters column family for a release and source package:

[default@crashdb] get Counters[utf8('Ubuntu 12.10:software-center')];
=> (counter=20130117, value=983)
=> (counter=20130118, value=123)
=> (counter=20130226, value=1670)

This data should not be there, though, since we set a time to live (TTL) of four weeks on it. From submit.py:

def update_release_pkg_counter(counters_fam, release, src_package, date):
    # only store four weeks worth of data
    time_to_live = 60*60*24*28
    counters_fam.insert('%s:%s' % (release, src_package), {date: 1},
        ttl=time_to_live)

It turns out that TTL is not supported for counter columns. See http://www.datastax.com/dev/blog/whats-new-in-cassandra-0-8-part-2-counters and the related issue.

We should definitely remove the code in submit.py that sets the TTL. Additionally, since we only need two weeks of data (we keep four just in case), perhaps we should have a job that deletes anything older than that.
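A minimal sketch of the first fix, which simply drops the ttl argument. It assumes, as in submit.py, that counters_fam is a pycassa-style ColumnFamily whose insert() takes a row key and a column dict. RecordingFamily is a hypothetical stand-in used only to show the resulting call without a Cassandra cluster.

```python
def update_release_pkg_counter(counters_fam, release, src_package, date):
    # A ttl had no effect on these counter columns (the rows above outlived
    # it), so none is passed; expiring old dates is left to a cleanup job.
    counters_fam.insert('%s:%s' % (release, src_package), {date: 1})


class RecordingFamily(object):
    """Hypothetical stand-in that records insert() calls instead of writing."""

    def __init__(self):
        self.calls = []

    def insert(self, key, columns, **kwargs):
        self.calls.append((key, columns, kwargs))


fam = RecordingFamily()
update_release_pkg_counter(fam, 'Ubuntu 12.10', 'software-center', '20130226')
```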

Changed in daisy:
importance: Undecided → Medium
Evan (ev) wrote:

Yeah, I think a daily cron job would work fine here. The safest approach would be a range scan (counters.get(column_finish=fourweeksago)) that then iterates over the returned columns and calls counter_remove on each one individually. That way, if a run happens to time out or otherwise fail, the next run will clean things up.
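The range-scan-and-remove approach can be sketched as below, under several assumptions: counters_fam exposes pycassa-style get(key, column_finish=..., column_count=...) and remove_counter(key, column) (remove_counter being, as assumed here, pycassa's wrapper for the Thrift counter_remove call), and column names are YYYYMMDD strings like those in the cassandra-cli output. cutoff_column, prune_counter_row and FakeCounters are hypothetical names; with pycassa you would pass not_found_exc=pycassa.NotFoundException.

```python
import datetime
from collections import OrderedDict

# Two weeks of data are needed; four are kept "just in case".
RETENTION_DAYS = 28


def cutoff_column(today, days=RETENTION_DAYS):
    """Return the YYYYMMDD column name of the oldest date to keep."""
    return (today - datetime.timedelta(days=days)).strftime('%Y%m%d')


def prune_counter_row(counters_fam, row_key, today, not_found_exc=KeyError,
                      batch_size=100):
    """Remove counter columns older than the cutoff, one column at a time.

    Each removal is independent, so a run that fails partway is simply
    finished by the next run. Returns the number of columns removed.
    """
    cutoff = cutoff_column(today)
    removed = 0
    while True:
        try:
            # column_finish is inclusive, so the cutoff day itself may be
            # returned; it is filtered out below and kept.
            columns = counters_fam.get(row_key, column_finish=cutoff,
                                       column_count=batch_size)
        except not_found_exc:
            break  # nothing at or before the cutoff
        expired = [name for name in columns if name < cutoff]
        for name in expired:
            counters_fam.remove_counter(row_key, name)
            removed += 1
        # Stop on a short batch, or when only the cutoff day came back.
        if len(columns) < batch_size or not expired:
            break
    return removed


class FakeCounters(object):
    """Hypothetical in-memory stand-in for a pycassa counter ColumnFamily."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, key, column_finish='', column_count=100):
        names = sorted(n for n in self.rows.get(key, {})
                       if not column_finish or n <= column_finish)
        if not names:
            raise KeyError(key)
        return OrderedDict((n, self.rows[key][n]) for n in names[:column_count])

    def remove_counter(self, key, column):
        del self.rows[key][column]


key = 'Ubuntu 12.10:software-center'
cf = FakeCounters({key: {'20130117': 983, '20130118': 123, '20130226': 1670}})
removed = prune_counter_row(cf, key, datetime.date(2013, 3, 12))
```

Iterating over every row key (for example with a range scan over rows) is left out here; the per-row function is what the cron job would call for each release and source package.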

Changed in daisy:
status: New → Triaged