Repair overcounts

Bug #1152206 reported by Evan
This bug affects 1 person
Affects: Daisy
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none given)

Bug Description

In a number of places we now retry counter mutations. Retrying was judged better than failing hard and having the client resend all of its data, which would disrupt column families beyond the counter-based CF being incremented. The cost is that counter increments are not idempotent, so a retried mutation that had in fact already been applied is counted twice.
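To illustrate how a retried increment can overcount, here is a minimal in-memory sketch. It does not use the real Cassandra client or daisy code: FlakyCounterStore, add_with_retry, and the row and column names are hypothetical stand-ins, and the failure is simulated as a response lost after the increment was applied.

```python
class FlakyCounterStore:
    """Hypothetical in-memory stand-in for a counter column family."""

    def __init__(self, fail_after_apply_on_calls):
        self.counters = {}
        self.calls = 0
        self._failing = set(fail_after_apply_on_calls)

    def add(self, row, column, value=1):
        self.calls += 1
        key = (row, column)
        # The increment is applied first...
        self.counters[key] = self.counters.get(key, 0) + value
        # ...and only then is the reply "lost", so the caller sees a failure.
        if self.calls in self._failing:
            raise TimeoutError("response lost after the increment was applied")


def add_with_retry(store, row, column, retries=1):
    """Retry the increment; the caller cannot tell whether it already landed."""
    for attempt in range(retries + 1):
        try:
            store.add(row, column)
            return
        except TimeoutError:
            if attempt == retries:
                raise


# Assumption: the first call fails after applying, the retry succeeds.
store = FlakyCounterStore(fail_after_apply_on_calls={1})
add_with_retry(store, "example-row", "example-column")
overcount = store.counters[("example-row", "example-column")]
```

One logical increment ends up stored as 2 here, which is the kind of drift the repair step is meant to correct.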

This is acceptable as long as we do two things:
1) Keep an equal number of ordinary (non-counter) columns somewhere else in the database, so the true count can be recomputed. For example, we have DayBuckets and DayBucketsCount, the latter holding the length of each row in DayBuckets.
2) Periodically repair the potentially overcounted column families by recomputing the counters from the columns kept in step 1.
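The two steps above could be sketched roughly as follows. This is an in-memory illustration only, not daisy's implementation: plain dicts stand in for the DayBuckets and DayBucketsCount column families, the row keys and column names are made up, and repair_counts is a hypothetical helper. Because Cassandra counters are changed by adding a delta rather than by being set, the sketch computes the difference to apply.

```python
def repair_counts(day_buckets, day_buckets_count):
    """Bring each counter back in line with the length of its source row.

    day_buckets:       {row_key: {column_name: value}}  (assumed layout)
    day_buckets_count: {row_key: int}                   (assumed layout)
    Returns {row_key: delta} for every counter that had drifted.
    """
    deltas = {}
    for row_key, columns in day_buckets.items():
        recorded = day_buckets_count.get(row_key, 0)
        delta = len(columns) - recorded
        if delta != 0:
            # Apply the correction as an increment, as a counter CF would need.
            day_buckets_count[row_key] = recorded + delta
            deltas[row_key] = delta
    return deltas


# Hypothetical data: the first row was overcounted by one retried increment.
day_buckets = {
    "20130307": {"oops-1": "", "oops-2": "", "oops-3": ""},
    "20130308": {"oops-4": ""},
}
day_buckets_count = {"20130307": 4, "20130308": 1}

deltas = repair_counts(day_buckets, day_buckets_count)
```

A real repair job would also have to cope with writes arriving while it runs, which this sketch ignores.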

Fixing this will address one of the two current causes of OOPSes on daisy.ubuntu.com: a MaximumRetryException ("Retried 1 times. Last failure was TTransportException: TSocket read 0 bytes"). The other cause is reports with signatures larger than 65535, which is fixed in daisy trunk.

eBay discusses the repair approach in its Cassandra data modeling blog series:
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/

Evan (ev)
description: updated
Changed in daisy:
importance: Undecided → High
status: New → Confirmed
description: updated