Yes. Anytime you see "Retried 1 times", that's a counter increment that failed. By default, Cassandra does not retry counters, since it is not an idempotent operation.
We should resolve this by doing three things:
1. Never retry counters. The result is unpredictable and can lead to significant over-counting whenever the Cassandra nodes come under heavy load (for example, during a compaction).
2. Catch the exceptions and pass on them. We have pycassa wired to statsd, and it should be producing graphs for retries at http://graphite.engineering.canonical.com. If it's not, we need to implement that.
3. Anytime we care about the accuracy of a counter, it should be matched with a column family that uses timeuuids (like the oops identifiers) or something else unique in a wide row, paired with a cron job to count the wide row and repair the counter. See https://bugs.launchpad.net/daisy/+bug/1152206 for more details on this.
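Points 1 and 2 could be sketched as a small wrapper: try the increment exactly once, and on failure record a metric and pass. The pycassa and statsd calls in the comments are illustrative assumptions, not the current daisy wiring.

```python
def safe_increment(increment, on_failure=None):
    """Attempt a counter increment once; never retry.

    Counter increments are not idempotent, so retrying after an
    ambiguous failure risks over-counting (especially when nodes are
    loaded, e.g. during a compaction). Instead, swallow the error and
    record it so the failure rate shows up in the graphite graphs.
    """
    try:
        increment()
        return True
    except Exception:
        if on_failure is not None:
            # e.g. statsd_client.incr('daisy.counter_increment_failed')
            on_failure()
        return False

# With pycassa (hypothetical keyspace/CF names), configured to never
# retry at the pool level as well:
#   pool = ConnectionPool('crashdb', server_list=servers, max_retries=0)
#   counters = ColumnFamily(pool, 'Counters')
#   safe_increment(lambda: counters.add(day_key, 'oopses'),
#                  on_failure=lambda: statsd_client.incr('counter.failed'))
```

Keeping the wrapper separate from pycassa makes the no-retry, catch-and-pass policy easy to test without a running cluster.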
eBay covered this approach a while back:
http://www.ebaytechblog.com/2012/08/14/cassandra-data-modeling-best-practices-part-2/
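The repair cron job from point 3 could look something like the sketch below. Since Cassandra counters can only be incremented or decremented, never set, the job computes the delta between the authoritative wide-row count and the counter, then adds that delta. The pycassa calls in the comments (get_count, add) are assumptions about how it would be wired up.

```python
def repair_counter(true_count, counter_value, apply_delta):
    """Reconcile a counter column with its authoritative wide row.

    true_count    -- count of unique (e.g. timeuuid) columns in the
                     wide row, e.g. pycassa's cf.get_count(key)
    counter_value -- current value of the counter column
    apply_delta   -- callable that increments the counter by a signed
                     amount, e.g. lambda d: counters.add(key, col, value=d)
    Returns the correction that was applied (0 if already accurate).
    """
    delta = true_count - counter_value
    if delta:
        apply_delta(delta)
    return delta
```

A cron job would loop over the counter keys, call `repair_counter` for each, and log any nonzero corrections so we can see how badly the counters drift.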