setup a counter for the size of the retracing index table
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Daisy | Triaged | Medium | Unassigned |
Bug Description
I recently (r459, r460) fixed an issue with submit_core.py where a core file could be written to swift but not written to the rabbit queue. (This happened because the connection to rabbit wasn't established and no attempts were made to retry it, which resulted in an "IOError: Socket closed" traceback; those tracebacks can be found in the OOPS reports for the error tracker.) So it's possible that there are core files in swift that will never be retraced because they aren't in the queue.
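A minimal sketch of the kind of retry that would have avoided the original failure: instead of publishing once and letting the socket error propagate, reconnect and retry a few times. The `publish` and `reconnect` callables are hypothetical stand-ins for the real rabbit operations in submit_core.py, not its actual API.

```python
import time

def publish_with_retry(publish, reconnect, attempts=3, delay=1.0):
    """Attempt to publish to the rabbit queue, reconnecting and
    retrying on socket errors rather than failing after one try.

    `publish` and `reconnect` are hypothetical callables standing in
    for the real connection handling in submit_core.py."""
    for attempt in range(attempts):
        try:
            publish()
            return True
        except IOError:  # e.g. "IOError: Socket closed"
            if attempt == attempts - 1:
                raise  # out of retries; let the caller decide
            time.sleep(delay)
            reconnect()
    return False
```

On the final failed attempt the exception is re-raised, so the caller can still fall back to another strategy (such as the Cassandra CF idea discussed below in the comments).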
We can see 535 of these on the 12th of June. These OOPSes should exist in the OOPS CF and in the retracing index, so we should be able to update them. I think it'd be worthwhile to go through the "IOError: Socket closed" tracebacks since the changeover to the DSE temp ring and check to see if they are still in the retracing index (they may have been removed if a crash with the same SAS was found).
If they are in the retracing index then we should re-add them to the retracing queue for their arch in rabbit.
If they are not in the retracing index and have been bucketed in a problem then we should remove the core file from swift.
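The two rules above can be sketched as a small decision function. This is only an illustration of the proposed cleanup logic; the lookup predicates and action names are assumptions, not Daisy's actual code.

```python
def reconcile_core(in_retracing_index, bucketed):
    """Decide what to do with a core file that is in swift but
    missing from the rabbit retracing queue.

    Sketch only: the boolean inputs stand in for lookups against
    the retracing index and the problem buckets described above."""
    if in_retracing_index:
        # Still awaiting a retrace: put it back on the queue
        # for its architecture.
        return 'requeue'
    if bucketed:
        # Already bucketed into a problem, so the core file in
        # swift is no longer needed.
        return 'delete-core'
    # Neither indexed nor bucketed: leave it for manual review.
    return 'investigate'
```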
summary changed:
- more core files in swift than in the retracing queue
+ setup a counter for the size of the retracing queue

summary changed:
- setup a counter for the size of the retracing queue
+ setup a counter for the size of the retracing index table
That sounds workable. What do you propose for handling Rabbit failures in the future? Should we just eat the exception and discard the core file, knowing we'll get more until we receive and write a complete one? Or perhaps we should write to a CF in Cassandra that the retracers can read from as well?
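The second option might look something like the following: try rabbit first, and on a socket failure record the OOPS id in a fallback column family that the retracers sweep later. Everything here is hypothetical; a dict stands in for the Cassandra CF, and the function names are illustrations of the idea, not an existing interface.

```python
# In-memory stand-in for the proposed "pending retrace" column
# family; a real implementation would write to Cassandra instead.
pending_cf = {}

def submit(oops_id, publish):
    """Try to queue the OOPS in rabbit; on a socket failure, park it
    in the fallback CF instead of discarding the core file."""
    try:
        publish(oops_id)
        return 'queued'
    except IOError:  # e.g. "IOError: Socket closed"
        pending_cf[oops_id] = True
        return 'deferred'

def drain(publish):
    """Retracer-side sweep: re-publish anything left in the
    fallback CF, removing entries once they are queued."""
    for oops_id in list(pending_cf):
        publish(oops_id)
        del pending_cf[oops_id]
```

The trade-off versus simply discarding the core is durability: nothing is lost while rabbit is down, at the cost of an extra Cassandra write path and a periodic sweep on the retracer side.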