apport transient errors cause retracers to hang

Bug #1310809 reported by Brian Murray
This bug affects 1 person
Affects  Status        Importance  Assigned to   Milestone
Daisy    Fix Released  Critical    Brian Murray

Bug Description

David Ames mentioned that the amd64 queues were growing due to the retracers having hung after receiving a "Transient apport error".

This is trivially reproducible by modifying apport-retrace to exit using sys.exit(99) after parsing arguments.
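
The reproduction can also be approximated without editing apport-retrace. What follows is only a minimal sketch, assuming a stand-in child process that exits with status 99 in place of the modified apport-retrace; the names run_fake_retrace and TRANSIENT_EXIT_CODE are hypothetical and are not part of daisy or apport.

```python
import subprocess
import sys

# Exit status that retracer.py treats as a transient error (per this report).
TRANSIENT_EXIT_CODE = 99


def run_fake_retrace():
    """Run a stand-in for apport-retrace that exits 99 right away.

    The child here is just a Python one-liner, not the real tool.
    """
    child_code = 'import sys; sys.exit(%d)' % TRANSIENT_EXIT_CODE
    proc = subprocess.Popen([sys.executable, '-c', child_code])
    proc.communicate()
    return proc.returncode


if __name__ == '__main__':
    code = run_fake_retrace()
    if code == TRANSIENT_EXIT_CODE:
        print('Transient apport error.')
```

Feeding that return code into the retracer's returncode check is what exercises the code path that hung.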

Here's a log of the failure:

2014-04-21 21:20:18,310:27635:140503280924416:INFO:root:bb93e072-c99a-11e3-a870-fa163eed9ff8:swift-storage:Retracing bb93e072-c99a-11e3-a870-fa163eed9ff8:swift-storage
2014-04-21 21:20:18,433:27635:140503280924416:INFO:root:bb93e072-c99a-11e3-a870-fa163eed9ff8:swift-storage:Transient apport error.

I submitted a second crash report after this and saw no attempts to retrace it. I then restarted the amd64 retracer (after fixing apport-retrace) and saw this in the log file:

2014-04-21 21:28:28,258:27983:140649046550272:INFO:root:bb93e072-c99a-11e3-a870-fa163eed9ff8:swift-storage:Writing back to Cassandra
2014-04-21 21:28:28,299:27983:140649046550272:INFO:root:bb93e072-c99a-11e3-a870-fa163eed9ff8:swift-storage:Successfully retraced.
2014-04-21 21:28:28,551:27983:140649046550272:INFO:root:bb93e072-c99a-11e3-a870-fa163eed9ff8:swift-storage:Done processing /tmp/tmpU3UJUT-swift.bb93e072-c99a-11e3-a870-fa163eed9ff8.oopsid
2014-04-21 21:28:28,551:27983:140649046550272:INFO:root:bb93e072-c99a-11e3-a870-fa163eed9ff8:swift-storage:swift token: 0ef696d2cac640ccb67938044e4bdf8f
2014-04-21 21:28:29,696:27983:140649046550272:INFO:root:dd2e6a7c-c99a-11e3-a870-fa163eed9ff8:swift-storage:Processing.

So the restarted retracer retraced the crash report that had previously failed and then moved on to the second crash report it had received.


Changed in daisy:
importance: Undecided → Critical
Brian Murray (brian-murray) wrote :

It hangs because the message being processed from the queue is never acknowledged or rejected.

Brian Murray (brian-murray) wrote :

revno: 424
fixes bug: https://launchpad.net/bugs/1310809
committer: Brian Murray <email address hidden>
branch nick: daisy
timestamp: Mon 2014-04-21 15:47:04 -0700
message:
  retracer.py: ensure that retracing failures due to transient apport errors do not cause the retracer to hang
diff:
=== modified file 'daisy/retracer.py'
--- daisy/retracer.py 2014-04-11 17:27:54 +0000
+++ daisy/retracer.py 2014-04-21 22:47:04 +0000
@@ -569,8 +569,10 @@
             if proc.returncode != 0:
                 if proc.returncode == 99:
                     # Transient apt error, like "failed to fetch ... size
-                    # mismatch" Throw back onto the queue by not ack'ing it.
+                    # mismatch"
                     log('Transient apport error.')
+                    # Throw back onto the queue
+                    msg.channel.basic_reject(msg.delivery_tag, True)
                     return
                     return
                 # apport-retrace will exit 0 even on a failed retrace unless
                 # something has gone wrong at a lower level, as was the case
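
The effect of the change can be illustrated in isolation. This is a simplified sketch, not the code in daisy/retracer.py: RecordingChannel, Message, and handle_result are hypothetical stand-ins, and the channel only records calls instead of talking to a real broker. The only call taken from the diff is basic_reject(delivery_tag, True), where the second argument asks for the message to be requeued.

```python
from collections import namedtuple

# Hypothetical stand-in for a queue message: just the two attributes
# the diff uses.
Message = namedtuple('Message', ['channel', 'delivery_tag'])


class RecordingChannel(object):
    """Fake channel that records acks and rejects instead of sending them."""

    def __init__(self):
        self.acked = []
        self.rejected = []

    def basic_ack(self, delivery_tag):
        self.acked.append(delivery_tag)

    def basic_reject(self, delivery_tag, requeue):
        self.rejected.append((delivery_tag, requeue))


def handle_result(msg, returncode):
    """Settle the message either way so it is never left unhandled."""
    if returncode == 99:
        # Transient error: throw the message back onto the queue.
        msg.channel.basic_reject(msg.delivery_tag, True)
        return 'requeued'
    # Simplification: every other outcome is acknowledged here.
    msg.channel.basic_ack(msg.delivery_tag)
    return 'acked'


if __name__ == '__main__':
    channel = RecordingChannel()
    print(handle_result(Message(channel, 1), 99))
    print(handle_result(Message(channel, 2), 0))
```

Before the fix, the returncode == 99 branch returned without calling anything on the channel, which matches the explanation given in the comment above.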

Changed in daisy:
assignee: nobody → Brian Murray (brian-murray)
status: New → In Progress
status: In Progress → Fix Committed
Changed in daisy:
status: Fix Committed → Fix Released