Comment 12 for bug 906211

Robert Collins (lifeless) wrote : Re: poppy loses connectivity to the auth server

So, when thedac tried manually they didn't get a timeout; it failed instantly. That suggests it wasn't a TCP timeout but something more nefarious causing the issue.

The short list of potential candidates is:
 - a bug in the LP preparation of the oops_twisted.Config object
 - a bug in oops_twisted.Config.publish
 - a bug in oops_twisted's adapter for non-twisted publishers.

In addition to that:
 - if we were getting socket timeout errors (e.g. due to black-hole firewalling) we should fix that and use a low timeout (0.5 to 1.0 seconds)
 - we probably want to do that on all services - e.g. in oops_amqp or even globally.
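
To make the low-timeout idea concrete, here is a minimal sketch using only the Python standard library. The helper name connect_with_timeout and the 1.0 second default are illustrative assumptions, not the actual oops_amqp or LP code; the real fix would need to go wherever the amqp connection is opened.

```python
import socket


def connect_with_timeout(host, port, timeout=1.0):
    """Try to open a TCP connection, giving up after `timeout` seconds.

    Hypothetical helper: returns the connected socket, or None if the
    attempt timed out or failed for any other socket-level reason.
    """
    try:
        # create_connection applies the timeout to the connect attempt,
        # so a black-holed host costs at most `timeout` seconds.
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return None
```

Setting the timeout per connection keeps the change local; socket.setdefaulttimeout() would be the global alternative, at the cost of affecting every socket in the process.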

I think the next actions to take are:
 - reproduce this problem and craft a fix for it. I'm going to reopen this bug accordingly: our amqp code must fail-soft.
 - separately, high priority but not critical, lower the socket timeout for amqp connection attempts. Doing that may be sufficient to solve this bug, but as it's conceptually a separate issue, I'd start with a separate bug.
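
For illustration, "fail-soft" here could look something like the sketch below: a wrapper that swallows and logs any error from a publisher so that a broken amqp connection cannot take the service down. The names fail_soft and the logger name are hypothetical, and the publisher is assumed to be a plain callable taking a report; this is not the oops_twisted API, just the general shape of the behaviour.

```python
import logging

logger = logging.getLogger("oops.failsoft")  # hypothetical logger name


def fail_soft(publisher):
    """Wrap a publisher callable so its errors never propagate.

    Assumption: `publisher` takes one report argument and returns
    whatever the underlying publisher returns on success.
    """
    def wrapped(report):
        try:
            return publisher(report)
        except Exception:
            # Log the failure and carry on; losing one report is
            # preferable to losing the service that produced it.
            logger.exception("OOPS publication failed; continuing")
            return None
    return wrapped
```

A caller would wrap the amqp publisher once at configuration time, e.g. publish = fail_soft(amqp_publish), where amqp_publish stands in for whatever callable actually sends the report.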