Uncaught qpid error can break a consumer

Bug #1303890 reported by Russell Bryant
This bug affects 1 person
Affects           Status         Importance  Assigned to      Milestone
neutron           Fix Released   Low         Ihar Hrachyshka
  Havana          Fix Released   Undecided   Unassigned
  Icehouse        Fix Released   Low         Ihar Hrachyshka
oslo-incubator    Fix Released   Low         Russell Bryant
  Havana          Fix Committed  Low         Russell Bryant
  Icehouse        Fix Committed  Low         Russell Bryant
oslo.messaging    Fix Released   Medium      Russell Bryant

Bug Description

The following exception was originally observed against the old rpc code, but the same problem exists in oslo.messaging.

 Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/excutils.py", line 78, in inner_func
     return infunc(*args, **kwargs)
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 698, in _consumer_thread
     self.consume()
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 689, in consume
     it.next()
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 606, in iterconsume
     yield self.ensure(_error_callback, _consume)
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 540, in ensure
     return method(*args, **kwargs)
   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/impl_qpid.py", line 597, in _consume
     nxt_receiver = self.session.next_receiver(timeout=timeout)
   File "<string>", line 6, in next_receiver
   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 665, in next_receiver
     if self._ecwait(lambda: self.incoming, timeout):
   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait
     result = self._ewait(lambda: self.closed or predicate(), timeout)
   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 571, in _ewait
     result = self.connection._ewait(lambda: self.error or predicate(), timeout)
   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 214, in _ewait
     self.check_error()
   File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 207, in check_error
     raise self.error
 InternalError: Traceback (most recent call last):
   File "/usr/lib/python2.6/site-packages/qpid/messaging/driver.py", line 667, in write
     self._op_dec.write(*self._seg_dec.read())
   File "/usr/lib/python2.6/site-packages/qpid/framing.py", line 269, in write
     if self.op.headers is None:
 AttributeError: 'NoneType' object has no attribute 'headers'

Something can put the qpid client into a bad state. In particular, I have observed a case in which session.next_receiver() immediately raises an InternalError. Nothing catches this exception, so it propagates all the way out of the consumer. If the eventlet executor is used, the forever_retry_uncaught_exceptions() decorator catches it and re-enters this code, which hits the same error again, so the consumer is stuck in an infinite retry loop.

The connection needs to be reset in this case to recover.
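
A minimal sketch of the failure mode and the recovery, under stated assumptions: the exception classes below are stand-ins that only mirror the hierarchy described in this report (ConnectionError and InternalError both deriving from MessagingError); they are not imported from qpid.messaging. FakeConnection and this ensure() are hypothetical simplifications for illustration, not the actual impl_qpid code.

```python
# Stand-in exception hierarchy (hypothetical; mirrors the relationship
# described in this report rather than importing qpid.messaging).
class MessagingError(Exception):
    pass


# Note: inside this module the name shadows Python 3's builtin ConnectionError.
class ConnectionError(MessagingError):
    pass


class InternalError(MessagingError):
    pass


class FakeConnection(object):
    """Hypothetical connection whose session stays broken until it is reset."""

    def __init__(self):
        self.broken = True
        self.reconnects = 0

    def reconnect(self):
        # Resetting the connection is what clears the bad client state.
        self.reconnects += 1
        self.broken = False

    def next_receiver(self):
        if self.broken:
            raise InternalError("client is in a bad state")
        return "receiver"


def ensure(conn, method, catch=MessagingError, max_attempts=3):
    """Call method(); on a caught error, reset the connection and retry."""
    for _ in range(max_attempts):
        try:
            return method()
        except catch:
            conn.reconnect()
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

With catch=MessagingError, ensure(conn, conn.next_receiver) resets the connection once and then succeeds. With catch=ConnectionError, the InternalError is not caught, no reset happens, and the error escapes to whatever retries the consumer, which then hits the same broken session again.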

Changed in oslo.messaging:
assignee: nobody → Russell Bryant (russellb)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/85750

Changed in oslo.messaging:
status: New → In Progress
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/85750
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=bb6f3f11f3d65e104faa8dc4ee1b880d0c6394a4
Submitter: Jenkins
Branch: master

commit bb6f3f11f3d65e104faa8dc4ee1b880d0c6394a4
Author: Russell Bryant <email address hidden>
Date: Mon Apr 7 11:59:24 2014 -0400

    Update ensure()/reconnect() to catch MessagingError

    The error handling code that gets connections reset if necessary
    caught ConnectionError. It really needs to catch MessagingError,
    which ConnectionError inherits from. There are other types of
    MessagingErrors that may occur, such as InternalError, and they need
    to cause the connection to reset, as well.

    Closes-bug: #1303890
    Change-Id: Ic5082b74a362ded8b35cbc75cf178fe6e0db62d0

Changed in oslo.messaging:
status: In Progress → Fix Committed
Changed in oslo:
assignee: nobody → Russell Bryant (russellb)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (master)

Fix proposed to branch: master
Review: https://review.openstack.org/86368

Changed in oslo:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/86370

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/86371

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo-incubator (stable/icehouse)

Reviewed: https://review.openstack.org/86370
Committed: https://git.openstack.org/cgit/openstack/oslo-incubator/commit/?id=4c738572b2773a822639de9a757e982204360774
Submitter: Jenkins
Branch: stable/icehouse

commit 4c738572b2773a822639de9a757e982204360774
Author: Russell Bryant <email address hidden>
Date: Wed Apr 9 11:32:44 2014 -0400

    Update ensure()/reconnect() to catch MessagingError

    The error handling code that gets connections reset if necessary
    caught ConnectionError. It really needs to catch MessagingError,
    which ConnectionError inherits from. There are other types of
    MessagingErrors that may occur, such as InternalError, and they need
    to cause the connection to reset, as well.

    This fix has already been merged into oslo.messaging.

    Closes-bug: #1303890
    Change-Id: Ic5082b74a362ded8b35cbc75cf178fe6e0db62d0
    (cherry picked from commit 234f64d608266f43d8856ff98c89ceba6699d752)

tags: added: in-stable-icehouse
Changed in oslo:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo-incubator (master)

Reviewed: https://review.openstack.org/86368
Committed: https://git.openstack.org/cgit/openstack/oslo-incubator/commit/?id=234f64d608266f43d8856ff98c89ceba6699d752
Submitter: Jenkins
Branch: master

commit 234f64d608266f43d8856ff98c89ceba6699d752
Author: Russell Bryant <email address hidden>
Date: Wed Apr 9 11:32:44 2014 -0400

    Update ensure()/reconnect() to catch MessagingError

    The error handling code that gets connections reset if necessary
    caught ConnectionError. It really needs to catch MessagingError,
    which ConnectionError inherits from. There are other types of
    MessagingErrors that may occur, such as InternalError, and they need
    to cause the connection to reset, as well.

    This fix has already been merged into oslo.messaging.

    Closes-bug: #1303890
    Change-Id: Ic5082b74a362ded8b35cbc75cf178fe6e0db62d0

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/86940

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo-incubator (stable/havana)

Reviewed: https://review.openstack.org/86371
Committed: https://git.openstack.org/cgit/openstack/oslo-incubator/commit/?id=fae88f6805f6a16a60c23da67ce06cc0c78a706d
Submitter: Jenkins
Branch: stable/havana

commit fae88f6805f6a16a60c23da67ce06cc0c78a706d
Author: Russell Bryant <email address hidden>
Date: Wed Apr 9 11:32:44 2014 -0400

    Update ensure()/reconnect() to catch MessagingError

    The error handling code that gets connections reset if necessary
    caught ConnectionError. It really needs to catch MessagingError,
    which ConnectionError inherits from. There are other types of
    MessagingErrors that may occur, such as InternalError, and they need
    to cause the connection to reset, as well.

    This fix has already been merged into oslo.messaging.

    Closes-bug: #1303890
    Change-Id: Ic5082b74a362ded8b35cbc75cf178fe6e0db62d0
    (cherry picked from commit 234f64d608266f43d8856ff98c89ceba6699d752)

tags: added: in-stable-havana
Alan Pevec (apevec)
tags: removed: in-stable-havana in-stable-icehouse
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to oslo.messaging (stable/icehouse)

Reviewed: https://review.openstack.org/86940
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=f61a889786ad0be04bb3dcf5cffb3c659c2a8cef
Submitter: Jenkins
Branch: stable/icehouse

commit f61a889786ad0be04bb3dcf5cffb3c659c2a8cef
Author: Russell Bryant <email address hidden>
Date: Mon Apr 7 11:59:24 2014 -0400

    Update ensure()/reconnect() to catch MessagingError

    The error handling code that gets connections reset if necessary
    caught ConnectionError. It really needs to catch MessagingError,
    which ConnectionError inherits from. There are other types of
    MessagingErrors that may occur, such as InternalError, and they need
    to cause the connection to reset, as well.

    Closes-bug: #1303890
    Change-Id: Ic5082b74a362ded8b35cbc75cf178fe6e0db62d0
    (cherry picked from commit bb6f3f11f3d65e104faa8dc4ee1b880d0c6394a4)

Thierry Carrez (ttx)
Changed in neutron:
status: New → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/90096
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c5040b4e0bcfcb506be2b4100ec8e0a0e5436014
Submitter: Jenkins
Branch: stable/icehouse

commit c5040b4e0bcfcb506be2b4100ec8e0a0e5436014
Author: Russell Bryant <email address hidden>
Date: Wed Apr 9 11:32:44 2014 -0400

    Update ensure()/reconnect() to catch MessagingError

    The error handling code that gets connections reset if necessary
    caught ConnectionError. It really needs to catch MessagingError,
    which ConnectionError inherits from. There are other types of
    MessagingErrors that may occur, such as InternalError, and they need
    to cause the connection to reset, as well.

    This fix has already been merged into oslo.messaging.

    --

    Cherry-picked from oslo-incubator 234f64d608266f43d8856ff98c89ceba6699d752
    See also https://bugzilla.redhat.com/show_bug.cgi?id=1086077

    Closes-bug: #1303890
    Change-Id: Ic5082b74a362ded8b35cbc75cf178fe6e0db62d0
    (cherry picked from commit 9a830b370551019a4bd3a0c7504f48961e755bd4)

Kyle Mestery (mestery)
Changed in neutron:
milestone: none → juno-1
importance: Undecided → Low
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Changed in oslo:
milestone: none → juno-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in oslo.messaging:
milestone: none → juno-1
status: Fix Committed → Fix Released
Mark McLoughlin (markmc)
Changed in oslo.messaging:
importance: Undecided → Medium
Alan Pevec (apevec)
Changed in oslo-incubator:
importance: Undecided → Low
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-1 → 2014.2