rpc.server do not consume messages after message acknowledge failure

Bug #1448650 reported by QingchuanHao
This bug affects 2 people
Affects                    Status        Importance  Assigned to     Milestone
oslo.messaging             Fix Released  Medium      Mehdi Abaakouk
oslo.messaging (Ubuntu)    Fix Released  High        Unassigned
  Trusty                   Fix Released  High        Unassigned
  Utopic                   Won't Fix     High        Unassigned
  Vivid                    Fix Released  High        Unassigned

Bug Description

def start(self):

    @excutils.forever_retry_uncaught_exceptions
    def _executor_thread():
        try:
            while self._running:
                incoming = self.listener.poll()
                if incoming is not None:
                    self._dispatch(incoming)
        except greenlet.GreenletExit:
            return

The Connection class does a lot of work to ensure that operations on a connection can be recovered after a reconnection. However, after the incoming message has been received, a connection error can be raised while acknowledging it, and that error is caught by excutils.forever_retry_uncaught_exceptions. At that point do_consume is False, so the connection calls drain_events without first re-registering the consumers on the queues. kombu.Connection.drain_events then establishes a new connection instead of raising a connection error.
The relevant kombu code is listed below.
def drain_events(self, **kwargs):
    return self.transport.drain_events(self.connection, **kwargs)

@property
def connection(self):
    if not self._closed:
        if not self.connected:
            self.declared_entities.clear()
            self._default_channel = None
            self._connection = self._establish_connection()
            self._closed = False
        return self._connection
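
To make the failure mode concrete, here is a minimal simulation of the sequence described above. It does not use kombu or oslo.messaging: FakeTransportConnection, FakeConnection and simulate are hypothetical stand-ins that only mimic the lazy re-establishing behaviour of the connection property quoted above, under the assumption that consumers are tied to the underlying connection and are lost when it is replaced.

```python
class FakeTransportConnection:
    """Stand-in for the broker-side connection; consumers live on it."""

    def __init__(self):
        self.consumers = set()
        self.alive = True


class FakeConnection:
    """Hypothetical model of the lazy `connection` property quoted above."""

    def __init__(self):
        self._connection = None
        self.established = 0  # how many times a connection was created

    @property
    def connected(self):
        return self._connection is not None and self._connection.alive

    @property
    def connection(self):
        # Like the quoted property: silently re-establish when not connected.
        if not self.connected:
            self._connection = FakeTransportConnection()
            self.established += 1
        return self._connection

    def register_consumer(self, queue):
        self.connection.consumers.add(queue)

    def ack(self):
        # Acknowledging on a dead connection raises, as in the bug report.
        if not self.connected:
            raise ConnectionError("ack failed: connection lost")

    def drain_events(self):
        # Accessing self.connection reconnects without raising, but the
        # fresh connection has no consumers, so nothing can be delivered.
        return sorted(self.connection.consumers)


def simulate():
    conn = FakeConnection()
    conn.register_consumer("compute")  # consumers declared at startup
    before = conn.drain_events()

    conn._connection.alive = False     # the broker node goes away
    try:
        conn.ack()
        ack_failed = False
    except ConnectionError:
        ack_failed = True              # swallowed by the retry decorator

    after = conn.drain_events()        # no error, but no consumers either
    return before, ack_failed, after, conn.established


if __name__ == "__main__":
    print(simulate())
```

In this model the second drain_events call succeeds on a brand-new connection that has no registered consumers, which matches the symptom in this report: the service looks reconnected but never receives another message.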

---------------------------

[Impact]

This patch addresses an issue where the underlying kombu library disconnects from the rabbitmq-servers, which prevents oslo.messaging from properly going through the reconnect sequence, including the recreation of expected queues. This causes messages to be lost and leaves the cloud generally dysfunctional until services are restarted.

[Test Case]

Note: these steps are for trusty-icehouse, including the latest oslo.messaging library (1.3.0-0ubuntu1.1 at the time of this writing).

Deploy an OpenStack cloud with multiple rabbit nodes and then abruptly kill one of the rabbit nodes (e.g. force a panic). Observe that the nova services do detect that the node went down and report that they have reconnected, but messages are still reported as timed out, nova service-list still reports compute nodes as down, etc.

[Regression Potential]

There is the possibility that there will be more reconnect attempts from the oslo.messaging library if there is a false positive in the underlying kombu connection reported as disconnected. This should be unlikely since this is bringing the oslo.messaging code into sync with the underlying library, but it is a possibility.

[Other Info]

The attempt to drive reconnection logic was fixed in a recent SRU of oslo.messaging (version 1.3.0-0ubuntu1.1). This is an additional fix that is required in order to allow the oslo.messaging library to not go into a zombie-fied connection state.

Changed in oslo.messaging:
assignee: nobody → QingchuanHao (haoqingchuan-28)
Changed in oslo.messaging:
importance: Undecided → Medium
status: New → Confirmed
Changed in oslo.messaging:
assignee: QingchuanHao (haoqingchuan-28) → Mehdi Abaakouk (sileht)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/180059
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=415db68b67368d7c8aa550e7108122200816e665
Submitter: Jenkins
Branch: master

commit 415db68b67368d7c8aa550e7108122200816e665
Author: Mehdi Abaakouk <email address hidden>
Date: Tue May 5 10:29:22 2015 +0200

    rabbit: redeclare consumers when ack/requeue fail

    In case the acknowledgement or requeue of a message fail,
    the kombu transport can be disconnected

    In this case, we must redeclare our consumers.

    This changes fixes that.

    This have no tests because the kombu memory transport we use in our tests
    cannot be in disconnected state.

    Closes-bug: #1448650

    Change-Id: I5991a4cf827411bc27c857561d97461212a17f40
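
The idea in the commit message above can be sketched as follows. This is an illustration only, not the merged patch: SketchConnection, drop, consume_once and the connected flag are hypothetical names, and the sketch assumes that a new connection always starts with no declared consumers. The do_consume flag is used in the sense given in the bug description (True means consumers still have to be registered).

```python
class SketchConnection:
    """Hypothetical connection wrapper; not oslo.messaging's real class."""

    def __init__(self, queues):
        self.queues = list(queues)
        self.connected = False
        self.do_consume = True   # consumers must be (re)declared
        self.declared = []       # queues declared on the live connection
        self.declare_count = 0

    def _reconnect(self):
        self.connected = True
        self.declared = []       # a new connection has no consumers yet
        self.do_consume = True

    def drop(self):
        """Simulate the transport going away, e.g. during an ack."""
        self.connected = False

    def consume_once(self):
        # If the transport dropped since the last call (for instance while
        # acknowledging or requeueing a message), go through the reconnect
        # path so that consumers are declared again.
        if not self.connected:
            self._reconnect()
        if self.do_consume:
            self.declared = list(self.queues)
            self.declare_count += 1
            self.do_consume = False
        return list(self.declared)


if __name__ == "__main__":
    conn = SketchConnection(["compute"])
    print(conn.consume_once(), conn.declare_count)
    conn.drop()
    print(conn.consume_once(), conn.declare_count)
```

The point of the check at the top of consume_once is that a disconnect detected outside the consume path still resets do_consume, so the next iteration declares the consumers again rather than draining a connection that has none.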

Changed in oslo.messaging:
status: In Progress → Fix Committed
Changed in oslo.messaging:
milestone: none → 1.11.0
status: Fix Committed → Fix Released
description: updated
no longer affects: python-oslo.messaging (Ubuntu)
Billy Olsen (billy-olsen) wrote :
Billy Olsen (billy-olsen) wrote :
James Page (james-page)
Changed in oslo.messaging (Ubuntu Wily):
status: New → Fix Released
Changed in oslo.messaging (Ubuntu Vivid):
importance: Undecided → High
Changed in oslo.messaging (Ubuntu Trusty):
importance: Undecided → High
Changed in oslo.messaging (Ubuntu Wily):
importance: Undecided → High
James Page (james-page) wrote :

trusty and vivid patches reviewed and uploaded for SRU team review.

We're also going to need a fix for utopic, otherwise upgrades to Utopic or Juno from the CA will regress.

Changed in oslo.messaging (Ubuntu Utopic):
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/195688

Billy Olsen (billy-olsen) wrote :

Here's the debdiff for utopic/juno. This one also includes the pre-requisite patches in LP #1338732.

Chris J Arges (arges) wrote : Please test proposed package

Hello QingchuanHao, or anyone else affected,

Accepted oslo.messaging into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/oslo.messaging/1.3.0-0ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in oslo.messaging (Ubuntu Trusty):
status: New → Fix Committed
tags: added: verification-needed
Billy Olsen (billy-olsen) wrote :

Verification is done for trusty, but I haven't seen the packages for utopic or vivid yet.

tags: added: verification-done-trusty
Mathew Hodson (mhodson)
tags: removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package oslo.messaging - 1.3.0-0ubuntu1.2

---------------
oslo.messaging (1.3.0-0ubuntu1.2) trusty; urgency=medium

  * Detect when underlying kombu connection to rabbitmq server has been
    disconnected and allow oslo.messaging to go through the reconnect
    logic (LP: #1448650):
    - d/p/redeclare-consumers-when-ack-requeue-fails.patch: redeclare
      consumers when ack/requeue fails.

 -- Billy Olsen <email address hidden> Thu, 25 Jun 2015 09:59:42 +0100

Changed in oslo.messaging (Ubuntu Trusty):
status: Fix Committed → Fix Released
Scott Kitterman (kitterman) wrote : Update Released

The verification of the Stable Release Update for oslo.messaging has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/juno)

Reviewed: https://review.openstack.org/195688
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=b6b6edca4672cd5a6570c79f0480af286af09386
Submitter: Jenkins
Branch: stable/juno

commit b6b6edca4672cd5a6570c79f0480af286af09386
Author: Mehdi Abaakouk <email address hidden>
Date: Tue May 5 10:29:22 2015 +0200

    rabbit: redeclare consumers when ack/requeue fail

    In case the acknowledgement or requeue of a message fail,
    the kombu transport can be disconnected

    In this case, we must redeclare our consumers.

    This changes fixes that.

    This have no tests because the kombu memory transport we use in our tests
    cannot be in disconnected state.

    Closes-bug: #1448650

    (cherry picked from commit 415db68b67368d7c8aa550e7108122200816e665)

    Conflicts are due to the refactoring to oslo_messaging namespace.

    Conflicts:
     oslo_messaging/_drivers/impl_rabbit.py
     oslo_messaging/tests/drivers/test_impl_rabbit.py

    Change-Id: I5991a4cf827411bc27c857561d97461212a17f40

tags: added: in-stable-juno
Chris J Arges (arges) wrote : Please test proposed package

Hello QingchuanHao, or anyone else affected,

Accepted oslo.messaging into utopic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/oslo.messaging/1.4.1-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in oslo.messaging (Ubuntu Utopic):
status: New → Fix Committed
tags: added: verification-needed
Chris J Arges (arges) wrote :

Hello QingchuanHao, or anyone else affected,

Accepted oslo.messaging into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/oslo.messaging/1.8.3-0ubuntu0.15.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in oslo.messaging (Ubuntu Vivid):
status: New → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/213780

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/kilo)

Reviewed: https://review.openstack.org/213780
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=d1e3c38dac2b5586ddbf2defefb421841db5bcf8
Submitter: Jenkins
Branch: stable/kilo

commit d1e3c38dac2b5586ddbf2defefb421841db5bcf8
Author: Mehdi Abaakouk <email address hidden>
Date: Tue May 5 10:29:22 2015 +0200

    rabbit: redeclare consumers when ack/requeue fail

    In case the acknowledgement or requeue of a message fail,
    the kombu transport can be disconnected

    In this case, we must redeclare our consumers.

    This changes fixes that.

    This have no tests because the kombu memory transport we use in our tests
    cannot be in disconnected state.

    Closes-bug: #1448650

    (cherry picked from commit 415db68b67368d7c8aa550e7108122200816e665)

    Conflict is due to refactoring on master branch.

    Conflicts:
        oslo_messaging/_drivers/impl_rabbit.py

    Change-Id: I5991a4cf827411bc27c857561d97461212a17f40

tags: added: in-stable-kilo
Chris J Arges (arges)
Changed in oslo.messaging (Ubuntu Utopic):
status: Fix Committed → Won't Fix
Billy Olsen (billy-olsen) wrote :

Completed the verification for the vivid-proposed package.

Deployed a vivid-kilo cloud. Booted several instances, in the middle restarted rabbitmq-server. Verified that the amqp messaging layer gets re-established and instances continue to be created.

tags: added: verification-done-vivid
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package oslo.messaging - 1.8.3-0ubuntu0.15.04.2

---------------
oslo.messaging (1.8.3-0ubuntu0.15.04.2) vivid; urgency=medium

  * Detect when underlying kombu connection to rabbitmq server has been
    disconnected and allow oslo.messaging to go through the reconnect
    logic (LP: #1448650):
    - d/p/redeclare-consumers-when-ack-requeue-fails.patch: redeclare
      consumers when ack/requeue fails.

oslo.messaging (1.8.3-0ubuntu0.15.04.1) vivid; urgency=medium

  * New upstream point release (LP: #1467959):
    - RabbitMQ driver:
      + Adding publisher acknowledgements/confirms for better handling
        of messages during broker shutdown/network failure.
      + Ensure consumer connections closed properly (LP: #1458917).
      + Set timeout on the underlying socket (LP: #1436788).
      + Disable and mark heartbeat as experimental (LP: #1436769).
      + Fix ipv6 support.
    - ZeroMQ driver:
      + Don't raise Timeout on no-matchmaker results (LP: #1186310).
      + Fix issue with Redis not deleting expired keys (LP: #1417464).
      + d/p/Fix-changing-keys-during-iteration-in-matchmaker-hea.patch,
        d/p/Add-pluggability-for-matchmakers.patch: Dropped, included
        upstream.

 -- Billy Olsen <email address hidden> Thu, 25 Jun 2015 09:54:13 +0100

Changed in oslo.messaging (Ubuntu Vivid):
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
no longer affects: oslo.messaging (Ubuntu Wily)