nova-compute does not hold a connection for the RPC server after reconnecting

Bug #1408830 reported by QingchuanHao
This bug affects 1 person
Affects          Status        Importance  Assigned to     Milestone
oslo.messaging   Fix Released  Undecided   Mehdi Abaakouk

Bug Description

Normally, the nova-compute service starts as an RPC server consuming from the queues compute, compute.host and compute_fanout_uuid. In my environment, however, I found no consumers on the compute queue, and netstat showed 4 connections. According to the RabbitMQ web management UI, one is for the reply queue, two are probably for publishing, and the last one has no channel at all.

I cannot reproduce the problem myself, but our test colleagues can, which is driving me crazy.

QingchuanHao (haoqingchuan-28) wrote :

In Kilo's oslo.messaging, reconnection is implemented with kombu's autoretry. A connection error that keeps occurring raises the exception, and the error_callback is never called, which leaves a connection with no consumer. Newer versions of kombu fixed the problem by catching continuous connection errors. The relevant excerpt of kombu's ensure() is:
def ensure(self, obj, fun, errback=None, max_retries=None,
           interval_start=1, interval_step=1, interval_max=1,
           on_revive=None):
    def _ensured(*args, **kwargs):
        got_connection = 0
        for retries in count(0):  # for infinity
            try:
                return fun(*args, **kwargs)
            except recoverable_connection_errors as exc:
                if got_connection:
                    raise
                ...
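The failure mode described above can be reproduced in a small, self-contained simulation. This is a sketch under a simplified model: `legacy_ensure`, `FakeServer`, `start_consuming` and `RecoverableConnectionError` are hypothetical names for illustration, not kombu or oslo.messaging APIs, and declaring the consumer stands in for the work that never happens once the error is propagated.

```python
from itertools import count


class RecoverableConnectionError(Exception):
    """Hypothetical stand-in for a recoverable broker connection error."""


def legacy_ensure(fun, reconnect, max_retries=None):
    """Retry `fun` on connection errors, mimicking the excerpt above.

    Once a reconnect has succeeded, a repeated error is re-raised
    instead of retried (the `if got_connection: raise` branch).
    """
    got_connection = 0
    for retries in count(0):  # retry indefinitely unless max_retries is set
        try:
            return fun()
        except RecoverableConnectionError:
            if got_connection:
                raise
            if max_retries is not None and retries >= max_retries:
                raise
            reconnect()
            got_connection += 1


class FakeServer:
    """Hypothetical RPC server tracking reconnects and declared consumers."""

    def __init__(self):
        self.reconnects = 0
        self.consumers = []

    def reconnect(self):
        # A new connection is opened, but no consumer is attached to it yet.
        self.reconnects += 1


def start_consuming(server, failures):
    """Declare a 'compute' consumer after `failures` consecutive errors."""
    remaining = [failures]

    def declare_consumer():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RecoverableConnectionError("connection reset")
        server.consumers.append("compute")
        return "consuming"

    try:
        return legacy_ensure(declare_consumer, server.reconnect)
    except RecoverableConnectionError:
        return "gave up"
```

With one failure the retry succeeds and the consumer is declared. With two consecutive failures the error is re-raised after the reconnect, so the server ends up reconnected but with no consumer, which resembles the consumer-less connection seen in the report.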

Mehdi Abaakouk (sileht) wrote :

Can you tell me which version of kombu fixed that?

Changed in oslo.messaging:
status: New → Triaged
QingchuanHao (haoqingchuan-28) wrote :

It was fixed in kombu 3.0.0 by this change:
- Transports may now distinguish between recoverable and irrecoverable
  connection and channel errors.

def _ensured(*args, **kwargs):
    got_connection = 0
    conn_errors = self.recoverable_connection_errors
    chan_errors = self.recoverable_channel_errors
    has_modern_errors = hasattr(
        self.transport, 'recoverable_connection_errors',
    )
    for retries in count(0):  # for infinity
        try:
            return fun(*args, **kwargs)
        except conn_errors as exc:
            if got_connection and not has_modern_errors:
                # transport can not distinguish between
                # recoverable/irrecoverable errors, so we propagate
                # the error if it persists after a new connection was
                # successfully established.
                raise
            ...
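The decision in this excerpt can be sketched as a standalone retry loop. This assumes a single hypothetical `RecoverableConnectionError` and a caller-supplied `reconnect` hook; the real kombu method also handles channel errors, back-off intervals and the `errback`/`on_revive` callbacks, all left out here. The `has_modern_errors` argument plays the role of the `hasattr(self.transport, 'recoverable_connection_errors')` check, and `flaky` is a hypothetical helper for driving the loop.

```python
from itertools import count


class RecoverableConnectionError(Exception):
    """Hypothetical stand-in for a recoverable broker connection error."""


def ensure(fun, reconnect, has_modern_errors, max_retries=None):
    """Retry `fun`, following the kombu 3.0 decision quoted above (sketch)."""
    got_connection = 0
    for retries in count(0):
        try:
            return fun()
        except RecoverableConnectionError:
            if got_connection and not has_modern_errors:
                # Transport cannot classify errors: give up once the error
                # persists after a successful reconnect (legacy behaviour).
                raise
            if max_retries is not None and retries >= max_retries:
                # Stop after max_retries retries.
                raise
            reconnect()
            got_connection += 1


def flaky(failures):
    """Return a callable that fails `failures` times, then returns 'ok'."""
    remaining = [failures]

    def call():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RecoverableConnectionError("connection reset")
        return "ok"

    return call
```

With `has_modern_errors=True`, an error that persists across reconnects keeps being retried until the call succeeds or `max_retries` is exhausted; with `False`, it is re-raised right after the first successful reconnect.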

Mehdi Abaakouk (sileht)
Changed in oslo.messaging:
assignee: nobody → Mehdi Abaakouk (sileht)
milestone: none → next-liberty
Changed in oslo.messaging:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/179356
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=0c954cffa2f3710acafa79f01b958a8955823640
Submitter: Jenkins
Branch: master

commit 0c954cffa2f3710acafa79f01b958a8955823640
Author: Mehdi Abaakouk <email address hidden>
Date: Fri May 1 13:27:15 2015 +0200

    Bump kombu and amqp requirements

    We at least need these versions of amqp and kombu to have
    a working heartbeat support.

    Related-bug: #1436788
    Closes-bug: #1436769
    Closes-bug: #1408830

    Change-Id: I61440c5ccf2b540fe9a1e868bdcae9f5d2cf8422
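To check whether an environment already has a kombu release containing the fix, a small script such as the following could be used. It is a sketch: `parse_version`, `meets_minimum` and `installed_version` are hypothetical helpers, the parsing only handles simple dotted numeric versions, and the 3.0.0 floor comes from the reporter's comment above. The exact kombu and amqp minimums set by this commit are not stated in this report. It uses `importlib.metadata` (Python 3.8+).

```python
from importlib import metadata


def parse_version(text):
    """Turn '3.0.24' into (3, 0, 24); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare dotted versions numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(dist):
    """Return the installed distribution's version, or None if missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("kombu")
    ok = found is not None and meets_minimum(found, "3.0.0")
    print("kombu", found, "ok" if ok else "too old or missing")
```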

Changed in oslo.messaging:
status: In Progress → Fix Committed
Changed in oslo.messaging:
status: Fix Committed → Fix Released