amqp error looping for reply queue not found

Bug #1477914 reported by JohnsonYi
This bug affects 1 person
Affects          Status      Importance   Assigned to   Milestone
oslo.messaging   Won't Fix   Undecided    Unassigned

Bug Description

Environment:
Fuel 6.0 2014.2 based environment with fresh oslo.messaging 1.4.1 from fuel 6.1
3 controller nodes, 2 compute nodes, 3 ceph nodes

RabbitMQ went into an error loop on one controller node, as shown below:
tail -f /<email address hidden>

=ERROR REPORT==== 23-Jul-2015::07:41:47 ===
connection <0.14200.0>, channel 1 - soft error:
{amqp_error,not_found,
            "no queue 'reply_2e48d0e4650e4de3a200022406c27dea' in vhost '/'",
            'basic.consume'}

=ERROR REPORT==== 23-Jul-2015::07:41:48 ===
connection <0.14200.0>, channel 1 - soft error:
{amqp_error,not_found,
            "no queue 'reply_2e48d0e4650e4de3a200022406c27dea' in vhost '/'",
            'basic.consume'}

=ERROR REPORT==== 23-Jul-2015::07:41:49 ===
connection <0.14200.0>, channel 1 - soft error:
{amqp_error,not_found,
            "no queue 'reply_2e48d0e4650e4de3a200022406c27dea' in vhost '/'",
            'basic.consume'}
^C
root@node-2:/etc# rabbitmqctl list_queues reply_2e48d0e4650e4de3a200022406c27dea
Listing queues ...
Error: {bad_argument,reply_2e48d0e4650e4de3a200022406c27dea}

When I restart this node, the error moves to another controller node. Once it moves to the RabbitMQ master node, creating and deleting instances through Horizon becomes unavailable.

I restored the service with the command: crm resource restart p_rabbitmq-server

JohnsonYi (yichengli)
affects: nova → rabbitmq
JohnsonYi (yichengli)
affects: rabbitmq → oslo.messaging
QingchuanHao (haoqingchuan-28) wrote :

This bug seems to be a race condition in which queue.declare is handled before RabbitMQ deletes the queue.
The client then keeps issuing basic.consume on a non-existent queue, which is bound to raise an error.
This bug has been fixed in 1.5.1 by using the ensure mechanism implemented in kombu.
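
To illustrate the race described above, here is a minimal sketch in which a consumer redeclares a missing queue before retrying, with a bounded number of attempts. It is not the 1.5.1 fix and not kombu's API: `FakeBroker`, `QueueNotFound`, `consume_with_redeclare` and the queue name `reply_example` are all hypothetical stand-ins that only mimic the declare/delete/consume ordering.

```python
class QueueNotFound(Exception):
    """Stands in for an AMQP not_found soft error (hypothetical name)."""


class FakeBroker:
    """In-memory stand-in for a broker; not a real AMQP client."""

    def __init__(self):
        self.queues = set()
        self.consume_errors = 0

    def queue_declare(self, name):
        self.queues.add(name)

    def queue_delete(self, name):
        self.queues.discard(name)

    def basic_consume(self, name):
        if name not in self.queues:
            self.consume_errors += 1
            raise QueueNotFound(name)
        return "consumer-tag-for-" + name


def consume_with_redeclare(broker, queue, max_attempts=3):
    """Consume from `queue`, redeclaring it if the broker reports it missing."""
    for _ in range(max_attempts):
        try:
            return broker.basic_consume(queue)
        except QueueNotFound:
            # Recreate the queue before retrying, instead of
            # consuming blindly again and looping on the same error.
            broker.queue_declare(queue)
    raise QueueNotFound(queue)


broker = FakeBroker()
broker.queue_declare("reply_example")
broker.queue_delete("reply_example")  # the delete wins the race
tag = consume_with_redeclare(broker, "reply_example")
```

The point of the sketch is only that a bounded retry which redeclares the queue hits the error once, whereas a bare retry of basic.consume would repeat it indefinitely, as in the log above.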

Maybe you should use the CLI like this instead: rabbitmqctl list_queues | grep replyxxx
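
For context on the bad_argument error in the description: the positional arguments of `rabbitmqctl list_queues` are queue info item names (such as `name` and `messages`), not queue names, so the filtering has to be done on the output. Below is a small sketch of the same check as the grep, done in Python on captured output. The helper names and the sample listing are hypothetical, and the parsing assumes the tab-separated layout with "Listing queues ..." and "...done." marker lines.

```python
def queue_names(listing_text):
    """Extract queue names from `rabbitmqctl list_queues` style output."""
    names = []
    for line in listing_text.splitlines():
        line = line.strip()
        # Skip blank lines and the banner/footer lines around the listing.
        if not line or line.startswith("Listing queues") or line.startswith("..."):
            continue
        names.append(line.split("\t")[0])
    return names


def queue_exists(listing_text, queue_name):
    """Return True if `queue_name` appears as an exact queue name."""
    return queue_name in queue_names(listing_text)


# Hypothetical sample output, for illustration only.
SAMPLE = "Listing queues ...\nreply_aaa\t0\nnotifications.info\t12\n...done.\n"

found = queue_exists(SAMPLE, "reply_aaa")
missing = queue_exists(SAMPLE, "reply_bbb")
```

Unlike a plain grep, this matches the whole queue name, so a queue whose name merely contains the search string is not reported as a hit.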

JohnsonYi (yichengli) wrote :

@QingchuanHao

Thanks for the quick response. Please check the output below:
root@node-3:/etc/keystone# rabbitmqctl list_queues |grep reply_63b7d1e11a584c4d9cbc53bbed37e14e
root@node-3:

The queue does not exist.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/252351

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (stable/liberty)

Related fix proposed to branch: stable/liberty
Review: https://review.openstack.org/252359

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (stable/kilo)

Related fix proposed to branch: stable/kilo
Review: https://review.openstack.org/252361

OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/252351
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=daddb82788918296f8b34d6cdeb40d01620fb183
Submitter: Jenkins
Branch: master

commit daddb82788918296f8b34d6cdeb40d01620fb183
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 2 11:38:27 2015 +0100

    Don't hold the connection when reply fail

    This change moves the reply retry code to upper layer
    to be able to release the connection while we wait between
    two retries.

    In the worse scenario, a client waits for more than 30 replies
    and died/restart, the server tries to send this 30 replies to this
    this client and can wait too 60s per replies. During this
    replies for other clients are just stuck.

    This change fixes that.

    Related-bug: #1477914
    Closes-bug: #1521958

    Change-Id: I0d3c16ea6d2c1da143de4924b3be41d1cea159bd
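
The idea in this commit message (retry the reply from an upper layer so the connection is released between two attempts) can be sketched roughly as follows. This is not the oslo.messaging code: `ConnectionPool`, `DestinationNotFound`, `send_reply` and `flaky_send` are hypothetical names, and the 60 second timeout is only borrowed from the wording of the commit message, with an arbitrary retry interval.

```python
import contextlib
import threading
import time


class DestinationNotFound(Exception):
    """Hypothetical error: the client's reply queue or exchange is gone."""


class ConnectionPool:
    """Toy pool with a fixed number of slots; stands in for a real pool."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0

    @contextlib.contextmanager
    def connection(self):
        self._slots.acquire()
        with self._lock:
            self.in_use += 1
        try:
            yield self
        finally:
            with self._lock:
                self.in_use -= 1
            self._slots.release()


def send_reply(pool, try_send, timeout=60.0, interval=0.25,
               sleep=time.sleep, clock=time.monotonic):
    """Retry `try_send`, holding a connection only during each attempt."""
    deadline = clock() + timeout
    while True:
        with pool.connection():
            try:
                try_send()
                return True
            except DestinationNotFound:
                pass
        # The connection is back in the pool here, so replies to
        # other clients are not stuck while this one waits.
        if clock() >= deadline:
            return False  # give up and drop the reply
        sleep(interval)


attempts = {"n": 0}


def flaky_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DestinationNotFound()


pool = ConnectionPool(size=1)
in_use_while_waiting = []
ok = send_reply(pool, flaky_send, interval=0.01,
                sleep=lambda s: in_use_while_waiting.append(pool.in_use))
```

In the demo the pool has a single slot and the injected `sleep` records how many connections are held during each wait; with the retry loop outside the `with` block, that number stays at zero.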

Mehdi Abaakouk (sileht) wrote :

1.4.1 receives security maintenance only, and this issue no longer exists in newer releases.

Changed in oslo.messaging:
status: New → Won't Fix
Mehdi Abaakouk (sileht) wrote :

It has even been end of life since yesterday.

JohnsonYi (yichengli) wrote :

@Mehdi
Can I merge the fix for kilo into oslo.messaging 1.4.1?

OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (stable/kilo)

Reviewed: https://review.openstack.org/252361
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=53256e990d3632e0120e9a10ede1de9b3b2c9a0a
Submitter: Jenkins
Branch: stable/kilo

commit 53256e990d3632e0120e9a10ede1de9b3b2c9a0a
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 2 11:38:27 2015 +0100

    Don't hold the connection when reply fail

    This change moves the reply retry code to upper layer
    to be able to release the connection while we wait between
    two retries.

    In the worse scenario, a client waits for more than 30 replies
    and died/restart, the server tries to send this 30 replies to this
    this client and can wait too 60s per replies. During this
    replies for other clients are just stuck.

    This change fixes that.

    Related-bug: #1477914
    Closes-bug: #1521958

    (cherry picked from commit I0d3c16ea6d2c1da143de4924b3be41d1cea159bd)

    Conflicts:
     oslo_messaging/_drivers/amqpdriver.py
     oslo_messaging/_drivers/impl_rabbit.py

    Change-Id: I492b82082a372763e60cf06ce0b8135ade7a6e71

tags: added: in-stable-kilo
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (stable/liberty)

Reviewed: https://review.openstack.org/252359
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=8504e2508bbec963ae817cc31fec509d058c0e96
Submitter: Jenkins
Branch: stable/liberty

commit 8504e2508bbec963ae817cc31fec509d058c0e96
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 2 11:38:27 2015 +0100

    Don't hold the connection when reply fail

    This change moves the reply retry code to upper layer
    to be able to release the connection while we wait between
    two retries.

    In the worse scenario, a client waits for more than 30 replies
    and died/restart, the server tries to send this 30 replies to this
    this client and can wait too 60s per replies. During this
    replies for other clients are just stuck.

    This change fixes that.

    Related-bug: #1477914
    Closes-bug: #1521958

    (cherry picked from commit I0d3c16ea6d2c1da143de4924b3be41d1cea159bd)

    Conflicts:
     oslo_messaging/_drivers/amqpdriver.py
     oslo_messaging/_drivers/impl_rabbit.py

    Depends-On: Ibce834c3e76d71a770013cf1b469aa86396751b9
    Change-Id: I18144ede387e1d28f7b5de0131b6b6cc7d57bb86

tags: added: in-stable-liberty
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (feature/pika)

Related fix proposed to branch: feature/pika
Review: https://review.openstack.org/257373

OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (feature/pika)

Reviewed: https://review.openstack.org/257373
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=cc0f8cc8a9ff25c9fb081cac5366c12a0c06ec53
Submitter: Jenkins
Branch: feature/pika

commit a5d78891745b6b9e5827271dc305f00acae1392f
Author: OpenStack Proposal Bot <email address hidden>
Date: Fri Dec 11 15:24:05 2015 +0000

    Updated from global requirements

    Change-Id: Ifd78016c067740477a82dbe06d74d5944ba91893

commit 17ccb2306d03a74304c57d31716a54ba2b3b4311
Author: Mehdi Abaakouk <email address hidden>
Date: Fri Dec 11 10:59:54 2015 +0100

    Move to debug a too verbose log

    When a client is gone (died/restart) and somes replies cannot be sent because
    the the exchange of this client will never comeback. We log one message per
    reply every 0.25 messages during 60 seconds. When the only useful log
    is the one where we decide to drop this replies.

    This change moves the less important message to debug level.

    Change-Id: I508787c0db4dcec2c0027b89eb4e65c4f98022b9
    Related-bug: #1524418

commit 46daf858144202a072c4bf8580aeafec11d20e13
Author: Davanum Srinivas <email address hidden>
Date: Fri Dec 11 11:04:13 2015 +0300

    Cleanup parameter docstrings

    Change-Id: I301fdd51446bf0c0a6dd0d05b26da0556db8367d

commit 3ee86964fa460882d8fcac8686edd0e6bfb12008
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 9 19:37:40 2015 +0100

    Revert "default of kombu_missing_consumer_retry_timeout"

    This reverts commit 8c03a6db6c0396099e7425834998da5478a1df7c.

    Closes-bug: #1524418
    Change-Id: I35538a6c15d6402272e4513bc1beaa537b0dd7b9

commit e72599435c59c09277a9da7686b32aa4f9df7ba4
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 9 18:49:19 2015 +0100

    Don't trigger error_callback for known exc

    When AMQPDestinationNotFound is raised, we must not
    call the error_callback method. The exception is logged
    only if needed in upper layer (amqpdriver.py).

    Related-bug: #1524418

    Change-Id: Ic1ddec2d13172532dbaa572d04a4c22c97ac4fe7

commit 185693a6ed57e02b2f94b0fb8f14a91471605969
Author: Mehdi Abaakouk <email address hidden>
Date: Wed Dec 9 11:23:52 2015 +0100

    Improves comment

    Change-Id: Idc8002e6d622435aac48304857985c0f82be3e32

commit 148e8380ce1cc4f60716300b95104aaa2cf8c543
Author: Mehdi Abaakouk <email address hidden>
Date: Fri Dec 4 14:57:03 2015 +0100

    Fix reconnection when heartbeat is missed

    When a heartbeat is missing we call ensure_connection()
    that runs a dummy method to trigger the reconnection
    code in kombu. But also the code is triggered only if the
    channel is None.

    In case of the heartbeat threads we didn't reset the channel
    before reconnecting, so the dummy method doesn't do anything.

    This change sets the channel to None to ensure the connection
    is reestablished before the dummy method is run.

    Also it replaces the dummy method by checking the kombu connection
    object. So we are sure the connection is reestablished.

    Change-Id: I39f8cd23c5a5498e6f4c1aa3236ed27f3b5d7c9a
    Closes-bug: #1493890

commit 05002...

tags: added: in-feature-pika