Cinder loses RabbitMQ exchanges and gets stuck in reply loop

Bug #1457055 reported by Denis Meltsaykin on 2015-05-20
This bug affects 5 people
Affects                    Importance  Assigned to
Mirantis OpenStack         High        Alexey Khivin
  6.0.x series             High        Alexey Khivin
  6.1.x series             High        Alexey Khivin
  7.0.x series             High        Alexey Khivin
  8.0.x series             High        Alexey Khivin

Bug Description

During failover tests, cinder got stuck in an endless loop trying to reply to a non-existent exchange.
It tried to publish the message about 100 times over 20 minutes, and the problem was resolved only after we created the missing exchange/queue in RabbitMQ manually.
For the entire duration of the erroneous reply publishing, cinder-volume was unavailable to process requests.

Small example of logs: http://paste.mirantis.net/show/445/
Snapshot: https://drive.google.com/a/mirantis.com/file/d/0B5oyxJmUjFQcWVo0OGVOdkhvTDQ/view?usp=sharing
Version:
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: cb655a9a9ad26848bcd9d9ace91857b6f4a0ec15
auth_required: true
build_id: 2015-05-18_03-43-53
build_number: '432'
feature_groups:
- mirantis
fuel-library_sha: 1621cb350af744f497c35f2b3bb889c2041465d8
fuel-ostf_sha: 9ce1800749081780b8b2a4a7eab6586583ffaf33
fuelmain_sha: 0e970647a83d9a7d336c4cc253606d4dd0d59a60
nailgun_sha: 076566b5df37f681c3fd5b139c966d680d81e0a5
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: 38765563e1a7f14f45201fd47cf507393ff5d673
release: '6.1'
release_versions:
  2014.2.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: cb655a9a9ad26848bcd9d9ace91857b6f4a0ec15
      build_id: 2015-05-18_03-43-53
      build_number: '432'
      feature_groups:
      - mirantis
      fuel-library_sha: 1621cb350af744f497c35f2b3bb889c2041465d8
      fuel-ostf_sha: 9ce1800749081780b8b2a4a7eab6586583ffaf33
      fuelmain_sha: 0e970647a83d9a7d336c4cc253606d4dd0d59a60
      nailgun_sha: 076566b5df37f681c3fd5b139c966d680d81e0a5
      openstack_version: 2014.2.2-6.1
      production: docker
      python-fuelclient_sha: 38765563e1a7f14f45201fd47cf507393ff5d673
      release: '6.1'

Changed in mos:
milestone: none → 6.1
importance: Undecided → High
status: New → Confirmed
tags: removed: cinder
Changed in mos:
status: Confirmed → In Progress

Fix proposed to branch: openstack-ci/fuel-6.1/2014.2
Change author: Alex Khivin <email address hidden>
Review: https://review.fuel-infra.org/6951

Fix proposed to branch: openstack-ci/fuel-6.0-updates/2014.2
Change author: Alex Khivin <email address hidden>
Review: https://review.fuel-infra.org/6952

tags: added: customer-found
Alexey Khivin (akhivin) wrote :

The reply queue disappeared when RabbitMQ was rebooted. The RPC server tries to send a reply to the client using "passive" mode and gets a "Queue not found" exception, so it tries to send the reply again and again. With this patch, the server will simply recreate the reply queue.
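The retry-versus-recreate behaviour described above can be sketched with a minimal stdlib simulation. Everything here is illustrative (the fake broker, the queue name, and the `recreate_on_missing` flag are not the real API); the actual fix lives in oslo.messaging's RabbitMQ driver:

```python
class QueueNotFound(Exception):
    pass

class FakeBroker:
    """Stands in for RabbitMQ; after a 'reboot' all queues are gone."""
    def __init__(self):
        self.queues = {}

    def declare_queue(self, name, passive=False):
        if passive:
            # Passive declare only checks existence, mirroring AMQP semantics.
            if name not in self.queues:
                raise QueueNotFound(name)
        else:
            self.queues.setdefault(name, [])

    def publish(self, queue, msg):
        if queue not in self.queues:
            raise QueueNotFound(queue)
        self.queues[queue].append(msg)

def send_reply(broker, reply_q, msg, recreate_on_missing, max_retries=3):
    """Return the number of attempts used, or raise after max_retries."""
    for attempt in range(1, max_retries + 1):
        try:
            broker.declare_queue(reply_q, passive=True)
            broker.publish(reply_q, msg)
            return attempt
        except QueueNotFound:
            if recreate_on_missing:
                # The fix: actively re-declare the missing queue instead of
                # retrying the passive check forever.
                broker.declare_queue(reply_q, passive=False)
    raise QueueNotFound(reply_q)

broker = FakeBroker()  # fresh broker after a reboot: no reply queue exists
try:
    # Pre-fix behaviour: every retry fails, the server spins in the reply loop.
    send_reply(broker, "reply_abc", "done", recreate_on_missing=False)
except QueueNotFound:
    print("stuck: queue never recreated")
# Patched behaviour: the queue is recreated, and the retry succeeds.
print(send_reply(broker, "reply_abc", "done", recreate_on_missing=True))  # 2
```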

Reviewed: https://review.fuel-infra.org/6952
Submitter: Vitaly Sedelnik <email address hidden>
Branch: openstack-ci/fuel-6.0-updates/2014.2

Commit: 9c46c92e60fdaa30e32050c83b5b07a96ee87c35
Author: Alex Khivin <email address hidden>
Date: Tue May 26 09:57:29 2015

Cinder loses rabbitmq exchanges and get stuck in reply loop

Reply-queue disappeared when RabbitMQ has been rebooted.
RPC server tries to send a reply to a client using "passive"
mode and gets an exception "Queue not found". Server tries
to send reply again and again. With this path server will
just recreate a reply queue

Change-Id: Ia06fa3dd54562e94c96ef6d7a5895209a168ab95
Closes-Bug: #1457055

Ivan Kolodyazhny (e0ne) on 2015-05-26
tags: added: cinder

Reviewed: https://review.fuel-infra.org/6951
Submitter: Oleksii Zamiatin <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: c238331b29992c3148c05c49199c4e2b7a37b59a
Author: Alex Khivin <email address hidden>
Date: Tue May 26 17:29:29 2015

Cinder loses rabbitmq exchanges and get stuck in reply loop

Reply-queue disappeared when RabbitMQ has been rebooted.
RPC server tries to send a reply to a client using "passive"
mode, gets an exception "Queue not found" and goes to reconnect.
Server tries to send reply again and again. With this path server
will just recreate a reply queue

Closes-Bug: #1457055

Change-Id: Ia06fa3dd54562e94c96ef6d7a5895209a168ab95

Alexey Khivin (akhivin) on 2015-05-27
Changed in mos:
status: In Progress → Fix Committed
Serg Lystopad (slystopad) wrote :

Please backport the fix to 6.0.1

Serg Lystopad (slystopad) wrote :

We need an Ubuntu package with the fix included.

Fix proposed to branch: openstack-ci/fuel-6.0.1/2014.2
Change author: Alex Khivin <email address hidden>
Review: https://review.fuel-infra.org/7306

Reviewed: https://review.fuel-infra.org/7306
Submitter: Oleksii Zamiatin <email address hidden>
Branch: openstack-ci/fuel-6.0.1/2014.2

Commit: dd3ae5452c5238f2e61bf82adbd72e5683686553
Author: Alex Khivin <email address hidden>
Date: Tue Jun 2 15:10:20 2015

Cinder loses rabbitmq exchanges and get stuck in reply loop

Reply-queue disappeared when RabbitMQ has been rebooted.
RPC server tries to send a reply to a client using "passive"
mode, gets an exception "Queue not found" and goes to reconnect.
Server tries to send reply again and again. With this path server
will just recreate a reply queue

Closes-Bug: #1457055

Change-Id: Ia06fa3dd54562e94c96ef6d7a5895209a168ab95
(cherry picked from commit c238331b29992c3148c05c49199c4e2b7a37b59a)

Hi,

We encountered the same problem with declared reply_<id> exchanges, where the id is auto-generated (it looks like a UUID or something similar).
We could not determine whether cinder was the process declaring those odd exchanges.

Once the missing exchange is recreated, RabbitMQ delivers again... until the problem recurs.
To get our platform running again, is it preferable to delete all of those exchanges?

What is the method for avoiding these endless loops for good?

Thanks for your feedback

Nicolas ROUYER

Alexey Khivin (akhivin) wrote :

@Nicolas
 you can try to apply this fix, and if it does not help, feel free to create a new Launchpad bug with the related logs.


Thanks Alex. Where do we find the fix?

Nicolas

On 5 June 2015 at 12:01, "Alex Khivin" <email address hidden> wrote:

> @Nicolas
> you can try to apply this fix and if it will not help then fill free to create new launchpad bug with related logs.


Alexey Khivin (akhivin) wrote :

@Nicolas
Note that this bug relates to the 6.x.x versions.

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Alex Khivin <email address hidden>
Review: https://review.fuel-infra.org/8284

tags: added: done release-notes

Change abandoned by Victor Sergeyev <email address hidden> on branch: openstack-ci/fuel-7.0/2015.1.0
Review: https://review.fuel-infra.org/8284
Reason: This code was refactored during the Kilo release cycle, so let's hold off on this patch for a while. We will restore and merge it if bug 1457055 affects 7.0.

Andrew Woodward (xarses) wrote :

Updated the series to reflect the comment about the abandoned 7.0 patch.

Roman Rufanov (rrufanov) on 2015-09-16
tags: added: support
Jay Xiong (jay-xiong) wrote :

Can someone send me the diff for the fix? I saw a similar issue when updating an ironic node.

Denis Meltsaykin (dmeltsaykin) wrote :

Jay, this was a problem in oslo.messaging in our Juno branch; the upstream branch is not affected. Moreover, in the newer OpenStack releases the code was heavily refactored, so the problem is not present in Kilo/Liberty/...

Jay Xiong (jay-xiong) wrote :

Denis,

Thanks for your reply.

I am using Liberty/Ironic. The issue is different, but the root cause may be similar to this one. Here is more information about the issue I observed with ironic/liberty. I am running the command: ironic node-validate $NODE_UUID

After I reboot the whole system, the reply exchange for ironic is present again. Then, after a little while, it disappears again. I am using rabbitmqctl list_exchanges to monitor the RabbitMQ exchanges.

Shortly after reboot:

user@ironic-server:~$ sudo rabbitmqctl list_exchanges
Listing exchanges ...
 direct
amq.direct direct
amq.fanout fanout
amq.headers headers
amq.match headers
amq.rabbitmq.log topic
amq.rabbitmq.trace topic
amq.topic topic
cert_fanout fanout
conductor_fanout fanout
consoleauth_fanout fanout
dhcp_agent_fanout fanout
ironic topic
ironic.conductor_manager_fanout fanout
neutron topic
nova topic
q-agent-notifier-dvr-update_fanout fanout
q-agent-notifier-network-update_fanout fanout
q-agent-notifier-port-delete_fanout fanout
q-agent-notifier-port-update_fanout fanout
q-agent-notifier-security_group-update_fanout fanout
q-agent-notifier-tunnel-delete_fanout fanout
q-agent-notifier-tunnel-update_fanout fanout
q-plugin_fanout fanout
reply_328defc9a8164ebd91e774be2cf4f50f direct
reply_8529a09e2f824e018beacabe28a89b90 direct <-- reply exchange for ironic
reply_c1b83eda10a44808b2eeb44e1e0a95d9 direct
reply_e0fdef537207404d921718d9341331c6 direct
scheduler_fanout fanout

After a little while: (Notice that the reply exchange is gone.)

user@ironic-server:~$ sudo rabbitmqctl list_exchanges
Listing exchanges ...
 direct
amq.direct direct
amq.fanout fanout
amq.headers headers
amq.match headers
amq.rabbitmq.log topic
amq.rabbitmq.trace topic
amq.topic topic
cert_fanout fanout
conductor_fanout fanout
consoleauth_fanout fanout
dhcp_agent_fanout fanout
ironic topic
ironic.conductor_manager_fanout fanout
neutron topic
nova topic
q-agent-notifier-dvr-update_fanout fanout
q-agent-notifier-network-update_fanout fanout
q-agent-notifier-port-delete_fanout fanout
q-agent-notifier-port-update_fanout fanout
q-agent-notifier-security_group-update_fanout fanout
q-agent-notifier-tunnel-delete_fanout fanout
q-agent-notifier-tunnel-update_fanout fanout
q-plugin_fanout fanout
reply_328defc9a8164ebd91e774be2cf4f50f direct
reply_c1b83eda10a44808b2eeb44e1e0a95d9 direct
reply_e0fdef537207404d921718d9341331c6 direct
scheduler_fanout fanout
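To narrow down which reply_* exchange vanished between two such listings, the `rabbitmqctl list_exchanges` output can be saved to files (e.g. `sudo rabbitmqctl list_exchanges > before.txt`, and later `> after.txt`) and diffed with a small helper. This is only a sketch; the function name and file names are hypothetical:

```python
def missing_reply_exchanges(before_path, after_path):
    """Return the reply_* exchange lines present in the 'before' snapshot
    but gone from the 'after' snapshot of `rabbitmqctl list_exchanges`."""
    def reply_lines(path):
        with open(path) as f:
            # Each line has the form "name type", e.g. "reply_328... direct".
            return {line.strip() for line in f if line.startswith("reply_")}
    return sorted(reply_lines(before_path) - reply_lines(after_path))
```

Run against the two listings above, this would report the vanished `reply_8529a09e2f824e018beacabe28a89b90 direct` entry.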
