oslo.messaging does not redeclare exchange if it is missing

Bug #1609741 reported by Dmitry Mescheryakov
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Kirill Bespalov

Bug Description

Version: 9.0

Steps to reproduce:
1. Create a load on OpenStack, then trigger a restart of one of the RabbitMQ nodes (exact reason unknown).
2. Observe many entries like the following in the RabbitMQ log, for various reply queues:
   operation basic.publish caused a channel exception not_found: "no exchange 'reply_d8786e66456a4660bebb362668a027e4' in vhost '/'"

Looking earlier in the RabbitMQ log, one can find:
2016-08-03T13:33:12.945437+00:00 notice: operation queue.declare caused a channel exception not_found: "failed to perform operation on queue 'reply_d8786e66456a4660bebb362668a027e4' in vhost '/' due to timeout"
2016-08-03T13:34:43.006600+00:00 notice: operation queue.bind caused a channel exception not_found: "no exchange 'reply_d8786e66456a4660bebb362668a027e4' in vhost '/'"
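
To gauge how many reply queues are affected, the not_found entries can be grouped by operation and exchange name. This is a minimal sketch that assumes the log format seen in the entries quoted in this report; the function name and the regular expression are illustrative and not part of any RabbitMQ or oslo.messaging tool.

```python
import re
from collections import Counter

# Pattern inferred only from the sample log lines in this report (assumption).
MISSING_EXCHANGE = re.compile(
    r"operation (?P<op>[\w.]+) caused a channel exception not_found: "
    r"\"no exchange '(?P<exchange>[^']+)' in vhost '(?P<vhost>[^']+)'\""
)


def count_missing_exchanges(lines):
    """Count 'no exchange' errors, keyed by (operation, exchange name)."""
    counts = Counter()
    for line in lines:
        match = MISSING_EXCHANGE.search(line)
        if match:
            counts[(match.group("op"), match.group("exchange"))] += 1
    return counts


# The three log entries quoted in this report.
sample = [
    "2016-08-03T13:33:12.945437+00:00 notice: operation queue.declare caused a channel exception not_found: \"failed to perform operation on queue 'reply_d8786e66456a4660bebb362668a027e4' in vhost '/' due to timeout\"",
    "2016-08-03T13:34:43.006600+00:00 notice: operation queue.bind caused a channel exception not_found: \"no exchange 'reply_d8786e66456a4660bebb362668a027e4' in vhost '/'\"",
    "operation basic.publish caused a channel exception not_found: \"no exchange 'reply_d8786e66456a4660bebb362668a027e4' in vhost '/'\"",
]

for (op, exchange), n in sorted(count_missing_exchanges(sample).items()):
    print(op, exchange, n)
```

The queue.declare timeout is deliberately not counted, since it reports a different failure than the missing exchange.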

The following stack trace from nova-compute.log corresponds to the latter message: http://paste.openstack.org/show/548803/

It seems that during RabbitMQ failover we can end up with a declared queue that is not bound to an exchange (this is where the exception listed in the paste above is thrown). Later, oslo.messaging successfully starts consuming from that queue, since the queue exists, but the queue is useless because it is not bound to an exchange.
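
The suspected failure mode can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: FakeBroker, NotFound and ensure_reply_topology are hypothetical names invented for illustration, not oslo.messaging, kombu or RabbitMQ APIs, and the sketch does not claim to reflect the fix that was actually committed. It shows why a consumer on an existing queue receives nothing when the exchange is missing, and how redeclaring the exchange, queue and binding together before consuming would restore delivery.

```python
class NotFound(Exception):
    """Stands in for an AMQP not_found channel exception (hypothetical)."""


class FakeBroker:
    """Toy model of exchanges, queues and bindings; not a real client."""

    def __init__(self):
        self.exchanges = set()
        self.queues = set()
        self.bindings = set()  # (queue, exchange) pairs

    def exchange_declare(self, name):
        self.exchanges.add(name)  # idempotent

    def queue_declare(self, name):
        self.queues.add(name)  # idempotent

    def queue_bind(self, queue, exchange):
        if exchange not in self.exchanges:
            raise NotFound(f"no exchange '{exchange}'")
        if queue not in self.queues:
            raise NotFound(f"no queue '{queue}'")
        self.bindings.add((queue, exchange))

    def basic_publish(self, exchange):
        """Return the queues a message published to `exchange` would reach."""
        if exchange not in self.exchanges:
            raise NotFound(f"no exchange '{exchange}'")
        return [q for (q, e) in self.bindings if e == exchange]


def ensure_reply_topology(broker, name):
    """Redeclare exchange, queue and binding as one unit before consuming."""
    broker.exchange_declare(name)
    broker.queue_declare(name)
    broker.queue_bind(name, name)


broker = FakeBroker()
# State described in the report: the queue exists, the exchange does not.
broker.queue_declare("reply_x")
try:
    broker.basic_publish("reply_x")
    publish_failed = False
except NotFound:
    publish_failed = True

# After redeclaring everything, replies reach the queue again.
ensure_reply_topology(broker, "reply_x")
delivered_to = broker.basic_publish("reply_x")
```

The queue and exchange share one name here because the log entries in this report show the same reply_... name for both.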

The full logs containing the snippets above are attached: nova-compute.log and rabbitmq.log.3.gz.

Tags: area-oslo
tags: added: area-oslo
Changed in mos:
importance: Undecided → High
assignee: nobody → MOS Oslo (mos-oslo)
milestone: none → 9.1
status: New → Confirmed
Dmitry Mescheryakov (dmitrymex) wrote :
Dmitry Mescheryakov (dmitrymex) wrote :
description: updated
Eugene Nikanorov (enikanorov) wrote :

I'm raising the importance.
This kind of bug will cause whole-cloud outages.

Changed in mos:
importance: High → Critical
Changed in mos:
assignee: MOS Oslo (mos-oslo) → Kirill Bespalov (k-besplv)
Changed in mos:
status: Confirmed → Fix Committed
Dmitry Mescheryakov (dmitrymex) wrote :
Alexey Galkin (agalkin) wrote :

We can't reproduce this bug because of step 1 of the reproduction steps ("Create a load on OpenStack, trigger restart of one of RabbitMQ nodes (exact reason unknown)"); consequently, we are setting the status to 'Fix Released'.

Changed in mos:
status: Fix Committed → Fix Released