Orphan exchanges in Qpid and lack of option for making queues [un]durable

Bug #1178375 reported by Salman Baset on 2013-05-09
This bug affects 8 people
Affects Status Importance Assigned to Milestone
Ceilometer
Undecided
Russell Bryant
Cinder
Undecided
Russell Bryant
OpenStack Compute (nova)
Undecided
Russell Bryant
OpenStack Heat
Fix Released
Undecided
Russell Bryant
OpenStack Identity (keystone)
Undecided
Unassigned
Havana
Medium
Alan Pevec
neutron
Medium
Russell Bryant
oslo-incubator
Medium
Russell Bryant
oslo.messaging
Medium
Ken Giusti

Bug Description

Start qpid, nova-api, nova-scheduler, nova-conductor, and nova-compute.

There are orphan direct exchanges in qpid (checked using qpid-config exchanges). The number of exchanges keeps growing, presumably whenever nova-compute does a periodic update over AMQP.
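One way to watch this growth is to count exchanges by type at intervals. The sketch below assumes that the listing printed by qpid-config exchanges has one exchange per row with the exchange type as the first column; the helper name count_exchanges_by_type and the sample listing are made up for illustration and are not taken from a real broker.

```python
from collections import Counter

# Rows whose first token is not one of these (header, separator) are ignored.
KNOWN_TYPES = {"direct", "topic", "fanout", "headers"}


def count_exchanges_by_type(listing):
    """Count exchanges per type in text shaped like an exchange listing.

    Assumption: the exchange type is the first whitespace-separated
    token of each row.
    """
    counts = Counter()
    for line in listing.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in KNOWN_TYPES:
            counts[tokens[0]] += 1
    return counts


# Hypothetical sample, for illustration only.
SAMPLE = """\
Type      Exchange Name                     Attributes
=======================================================
direct    amq.direct                        --durable
direct    0a1b2c3d4e5f                      --durable
direct    6f7e8d9c0b1a                      --durable
topic     nova                              --durable
fanout    amq.fanout                        --durable
"""

if __name__ == "__main__":
    print(count_exchanges_by_type(SAMPLE))
```

Feeding the helper the captured output of the command every few minutes and comparing the "direct" count would show whether the orphans are accumulating.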

Moreover, the direct and topic exchanges are durable by default, which is a problem. We want the ability to turn the durable option on or off, just as the Rabbit options allow.

Salman Baset (salman-h) wrote :
Salman Baset (salman-h) on 2013-05-18
affects: oslo → nova
affects: nova → oslo
Mathew Odden (locke105) wrote :

I've been seeing this a lot lately using QPID. I think they are RPC reply queues from conductor but I'm not sure. Fortunately on most systems, QPID doesn't support durable queues since it requires extra packages. Restarting QPID will clear the orphaned queues out.

Also, I think there is a Red Hat bug related to this, in case anyone is interested.

https://bugzilla.redhat.com/show_bug.cgi?id=960539

Ben Nemec (bnemec) wrote :

The durable option is being addressed in this review: https://review.openstack.org/#/c/29617/

William Henry (whenry) wrote :

Do you really ever require durable queues? Are durable queues ever used?

William Henry (whenry) wrote :

Please see:
https://review.openstack.org/#/c/32179/
Suggested fix.

Fix proposed to branch: master
Review: https://review.openstack.org/32187

Changed in oslo:
assignee: nobody → William Henry (whenry)
status: New → In Progress
Russell Bryant (russellb) wrote :

It appears that this issue is grizzly specific. It's (accidentally) fixed in havana already.

In havana, we use a single queue for all replies. We're not creating an exchange/queue for every reply. This behavior was actually introduced in grizzly, but is off by default. If you set amqp_rpc_single_reply_queue=True in your config file, you should not be seeing this problem.
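As a rough illustration of the setting mentioned above, the sketch below parses a config fragment and reports whether the single reply queue is enabled. Placing the option in the [DEFAULT] section and the helper name single_reply_queue_enabled are assumptions; the fallback of False mirrors the "off by default" behavior described for grizzly.

```python
import configparser

# Hypothetical config fragment; the [DEFAULT] placement is an assumption.
CONFIG_TEXT = """\
[DEFAULT]
amqp_rpc_single_reply_queue = True
"""


def single_reply_queue_enabled(config_text):
    """Return True if amqp_rpc_single_reply_queue is switched on."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Missing option is treated as off, matching the grizzly default.
    return parser.getboolean(
        "DEFAULT", "amqp_rpc_single_reply_queue", fallback=False
    )


if __name__ == "__main__":
    print(single_reply_queue_enabled(CONFIG_TEXT))
```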

Russell Bryant (russellb) wrote :

Actually, not grizzly specific. I meant grizzly and earlier releases that have qpid support.

Changed in oslo:
assignee: William Henry (whenry) → Russell Bryant (russellb)

Reviewed: https://review.openstack.org/32187
Committed: http://github.com/openstack/oslo-incubator/commit/76972e2949634abe9fcc9ad36c103cca94300237
Submitter: Jenkins
Branch: master

commit 76972e2949634abe9fcc9ad36c103cca94300237
Author: Russell Bryant <email address hidden>
Date: Wed Aug 28 12:09:54 2013 -0400

    Support a new qpid topology

    There has been a bug open for a while pointing out that the way we
    create direct exchanges with qpid results in leaking exchanges since
    qpid doesn't support auto-deleting exchanges. This was somewhat
    mitigated by change to use a single reply queue. This meant we created
    far fewer direct exchanges, but the problem persists anyway.

    A Qpid expert, William Henry, originally proposed a change to address
    this issue. Unfortunately, it wasn't backwards compatible with existing
    installations. This patch takes the same approach, but makes it
    optional and off by default. This will allow a migration period.

    As a really nice side effect, the Qpid experts have told us that this
    change will also allow us to use Qpid broker federation to provide HA.

    DocImpact
    Closes-bug: #1178375
    Co-authored-by: William Henry <email address hidden>
    Change-Id: I09b8317c0d8a298237beeb3105f2b90cb13933d8
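
The leak described in the commit message can be pictured with a toy model. This is only a sketch: ToyBroker, the "reply_" naming, and the idea that each client process declares one reply exchange are illustrative assumptions, not the actual driver code. The only property taken from the thread is that exchanges are never auto-deleted.

```python
import uuid


class ToyBroker:
    """Toy broker: exchanges stay around until explicitly deleted."""

    def __init__(self):
        self.exchanges = set()

    def declare_direct_exchange(self, name):
        self.exchanges.add(name)


def replies_per_call(broker, calls):
    """Older behavior: a fresh direct exchange for every RPC reply."""
    for _ in range(calls):
        broker.declare_direct_exchange(uuid.uuid4().hex)


def replies_single_queue(broker, calls, client_id):
    """Single reply queue: one direct exchange per client, reused."""
    for _ in range(calls):
        broker.declare_direct_exchange("reply_" + client_id)


if __name__ == "__main__":
    per_call, single = ToyBroker(), ToyBroker()
    replies_per_call(per_call, 100)
    replies_single_queue(single, 100, "client-a")
    # A restarted client gets a new id, so one more exchange is left behind.
    replies_single_queue(single, 100, "client-b")
    print(len(per_call.exchanges), len(single.exchanges))
```

Under these assumptions the single reply queue leaves far fewer exchanges behind, but each new client still adds one that is never cleaned up, which is consistent with "the problem persists anyway".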

Changed in oslo:
status: In Progress → Fix Committed

Fix proposed to branch: master
Review: https://review.openstack.org/44523

Changed in nova:
assignee: nobody → Russell Bryant (russellb)
status: New → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/44532

Changed in cinder:
assignee: nobody → Russell Bryant (russellb)
status: New → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/44541

Changed in neutron:
assignee: nobody → Russell Bryant (russellb)
status: New → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/44556

Changed in heat:
assignee: nobody → Russell Bryant (russellb)
status: New → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/44561

Changed in ceilometer:
assignee: nobody → Russell Bryant (russellb)
status: New → In Progress

Reviewed: https://review.openstack.org/44523
Committed: http://github.com/openstack/nova/commit/579109cab946a54f86b04724f0b6cb71fb027c04
Submitter: Jenkins
Branch: master

commit 579109cab946a54f86b04724f0b6cb71fb027c04
Author: Russell Bryant <email address hidden>
Date: Fri Aug 30 14:21:34 2013 -0400

    Sync rpc from oslo-incubator

    This includes:

    76972e2 Support a new qpid topology
    284b13a Raise timeout in fake RPC if no consumers found
    9721129 exception: remove
    7b0cb37 Don't eat callback exceptions
    69abf38 requeue instead of reject
    28395d9 Fixes files with wrong bitmode
    bec54ac Fix case error in qpid exchange type "direct"
    61c4cde Ensure context type is handled when using to_dict
    223f9e1 Clarify precedence of secret_key_file
    a035f95 Don't shadow cfg import in securemessage
    0f88575 Remove redundant global keyword in securemessage
    848c4d5 Some nitpicky securemessage cleanups
    5c71c25 Allow non-use of cfg.CONF in securemessage
    9157286 RPC: Add MessageSecurity implementation
    2031e60 Refactors boolean returns
    a047a35 Make ZeroMQ based RPC consumer threads more robust
    34a6842 On reconnecting a FanoutConsumer, don't grow the topic name
    f52446c Add serializer param to RPC service
    5ff534d Add config for amqp durable/auto_delete queues
    7bfd443 Avoid shadowing Exception 'message' attribute

    Closes-bug: #1178375
    Change-Id: Ib5d4733743041eb2324020f9b1dc553260e79b21

Changed in nova:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/44532
Committed: http://github.com/openstack/cinder/commit/a1fe496e1113737d0b133a64078bc45c485dd3b2
Submitter: Jenkins
Branch: master

commit a1fe496e1113737d0b133a64078bc45c485dd3b2
Author: Russell Bryant <email address hidden>
Date: Fri Aug 30 15:36:36 2013 -0400

    Sync rpc fix from oslo-incubator

    Sync the following fix from oslo-incubator:

    76972e2 Support a new qpid topology

    This includes one other commit, so that the above fix could be brought
    over cleanly:

    5ff534d Add config for amqp durable/auto_delete queues

    Change-Id: I1fd5aaf87ec87836df3e44e83247bf82301475f5
    Closes-bug: #1178375

Changed in cinder:
status: In Progress → Fix Committed
Steven Hardy (shardy) on 2013-09-02
Changed in heat:
milestone: none → havana-3

Reviewed: https://review.openstack.org/44556
Committed: http://github.com/openstack/heat/commit/7f1c6e97736ba36de27e7faf76442c3706557811
Submitter: Jenkins
Branch: master

commit 7f1c6e97736ba36de27e7faf76442c3706557811
Author: Russell Bryant <email address hidden>
Date: Fri Aug 30 17:57:29 2013 -0400

    Sync rpc from oslo-incubator

    This includes the following changes:

    76972e2 Support a new qpid topology
    284b13a Raise timeout in fake RPC if no consumers found
    9721129 exception: remove
    7b0cb37 Don't eat callback exceptions
    69abf38 requeue instead of reject

    Change-Id: I9113991aebe7d566c8877d74aad9d55b65fdfb9e
    Closes-bug: #1178375

Changed in heat:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/44561
Committed: http://github.com/openstack/ceilometer/commit/5f7235e5c9138bbf7b9ebce5b796c0474b623b7e
Submitter: Jenkins
Branch: master

commit 5f7235e5c9138bbf7b9ebce5b796c0474b623b7e
Author: Russell Bryant <email address hidden>
Date: Fri Aug 30 18:15:45 2013 -0400

    Sync rpc from oslo-incubator

    This includes the following changes:

    76972e2 Support a new qpid topology
    284b13a Raise timeout in fake RPC if no consumers found
    9721129 exception: remove
    7b0cb37 Don't eat callback exceptions
    69abf38 requeue instead of reject

    Change-Id: I58051558345cdb94a9ad29edf02acba9952f6f60
    Closes-bug: #1178375

Changed in ceilometer:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/44541
Committed: http://github.com/openstack/neutron/commit/34a208d1f3829173815beca81d07b53633a12989
Submitter: Jenkins
Branch: master

commit 34a208d1f3829173815beca81d07b53633a12989
Author: Russell Bryant <email address hidden>
Date: Tue Sep 3 02:51:14 2013 -0400

    Sync rpc fix from oslo-incubator

    Sync the following fix from oslo-incubator:

    76972e2 Support a new qpid topology

    This includes one other commit, so that the above fix could be brought
    over cleanly:

    5ff534d Add config for amqp durable/auto_delete queues

    Closes-bug: #1178375
    Change-Id: I99d6a1771bc3223f86db0132525bf22c271fe862

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2013-09-05
Changed in ceilometer:
milestone: none → havana-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2013-09-05
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2013-09-05
Changed in cinder:
milestone: none → havana-3
status: Fix Committed → Fix Released
Changed in neutron:
milestone: none → havana-3
importance: Undecided → Medium
Thierry Carrez (ttx) on 2013-09-05
Changed in neutron:
status: Fix Committed → Fix Released
Changed in nova:
milestone: none → havana-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2013-09-05
Changed in oslo:
milestone: none → havana-3
status: Fix Committed → Fix Released

Thierry Carrez (ttx) on 2013-10-17
Changed in oslo:
milestone: havana-3 → 2013.2
Thierry Carrez (ttx) on 2013-10-17
Changed in ceilometer:
milestone: havana-3 → 2013.2
Thierry Carrez (ttx) on 2013-10-17
Changed in heat:
milestone: havana-3 → 2013.2
Thierry Carrez (ttx) on 2013-10-17
Changed in cinder:
milestone: havana-3 → 2013.2
Thierry Carrez (ttx) on 2013-10-17
Changed in neutron:
milestone: havana-3 → 2013.2
Thierry Carrez (ttx) on 2013-10-17
Changed in nova:
milestone: havana-3 → 2013.2
Changed in oslo.messaging:
status: New → Confirmed
Ken Giusti (kgiusti) wrote :

Working on a port to oslo.messaging.

Changed in oslo.messaging:
assignee: nobody → Ken Giusti (kgiusti)
status: Confirmed → New

Fix proposed to branch: master
Review: https://review.openstack.org/56686

Changed in oslo.messaging:
status: New → In Progress
Ken Giusti (kgiusti) wrote :

There appears to be a problem with this patch. It introduces a change in messaging behavior that I think is a bug.

I've found a problem while testing my port of this fix to oslo.messaging (see link in previous comment for review).

The problem I'm seeing is that an RPC request sent to a topic that has more than one server is no longer serviced by only one server. Instead, this patch causes the RPC request to be multicast to _all_ servers - basically it now acts like a fanout.

Here's exactly what I'm seeing. I've built a simple RPC server and client test tool (see https://github.com/kgiusti/oslo-messaging-clients), which can select between the two topology versions via command line switch.

If I run two servers using topology version 1 on topic "my-topic", like this:

(py27)[kgiusti@t530 work (master)]$ ./my-server.py --topology=1 server01 &
Running server, name=server01 exchange=my-exchange topic=my-topic namespace=my-namespace
Using QPID topology version 1

(py27)[kgiusti@t530 work (master)]$ ./my-server.py --topology=1 server02 &
Running server, name=server02 exchange=my-exchange topic=my-topic namespace=my-namespace
Using QPID topology version 1

And if I then invoke a single RPC call via a client, I see only one of the servers get the RPC request. Each time I invoke the RPC client, the request is "round-robined" across the servers:

(py27)[kgiusti@t530 work (master)]$ ./my-client.py --topology=1 my-topic methodB arg1 arg2
server01::TestEndpoint02::methodB( ctxt={'application': u'my-client', 'cast': None, 'time': u'Sat Nov 16 08:50:39 2013'} arg={u'arg1': u'arg2'} ) called!!!

(py27)[kgiusti@t530 work (master)]$ ./my-client.py --topology=1 my-topic methodB arg1 arg2
server02::TestEndpoint02::methodB( ctxt={'application': u'my-client', 'cast': None, 'time': u'Sat Nov 16 08:50:45 2013'} arg={u'arg1': u'arg2'} ) called!!!

(py27)[kgiusti@t530 work (master)]$ ./my-client.py --topology=1 my-topic methodB arg1 arg2
server01::TestEndpoint02::methodB( ctxt={'application': u'my-client', 'cast': None, 'time': u'Sat Nov 16 08:50:50 2013'} arg={u'arg1': u'arg2'} ) called!!!

However, if I repeat the above using topology=1, both servers get a copy of the RPC request and service it:

(py27)[kgiusti@t530 work (master)]$ ./my-server.py --topology=2 server02 &
 Running server, name=server02 exchange=my-exchange topic=my-topic namespace=my-namespace
Using QPID topology version 2

(py27)[kgiusti@t530 work (master)]$ ./my-server.py --topology=2 server01 &
Running server, name=server01 exchange=my-exchange topic=my-topic namespace=my-namespace
Using QPID topology version 2

Now invoke the client once:
(py27)[kgiusti@t530 work (master)]$ ./my-client.py --topology=2 my-topic methodB arg1 arg2

And both servers respond:
(py27)[kgiusti@t530 work (master)]$ server02::TestEndpoint02::methodB( ctxt={'application': u'my-client', 'cast': None, 'time': u'Sat Nov 16 08:52:22 2013'} arg={u'arg1': u'arg2'} ) called!!!
server01::TestEndpoint02::methodB( ctxt={'application': u'my-client', 'cast': None, 'time': u'Sat Nov 16 08:52:22 2013'} arg={u'arg1': u'arg2'} ) called!!!

Clearly a change in behavior, and, per my noob understanding of t...

Ken Giusti (kgiusti) wrote :

Oops, should've proofread that last comment -

"However, if I repeat the above using topology=1, both servers get a copy of the RPC request and service it:"

Should read "topology 2" not 1, sorry!

Ken Giusti (kgiusti) wrote :

This patch should address the problem I described in the previous comment. I've had the changes reviewed by other QPID developers.

This change should be "backward compatible" in that it does not change the actual address used by the patch - only the address options. There should be no need to bump the topology number.

Mark McLoughlin (markmc) on 2013-11-19
Changed in oslo:
importance: Undecided → Medium
Changed in oslo.messaging:
importance: Undecided → Medium

Reviewed: https://review.openstack.org/56686
Committed: http://github.com/openstack/oslo.messaging/commit/7e1fddb2171f4ce3ccceb507eea2ee413e81c66d
Submitter: Jenkins
Branch: master

commit 7e1fddb2171f4ce3ccceb507eea2ee413e81c66d
Author: Russell Bryant <email address hidden>
Date: Wed Aug 28 12:09:54 2013 -0400

    Support a new qpid topology

    There has been a bug open for a while pointing out that the way we
    create direct exchanges with qpid results in leaking exchanges since
    qpid doesn't support auto-deleting exchanges. This was somewhat
    mitigated by change to use a single reply queue. This meant we created
    far fewer direct exchanges, but the problem persists anyway.

    A Qpid expert, William Henry, originally proposed a change to address
    this issue. Unfortunately, it wasn't backwards compatible with existing
    installations. This patch takes the same approach, but makes it
    optional and off by default. This will allow a migration period.

    As a really nice side effect, the Qpid experts have told us that this
    change will also allow us to use Qpid broker federation to provide HA.

    DocImpact
    Closes-bug: #1178375
    Co-authored-by: William Henry <email address hidden>
    Change-Id: I09b8317c0d8a298237beeb3105f2b90cb13933d8

Changed in oslo.messaging:
status: In Progress → Fix Committed
Mark McLoughlin (markmc) on 2013-11-29
Changed in oslo.messaging:
milestone: none → icehouse-1
Gordon Sim (gsim) wrote :

I concur with Ken's comment above (i.e. #26). The topology 2 'TopicConsumer' does not set a link name which means each consumer will be independent (non-competing). That's fine where it is for the topic with server name appended, but *not* when it is for the topic itself, when it needs to in fact be a shared subscription between all servers on the topic.

See https://review.openstack.org/59677 also, which changes the setting of link name for topology 1. Doing the same thing in topology 2 (or better, pulling the line that conditionally sets the link name into the common path) would fix the problem as described by Ken.
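
The difference between a shared subscription and independent consumers can be sketched with a toy model. This is not Qpid or oslo.messaging code: ToyTopicExchange and its methods are hypothetical, and the model only assumes what is described above, namely that consumers sharing a link name compete for messages while consumers without one each receive their own copy.

```python
import itertools
from collections import defaultdict


class ToyTopicExchange:
    """Toy model: every distinct link name gets its own queue.

    Each queue receives a copy of every message; consumers that share
    a queue take turns (round-robin).
    """

    def __init__(self):
        self._consumers = defaultdict(list)  # link name -> consumer names
        self._next = {}                      # link name -> round-robin index
        self._anon = itertools.count()

    def subscribe(self, consumer, link_name=None):
        # No link name: the consumer gets a private, unshared queue.
        if link_name is None:
            link_name = "anonymous-%d" % next(self._anon)
        self._consumers[link_name].append(consumer)
        self._next.setdefault(link_name, 0)

    def publish(self):
        """Deliver one message; return the consumers that receive it."""
        delivered = []
        for link_name, consumers in self._consumers.items():
            index = self._next[link_name]
            delivered.append(consumers[index % len(consumers)])
            self._next[link_name] = index + 1
        return delivered


if __name__ == "__main__":
    shared = ToyTopicExchange()
    shared.subscribe("server01", link_name="my-topic")
    shared.subscribe("server02", link_name="my-topic")
    print([shared.publish() for _ in range(3)])

    independent = ToyTopicExchange()
    independent.subscribe("server01")
    independent.subscribe("server02")
    print(independent.publish())
```

In the shared case each request reaches exactly one server, alternating between them; in the independent case a single request reaches both, which matches the fanout-like behavior Ken reported for topology 2.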

Thierry Carrez (ttx) on 2013-12-05
Changed in oslo.messaging:
status: Fix Committed → Fix Released
Alan Pevec (apevec) on 2013-12-06
Changed in keystone:
status: New → Invalid
Alan Pevec (apevec) on 2014-03-21
tags: removed: grizzly-backport-potential
no longer affects: nova/grizzly
Thierry Carrez (ttx) on 2014-04-17
Changed in oslo.messaging:
milestone: icehouse-1 → 1.3.0