[RFE] Have more granular control for topic exchanges and durable mode

Bug #1953351 reported by Herve Beraud
Affects Status Importance Assigned to Milestone
oslo.messaging
Fix Released
Undecided
Unassigned

Bug Description

Hello,

## Context

RabbitMQ, AMQP

## Problem description

Today, each service can declare its own control exchange [1], but if none is defined in the service's config, the service falls back to the default value (`openstack`) [1]. When two or more services do not define their own control exchange, this default exchange implicitly becomes a shared control exchange.

Today, each service can also declare whether its queues are durable [2] (by default they are not). The service will try to apply this setting to each queue it creates.

The problem arises once a control exchange becomes shared, i.e. when two or more services use the same control exchange name. Suppose service A is not configured to use durable queues while service B is. If service A creates the shared control exchange first, it creates it as non-durable. When service B then tries to declare the same control exchange, the exchange already exists with a configuration that is incompatible with the one B requests, so RabbitMQ rejects B's declaration.

This leads to an error: B gets a PRECONDITION_FAILED error related to the durable queue configuration [3].
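To make the failure mode concrete, here is a hypothetical sketch (the `ServiceConfig` dataclass and `find_exchange_conflicts` helper are invented for illustration and are not part of oslo.messaging) of how one could detect ahead of time that two services implicitly sharing a control exchange disagree on queue durability:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

DEFAULT_CONTROL_EXCHANGE = "openstack"  # oslo.messaging's default [1]

@dataclass
class ServiceConfig:
    name: str
    control_exchange: Optional[str] = None  # None: service kept the default
    amqp_durable_queues: bool = False       # oslo.messaging's default [2]

def find_exchange_conflicts(services):
    """Report every effective control exchange whose users disagree
    on the durable flag (the situation that triggers the 406 error)."""
    by_exchange = defaultdict(list)
    for svc in services:
        by_exchange[svc.control_exchange or DEFAULT_CONTROL_EXCHANGE].append(svc)
    return {
        exchange: sorted(svc.name for svc in users)
        for exchange, users in by_exchange.items()
        if len({svc.amqp_durable_queues for svc in users}) > 1
    }
```

A non-durable service A and a durable service B that both leave `control_exchange` unset end up on `openstack` together and are flagged, while a service with its own exchange is not.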

## Work around

If every service sets its own control exchange [1], then each can also choose whether the queues of its dedicated control exchange are durable. The trick is to avoid the default control exchange, so that queue durability can be configured per service.
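Concretely, the workaround boils down to giving each service a dedicated control exchange in its oslo.messaging configuration. A sketch of one service's config (the exchange name here is made up for illustration):

```
[DEFAULT]
control_exchange = service_b_exchange

[oslo_messaging_rabbit]
amqp_durable_queues = true
```

With each service using a distinct `control_exchange` value, RabbitMQ never sees two incompatible declarations of the same exchange.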

Otherwise, if services with different configurations try to apply them to the same (shared) control exchange, this issue will occur (see the side observations section below).

## Enhancement

It could be worth investigating whether it is possible to give more granular control over topic exchanges and the durable mode, so that users can be more fine-grained in their configuration and avoid similar issues. In other words, this kind of improvement could lead to more isolated configurations applied to topic consumption per service.

The problem described here is more a design issue at the services/configuration level than an oslo.messaging bug. I think oslo.messaging behaves as expected here. Oslo.messaging is an underlying library without knowledge of the global context between services, and it should stay that way, but some adjustments could be made to let users work around this kind of design problem through more fine-grained tuning.

I need to dig more into rabbitmq's capabilities and features to see how to get around this problem.

## Side observations

I think the same problem exists with the `auto_delete` configuration of these implicitly shared control exchanges: if two services set different values for auto-deleting the queues of a shared control exchange, the same kind of `PRECONDITION_FAILED` error will be raised.
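Whether the mismatch is on `durable` or `auto_delete`, RabbitMQ reports it in the same 406 message shape ("inequivalent arg '...' for exchange '...'"), as quoted in [3]. A small hypothetical helper (not part of oslo.messaging) could extract which declaration argument was rejected from the error text:

```python
import re

# Matches RabbitMQ 406 PRECONDITION_FAILED messages such as:
#   "PRECONDITION_FAILED - inequivalent arg 'durable' for exchange
#    'openstack' in vhost '/': received 'true' but current is 'false'"
_INEQUIVALENT_ARG = re.compile(
    r"inequivalent arg '(?P<arg>[^']+)' for exchange '(?P<exchange>[^']+)'"
)

def inequivalent_arg(error_text):
    """Return (argument, exchange) from a 406 message, or None if the
    text does not look like an exchange-equivalence failure."""
    match = _INEQUIVALENT_ARG.search(error_text)
    if match is None:
        return None
    return match.group("arg"), match.group("exchange")
```

Run against the error quoted in [3], this yields `('durable', 'openstack')`; an `auto_delete` mismatch would be reported the same way with a different argument name.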

[1] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/transport.py#L57-L62
[2] https://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/_drivers/amqp.py#L33-L35
[3] ```
2021-12-02 12:13:56.689 13 ERROR oslo.messaging._drivers.impl_rabbit [req-9b3a86d6-36b8-45b9-9ebd-cfe2b2b8b24c 5634c1385b8244d9b99393341eae6f33 bd26c02434014991a394eeb820516872 - default default] Failed to publish message to topic 'openstack': Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'durable' for exchange 'openstack' in vhost '/': received 'true' but current is 'false':
```

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Please note that changing the default control exchange of a service would likely have an upgrade impact. Updating defaults should happen consistently; otherwise the two sides of an RPC call would not be able to communicate until restarted with the new config. It might also affect custom ha-policies that operators have set in rabbitmq expecting 'openstack' rather than the changed defaults.

Meanwhile, the status quo remains: one *cannot* use amqp durable queues with today's control_exchange defaults.

Changed in oslo.messaging:
status: New → Confirmed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Please also note that the problem is not limited to the 'openstack' exchange shared among services.
The issue arises as well with service-specific exchanges; see this example build: https://zuul.opendev.org/t/openstack/build/aa514dd788f34cc1be3800e6d7dba0e8/log/controller/logs/screen-n-cpu.txt :

-- Logs begin at Mon 2021-12-06 13:32:01 UTC, end at Mon 2021-12-06 13:54:52 UTC. --
Dec 06 13:52:15.980547 ubuntu-focal-ovh-bhs1-0027613400 systemd[1]: Started Devstack <email address hidden>.
Dec 06 13:52:18.278518 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: DEBUG os_vif [-] Loaded VIF plugin class '<class 'vif_plug_linux_bridge.linux_bridge.LinuxBridgePlugin'>' with name 'linux_bridge' {{(pid=110235) initialize /usr/local/lib/python3.8/dist-packages/os_vif/__init__.py:44}}
Dec 06 13:52:18.279319 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: DEBUG os_vif [-] Loaded VIF plugin class '<class 'vif_plug_noop.noop.NoOpPlugin'>' with name 'noop' {{(pid=110235) initialize /usr/local/lib/python3.8/dist-packages/os_vif/__init__.py:44}}
Dec 06 13:52:18.279319 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: WARNING oslo_config.cfg [-] Deprecated: Option "ovsdb_interface" from group "os_vif_ovs" is deprecated for removal (
Dec 06 13:52:18.279319 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: os-vif has supported ovsdb access via python bindings
Dec 06 13:52:18.279319 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: since Stein (1.15.0), starting in Victoria (2.2.0) the
Dec 06 13:52:18.279319 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: ovs-vsctl driver is now deprecated for removal and
Dec 06 13:52:18.279319 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: in future releases it will be be removed.
Dec 06 13:52:18.279319 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: ). Its value may be silently ignored in the future.
Dec 06 13:52:18.279319 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: DEBUG os_vif [-] Loaded VIF plugin class '<class 'vif_plug_ovs.ovs.OvsPlugin'>' with name 'ovs' {{(pid=110235) initialize /usr/local/lib/python3.8/dist-packages/os_vif/__init__.py:44}}
Dec 06 13:52:18.280080 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: INFO os_vif [-] Loaded VIF plugins: linux_bridge, noop, ovs
Dec 06 13:52:18.548898 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): grep -F node.session.scan /sbin/iscsiadm {{(pid=110235) execute /usr/local/lib/python3.8/dist-packages/oslo_concurrency/processutils.py:384}}
Dec 06 13:52:18.560048 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: DEBUG oslo_concurrency.processutils [-] CMD "grep -F node.session.scan /sbin/iscsiadm" returned: 0 in 0.011s {{(pid=110235) execute /usr/local/lib/python3.8/dist-packages/oslo_concurrency/processutils.py:422}}
Dec 06 13:52:18.612783 ubuntu-focal-ovh-bhs1-0027613400 nova-compute[110235]: ERROR oslo.messaging._drivers.impl_rabbit [None req-a2d299d2-06e9-44cf-b81a-615a5f3e4ccb None None] Failed to publish message to topic 'nova': Exchange.declar...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)
Changed in oslo.messaging:
status: Confirmed → In Progress
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/821712
Committed: https://opendev.org/openstack/oslo.messaging/commit/1fd461647f7f727dad9d4603abf0defe339d320f
Submitter: "Zuul (22348)"
Branch: master

commit 1fd461647f7f727dad9d4603abf0defe339d320f
Author: Hervé Beraud <email address hidden>
Date: Tue Dec 14 15:58:34 2021 +0100

    Force creating non durable control exchange when a precondition failed

    Precondition failed exception related to durable exchange
    config may be triggered when a control exchange is shared
    between services and when services try to create it with
    configs that differ from each others. RabbitMQ will reject
    the services that try to create it with a configuration
    that differ from the one used first.

    This kind of exception is not currently handled, and services
    can fail without a way to deal with this kind of issue.

    These changes catch this kind of exception and analyze whether
    it is related to the durable config. In that case we try to
    re-declare the failing exchange/queue as non durable.

    This problem can be easily reproduced by running a local RabbitMQ
    server.

    By setting the config below (sample.conf):

    ```
    [DEFAULT]
    transport_url = rabbit://localhost/
    [OSLO_MESSAGING_RABBIT]
    amqp_durable_queues = true
    ```

    And by running our simulator twice:

    ```
    $ tox -e venv -- python tools/simulator.py -d rpc-server -w 40
    $ tox -e venv -- python tools/simulator.py --config-file ./sample.conf -d rpc-server -w 40
    ```

    The first one will create a default non durable control exchange.
    The second one will create the same default control exchange but as
    durable.

    Closes-Bug: #1953351
    Change-Id: I27625b468c428cde6609730c8ab429c2c112d010
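The behaviour the commit describes, retrying a rejected durable declaration as non durable, can be sketched roughly as follows (a simplified stand-in with a fake broker; the real implementation lives in oslo.messaging's rabbit driver and differs in detail):

```python
class FakeBroker:
    """Minimal stand-in for RabbitMQ's exchange-equivalence check."""
    def __init__(self):
        self._exchanges = {}  # exchange name -> durable flag

    def exchange_declare(self, name, durable):
        if name in self._exchanges and self._exchanges[name] != durable:
            raise RuntimeError(
                "PRECONDITION_FAILED - inequivalent arg 'durable' "
                f"for exchange '{name}'"
            )
        self._exchanges[name] = durable

def declare_with_fallback(broker, name, durable):
    """Declare the exchange; on a durable mismatch, fall back to
    re-declaring it as non durable. Returns the effective flag."""
    try:
        broker.exchange_declare(name, durable)
        return durable
    except RuntimeError as exc:
        if durable and "inequivalent arg 'durable'" in str(exc):
            # Existing exchange is non durable: match it instead of failing.
            broker.exchange_declare(name, durable=False)
            return False
        raise
```

In the scenario from the bug, service A has already created `openstack` as non durable; service B's durable declaration is rejected once, then retried and accepted as non durable.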

Changed in oslo.messaging:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging 14.1.0

This issue was fixed in the openstack/oslo.messaging 14.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (unmaintained/zed)

Fix proposed to branch: unmaintained/zed
Review: https://review.opendev.org/c/openstack/oslo.messaging/+/922608
