Connections to RabbitMQ are less stable from Wallaby onwards

Bug #1961603 reported by Andrew Bonney
This bug affects 2 people
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have been tracking two issues which have emerged since the Wallaby release and relate to connections between services such as 'nova-compute' and RabbitMQ. These manifest as follows:

- Services fail to auto-recover when the RabbitMQ cluster is taken down and restarted (see also https://bugs.launchpad.net/oslo.messaging/+bug/1961402).
This is a common occurrence during an OSA major upgrade. Whilst the services are restarted during 'setup-openstack', they remain unstable until it runs, which could take a long time in a large deployment.

- File descriptors accumulate against services until the per-service limit is reached, at which point RabbitMQ connections start to fail (see also https://bugs.launchpad.net/oslo.messaging/+bug/1949964).
This happens more often on heavily loaded services (such as nova-compute on a hypervisor with lots of instances) and is aggravated by the oslo.messaging RabbitMQ connection pool growing and shrinking over time.

The latter issue has a partial fix via an update to the underlying 'amqp' library (see https://review.opendev.org/c/openstack/requirements/+/823104 and https://review.opendev.org/c/openstack/requirements/+/823350).
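To check whether a given service is approaching its file descriptor limit, something along these lines can be run on the host. This is a minimal sketch rather than anything shipped with OSA: it assumes a Linux host with /proc mounted, the function names are hypothetical, and the 80% threshold is an arbitrary choice.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def parse_soft_nofile(limits_text: str) -> Optional[int]:
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", soft limit, hard limit, units
            value = line.split()[3]
            return None if value == "unlimited" else int(value)
    return None


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open descriptor count, soft limit) for a process (Linux only)."""
    proc = Path("/proc") / str(pid)
    open_fds = len(os.listdir(proc / "fd"))
    soft_limit = parse_soft_nofile((proc / "limits").read_text())
    return open_fds, soft_limit


def is_near_limit(open_fds: int, soft_limit: Optional[int],
                  threshold: float = 0.8) -> bool:
    """True when usage has reached the given fraction of the soft limit."""
    if soft_limit is None:
        return False
    return open_fds >= threshold * soft_limit
```

For example, `fd_usage(pid)` with the PID of a nova-compute process returns the current count and limit, and repeating the call over time shows whether the count keeps climbing. Reading another process's /proc entries generally requires running as the same user or as root.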

The cause of the remaining issues has been traced to the change of the 'heartbeat_in_pthread' default in the oslo.messaging release used in Wallaby (https://review.opendev.org/c/openstack/oslo.messaging/+/747395). Setting this configuration option back to 'False' for impacted services resolves the problems.
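As an illustration of the setting involved, the sketch below rewrites a service configuration so that 'heartbeat_in_pthread' is 'False' in the '[oslo_messaging_rabbit]' group (the group name is the one quoted in the deprecation message on this bug). The helper name and the sample file contents are hypothetical. Python's configparser drops comments and does not handle options repeated within a section, so this is only suitable for experimenting, not for editing production files.

```python
import configparser
import io


def set_heartbeat_in_pthread(conf_text: str, enabled: bool = False) -> str:
    """Return conf_text with heartbeat_in_pthread set in [oslo_messaging_rabbit]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("oslo_messaging_rabbit"):
        parser.add_section("oslo_messaging_rabbit")
    parser.set("oslo_messaging_rabbit", "heartbeat_in_pthread", str(enabled))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical minimal service configuration
    sample = (
        "[DEFAULT]\n"
        "debug = False\n"
        "\n"
        "[oslo_messaging_rabbit]\n"
        "heartbeat_timeout_threshold = 60\n"
    )
    print(set_heartbeat_in_pthread(sample))
```

The affected service still needs to be restarted for the change to take effect.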

We have observed one or both of the above issues in the following services, but this may not be exhaustive:

- nova-compute
- neutron-linuxbridge-agent
- neutron-l3-agent
- neutron-bgp-dragent
- neutron-dhcp-agent
- neutron-metadata-agent
- neutron-server (not noted by us, but identified by another bug reporter)
- cinder-volume

Further testing suggests that services which run under uwsgi may not be impacted (or at least not to the same degree), so these may not need this default to be reverted.

Adding an explicit OSA configuration option for this problematic parameter would help to reduce the number of overrides which a deployment may need to carry in order to work around the issue.
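For reference, the kind of overrides a deployment would otherwise carry can be sketched as below. The snippet builds one override mapping per service and prints it as JSON, which is also valid YAML and could be pasted into a user variables file. The variable names ('nova_nova_conf_overrides' and so on) are assumed here from the usual OSA '<service>_<file>_overrides' pattern and should be checked against the role defaults for the release in use.

```python
import json

# Assumed OSA override variable names, one per affected role
OVERRIDE_VARIABLES = (
    "nova_nova_conf_overrides",
    "neutron_neutron_conf_overrides",
    "cinder_cinder_conf_overrides",
)


def build_overrides(enabled: bool = False) -> dict:
    """Build a per-service override mapping for heartbeat_in_pthread."""
    return {
        name: {"oslo_messaging_rabbit": {"heartbeat_in_pthread": enabled}}
        for name in OVERRIDE_VARIABLES
    }


if __name__ == "__main__":
    # JSON output is also valid YAML
    print(json.dumps(build_overrides(), indent=2))
```

An override at this level applies to every service which reads the same configuration file, including any uwsgi based API services sharing it.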

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_nova (master)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_neutron (master)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_cinder (master)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/openstack-ansible/+/833239

OpenStack Infra (hudson-openstack) wrote : Change abandoned on openstack-ansible (master)

Change abandoned by "Andrew Bonney <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/openstack-ansible/+/833239
Reason: Moved logic to roles

Andrew Bonney (andrewbonney) wrote :

The proposed patches tackle the immediate issue. It is possible that the issue could also occur for services such as nova-conductor/scheduler, and similarly for cinder and potentially other services, but we haven't detected that in our own deployments. Changing the variable for these services would also be more challenging, as the config file is currently shared between them and the uwsgi based API services.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_cinder (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/833238
Committed: https://opendev.org/openstack/openstack-ansible-os_cinder/commit/6efa45e2bdc3d478eead0b8d7b179c2ed49f9841
Submitter: "Zuul (22348)"
Branch: master

commit 6efa45e2bdc3d478eead0b8d7b179c2ed49f9841
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 09:29:23 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Related-Bug: #1961603
    Change-Id: I8155264b181d6f21728804ef8260979931597427

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_nova (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/833236
Committed: https://opendev.org/openstack/openstack-ansible-os_nova/commit/b1e38084ccb6dce4967fa8aeaf33a47ffd03689b
Submitter: "Zuul (22348)"
Branch: master

commit b1e38084ccb6dce4967fa8aeaf33a47ffd03689b
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 08:48:16 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Change-Id: I7de034307da9352e6f5d1f5f175a330fb8c86463
    Related-Bug: #1961603

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_neutron (master)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/833237
Committed: https://opendev.org/openstack/openstack-ansible-os_neutron/commit/01951cd77b7edfc1008da14c1afd4e260fcf432f
Submitter: "Zuul (22348)"
Branch: master

commit 01951cd77b7edfc1008da14c1afd4e260fcf432f
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 09:28:12 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Change-Id: I833d72715daff81b64da077e899615b9b2002650
    Related-Bug: #1961603

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_cinder (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/833863

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_cinder (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/833864

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_nova (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/833865

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_nova (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/833866

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_neutron (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/833867

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible-os_neutron (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/833868

Changed in openstack-ansible:
status: New → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/833865
Committed: https://opendev.org/openstack/openstack-ansible-os_nova/commit/85b893fe9c8a6c96373231d4d87de29afbe08181
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 85b893fe9c8a6c96373231d4d87de29afbe08181
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 08:48:16 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Depends-On: https://review.opendev.org/c/openstack/openstack-ansible/+/835548
    Change-Id: I7de034307da9352e6f5d1f5f175a330fb8c86463
    Related-Bug: #1961603
    (cherry picked from commit b1e38084ccb6dce4967fa8aeaf33a47ffd03689b)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_cinder (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/833863
Committed: https://opendev.org/openstack/openstack-ansible-os_cinder/commit/ca32f8b7459224e882966b8a689e5bdc383b7e9e
Submitter: "Zuul (22348)"
Branch: stable/xena

commit ca32f8b7459224e882966b8a689e5bdc383b7e9e
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 09:29:23 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Depends-On: https://review.opendev.org/c/openstack/openstack-ansible/+/835548
    Related-Bug: #1961603
    Change-Id: I8155264b181d6f21728804ef8260979931597427
    (cherry picked from commit 6efa45e2bdc3d478eead0b8d7b179c2ed49f9841)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/833864
Committed: https://opendev.org/openstack/openstack-ansible-os_cinder/commit/d9c6359b02aaa695fe767895fcf5c5dce2a254e2
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit d9c6359b02aaa695fe767895fcf5c5dce2a254e2
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 09:29:23 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Related-Bug: #1961603
    Change-Id: I8155264b181d6f21728804ef8260979931597427
    (cherry picked from commit 6efa45e2bdc3d478eead0b8d7b179c2ed49f9841)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/833866
Committed: https://opendev.org/openstack/openstack-ansible-os_nova/commit/e94066394018a4e76499603f31160773993f6112
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit e94066394018a4e76499603f31160773993f6112
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 08:48:16 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Change-Id: I7de034307da9352e6f5d1f5f175a330fb8c86463
    Related-Bug: #1961603
    (cherry picked from commit b1e38084ccb6dce4967fa8aeaf33a47ffd03689b)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/833868
Committed: https://opendev.org/openstack/openstack-ansible-os_neutron/commit/713ae64c49cb145eac3a445127210c5e8784aed6
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 713ae64c49cb145eac3a445127210c5e8784aed6
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 09:28:12 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Change-Id: I833d72715daff81b64da077e899615b9b2002650
    Related-Bug: #1961603
    (cherry picked from commit 01951cd77b7edfc1008da14c1afd4e260fcf432f)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible-os_neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/833867
Committed: https://opendev.org/openstack/openstack-ansible-os_neutron/commit/32a5fdf16788b921b340091b8032625b801d7c6a
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 32a5fdf16788b921b340091b8032625b801d7c6a
Author: Andrew Bonney <email address hidden>
Date: Fri Mar 11 09:28:12 2022 +0000

    Add configuration option for heartbeat_in_pthread

    This configuration option has been observed to result in file
    descriptor leaks in certain circumstances. A variable is added
    here so that it can be easily overridden.

    Depends-On: https://review.opendev.org/c/openstack/openstack-ansible/+/835548
    Change-Id: I833d72715daff81b64da077e899615b9b2002650
    Related-Bug: #1961603
    (cherry picked from commit 01951cd77b7edfc1008da14c1afd4e260fcf432f)

Changed in openstack-ansible:
status: Fix Committed → Fix Released
Andrew Bonney (andrewbonney) wrote :

I noted a fix went into oslo.messaging in https://bugs.launchpad.net/oslo.messaging/+bug/1961402. I haven't tested it yet, but this may mean the overrides we put in place as part of this bug are no longer required.

Bjoern (bjoern-t) wrote :

The heartbeat_in_pthread option is already deprecated:

Deprecated: Option "heartbeat_in_pthread" from group "oslo_messaging_rabbit" is deprecated for removal.
