Connections to RabbitMQ are less stable from Wallaby onwards
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack-Ansible | Fix Released | Undecided | Unassigned |
Bug Description
We have been tracking two issues which have emerged since the Wallaby release and relate to connections between services such as 'nova-compute' and RabbitMQ. These manifest as follows:
- Services fail to auto-recover when the RabbitMQ cluster is taken down and restarted. (see also https:/
This is a common occurrence when performing an OSA major upgrade. Although the services are restarted during 'setup-openstack', they remain unstable until that step runs, which can take a long time in a large deployment.
- File descriptors accumulate against services until the per-service limit is reached, at which point RabbitMQ connections start to fail. (see also https:/
This happens more for heavily loaded services (such as nova-compute on a hypervisor with lots of instances) and is aggravated by the oslo.messaging RabbitMQ connection pool increasing and decreasing in size over time.
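To check whether a given service is approaching its file descriptor limit, a rough sketch along the following lines may help. It assumes a Linux host with a standard /proc layout; the helper names (`count_open_fds`, `soft_fd_limit`, `fd_usage`) and the `proc_root` parameter are illustrative only and are not part of OSA or oslo.messaging.

```python
import os
from typing import Optional, Tuple


def count_open_fds(pid: int, proc_root: str = "/proc") -> int:
    """Count the entries in <proc_root>/<pid>/fd (one per open descriptor)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def soft_fd_limit(pid: int, proc_root: str = "/proc") -> Optional[int]:
    """Read the soft 'Max open files' limit from <proc_root>/<pid>/limits.

    Returns None if the limit is 'unlimited' or the line is not found.
    """
    with open(os.path.join(proc_root, str(pid), "limits")) as handle:
        for line in handle:
            if line.startswith("Max open files"):
                # Fields: 'Max', 'open', 'files', <soft>, <hard>, 'files'
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid: int, proc_root: str = "/proc") -> Tuple[int, Optional[int], Optional[float]]:
    """Return (open descriptors, soft limit, fraction of the limit in use)."""
    used = count_open_fds(pid, proc_root)
    limit = soft_fd_limit(pid, proc_root)
    ratio = used / limit if limit else None
    return used, limit, ratio
```

Calling `fd_usage(pid)` periodically for the PID of an affected service (for example nova-compute) and recording the result should show whether the count climbs steadily towards the limit. Reading another user's /proc entries generally requires root or the service's own user.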
The latter issue has a partial fix via an update to the underlying 'amqp' library (see https:/
The cause of the remaining issues has been tracked to the change to the 'heartbeat_
We have observed one or both of the above issues in the following services, but this may not be exhaustive:
- nova-compute
- neutron-
- neutron-l3-agent
- neutron-bgp-dragent
- neutron-dhcp-agent
- neutron-
- neutron-server (not noted by us, but identified by another bug reporter)
- cinder-volume
Further testing suggests that services which run under uwsgi may not be impacted (or at least not to the same degree), so these may not need this default to be reverted.
Adding an explicit OSA configuration option for this problematic parameter would reduce the number of overrides a deployment needs to carry in order to work around the issue.
Changed in openstack-ansible:
status: New → Fix Committed
Changed in openstack-ansible:
status: Fix Committed → Fix Released
Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/833236