Excessive number of ConnectionForced: Too many heartbeats missed in logs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Invalid | Undecided | Unassigned |
Stein | Fix Released | Undecided | Unassigned |
Train | Fix Released | Undecided | Unassigned |
oslo.messaging | Fix Released | Undecided | norman shen |
Bug Description
We are running OpenStack Rocky with RabbitMQ 3.7.4 in production.
Occasionally I see many lines like the following in the log:
2020-06-11 02:03:06.753 3877409 WARNING oslo.messaging.
2020-06-11 02:03:21.754 3877409 WARNING oslo.messaging.
2020-06-11 02:03:36.755 3877409 WARNING oslo.messaging.
2020-06-11 02:03:51.756 3877409 WARNING oslo.messaging.
2020-06-11 02:04:06.757 3877409 WARNING oslo.messaging.
2020-06-11 02:04:21.757 3877409 WARNING oslo.messaging.
2020-06-11 02:04:36.758 3877409 WARNING oslo.messaging.
2020-06-11 02:04:51.759 3877409 WARNING oslo.messaging.
The heartbeat interval is 60s and the rate is 2. Although the log complains about missed heartbeats, the RabbitMQ server appears to be running fine, and messages are received and processed successfully.
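The 15-second spacing of the warnings above is consistent with how, as far as I can tell, the oslo.messaging RabbitMQ driver schedules its heartbeat check: it waits heartbeat_timeout_threshold / heartbeat_rate / 2 seconds between checks. Here is a minimal sketch of that arithmetic, assuming the "60s" and "2" above are the heartbeat_timeout_threshold and heartbeat_rate options; the function name is hypothetical and not part of oslo.messaging.

```python
def heartbeat_check_interval(timeout_threshold: float, rate: float) -> float:
    """Seconds between heartbeat checks, assuming threshold / rate / 2."""
    return float(timeout_threshold) / float(rate) / 2.0


# Values from this report: interval 60s, rate 2.
interval = heartbeat_check_interval(60, 2)
print(interval)  # 15.0, matching the 15s gap between the log lines
```

If that assumption holds, one warning per check means every check is failing, not that a single heartbeat was missed.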
*******
SRU Details
-----------
[Impact]
AMQP messages are sometimes dropped, resulting in resource creation errors (this happened twice in one week on one environment).
Catching the ConnectionForced AMQP exception and re-establishing the connection immediately remediates the issue.
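The remediation described above can be sketched roughly as follows. This is an illustration only, not the actual patch: ConnectionForced here is a local stand-in for the exception raised by the AMQP client library, and FakeConnection, heartbeat_check, ensure_connection and run_heartbeat_checks are hypothetical names chosen for this example.

```python
import logging

LOG = logging.getLogger(__name__)


class ConnectionForced(Exception):
    """Local stand-in for the AMQP client's ConnectionForced (assumption)."""


class FakeConnection:
    """Hypothetical connection used only to exercise the loop below."""

    def __init__(self, failures):
        self.failures = failures  # number of heartbeat checks that will fail
        self.reconnects = 0

    def heartbeat_check(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionForced("Too many heartbeats missed")

    def ensure_connection(self):
        # A real driver would tear down and re-establish the AMQP connection.
        self.reconnects += 1


def run_heartbeat_checks(conn, rounds):
    """Run `rounds` checks, reconnecting at once on ConnectionForced."""
    for _ in range(rounds):
        try:
            conn.heartbeat_check()
        except ConnectionForced as exc:
            LOG.info("Heartbeat failure (%s); reconnecting immediately", exc)
            conn.ensure_connection()


conn = FakeConnection(failures=2)
run_heartbeat_checks(conn, rounds=5)
print(conn.reconnects)  # 2
```

The point of the sketch is that the exception is handled inside the heartbeat loop, so a forced close leads to one reconnect instead of a warning on every later check.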
[Test Case]
Reproducing the issue is tricky. The following steps might help reproduce it.
1. Deploy OpenStack
(If stsstack-bundles project is used, run command ./generate-
2. Change heartbeat_
On nova-cloud-
[oslo_messaging
heartbeat_
systemctl restart apache2.service
3. Create and delete instances continuously
./tools/
openstack server list -c ID -f value | xargs openstack server delete
4. On the RabbitMQ server, randomly drop and then re-allow packets from nova-api to RabbitMQ
sudo iptables -A INPUT -p tcp --dport 5672 -s 10.5.1.55 -j DROP
sudo iptables -D INPUT 1
5. Repeat steps 3 and 4 until the following message appears in the nova-api log
WARNING oslo.messaging.
6. Install the fixed python-
And restart apache service.
7. Repeat steps 3 and 4 and check the nova-api log for the following INFO message.
INFO oslo.messaging.
Because the above test case is random in nature and hard to reproduce, as an additional measure, continuous integration tests for nova-cloud-
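The random drop/allow cycle from step 4 could be scripted along these lines. This is a sketch under assumptions: packet_drop_schedule is a hypothetical helper, the hold times are arbitrary, and it only builds the iptables command lines without running them. It deletes the rule by its specification rather than by position, so it does not depend on the DROP rule being number 1 in the INPUT chain.

```python
import random


def packet_drop_schedule(source_ip, port=5672, cycles=3, max_seconds=30, seed=None):
    """Return (command, hold_seconds) pairs alternating DROP and allow."""
    rng = random.Random(seed)
    rule = ["-p", "tcp", "--dport", str(port), "-s", source_ip, "-j", "DROP"]
    schedule = []
    for _ in range(cycles):
        # Append the DROP rule, then hold it for a random time.
        add_cmd = ["sudo", "iptables", "-A", "INPUT"] + rule
        schedule.append((add_cmd, rng.randint(1, max_seconds)))
        # Delete the same rule by specification, then allow traffic for a while.
        del_cmd = ["sudo", "iptables", "-D", "INPUT"] + rule
        schedule.append((del_cmd, rng.randint(1, max_seconds)))
    return schedule


# Dry run: print the commands instead of executing them.
for command, hold in packet_drop_schedule("10.5.1.55", cycles=1, seed=0):
    print(" ".join(command), f"# then wait {hold}s")
```

To apply the schedule on the RabbitMQ server, each command list could be passed to subprocess.run followed by time.sleep(hold), while step 3 runs in parallel.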
[Regression Potential]
I do not foresee any regression potential, as the patch only catches one additional exception and reconnects to the AMQP server immediately.
tags: added: sts
description: updated
tags: added: verification-done
Fix proposed to branch: master
Review: https://review.opendev.org/738538