2020-06-11 02:16:31 |
norman shen |
bug |
|
|
added bug |
2020-06-30 01:01:30 |
OpenStack Infra |
oslo.messaging: status |
New |
In Progress |
|
2020-06-30 01:01:30 |
OpenStack Infra |
oslo.messaging: assignee |
|
norman shen (jshen28) |
|
2020-07-28 16:25:37 |
OpenStack Infra |
oslo.messaging: status |
In Progress |
Fix Released |
|
2020-07-31 20:29:16 |
OpenStack Infra |
tags |
|
in-stable-ussuri |
|
2020-12-02 17:27:12 |
Adam Vinsh |
bug |
|
|
added subscriber Adam Vinsh |
2021-07-20 14:11:35 |
Christian Rohmann |
bug |
|
|
added subscriber Christian Rohmann |
2021-09-02 19:59:12 |
OpenStack Infra |
tags |
in-stable-ussuri |
in-stable-train in-stable-ussuri |
|
2021-09-08 15:26:57 |
OpenStack Infra |
tags |
in-stable-train in-stable-ussuri |
in-stable-stein in-stable-train in-stable-ussuri |
|
2022-01-11 11:49:20 |
Hemanth Nakkina |
tags |
in-stable-stein in-stable-train in-stable-ussuri |
in-stable-stein in-stable-train in-stable-ussuri sts |
|
2022-01-12 08:58:44 |
Hemanth Nakkina |
bug task added |
|
oslo.messaging (Ubuntu) |
|
2022-01-12 08:59:02 |
Hemanth Nakkina |
nominated for series |
|
Ubuntu Bionic |
|
2022-01-12 08:59:02 |
Hemanth Nakkina |
bug task added |
|
oslo.messaging (Ubuntu Bionic) |
|
2022-01-12 08:59:31 |
Hemanth Nakkina |
bug task added |
|
cloud-archive |
|
2022-01-12 08:59:51 |
Hemanth Nakkina |
nominated for series |
|
cloud-archive/train |
|
2022-01-12 08:59:51 |
Hemanth Nakkina |
bug task added |
|
cloud-archive/train |
|
2022-01-12 08:59:51 |
Hemanth Nakkina |
nominated for series |
|
cloud-archive/stein |
|
2022-01-12 08:59:51 |
Hemanth Nakkina |
bug task added |
|
cloud-archive/stein |
|
2022-01-12 08:59:51 |
Hemanth Nakkina |
nominated for series |
|
cloud-archive/queens |
|
2022-01-12 08:59:51 |
Hemanth Nakkina |
bug task added |
|
cloud-archive/queens |
|
2022-01-12 08:59:51 |
Hemanth Nakkina |
nominated for series |
|
cloud-archive/rocky |
|
2022-01-12 08:59:51 |
Hemanth Nakkina |
bug task added |
|
cloud-archive/rocky |
|
2022-01-12 09:46:51 |
Hemanth Nakkina |
description |
We are using OpenStack Rocky with RabbitMQ 3.7.4 in production.
Occasionally I see many lines like the following in the log:
2020-06-11 02:03:06.753 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:03:21.754 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:03:36.755 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:03:51.756 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:04:06.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:04:21.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:04:36.758 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:04:51.759 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
The heartbeat interval is 60s and the rate is 2. Although the log complains about missed heartbeats, the RabbitMQ server seems to be running fine and messages are received and processed successfully. |
We are using OpenStack Rocky with RabbitMQ 3.7.4 in production.
Occasionally I see many lines like the following in the log:
2020-06-11 02:03:06.753 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:03:21.754 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:03:36.755 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:03:51.756 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:04:06.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:04:21.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:04:36.758 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
2020-06-11 02:04:51.759 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
The heartbeat interval is 60s and the rate is 2. Although the log complains about missed heartbeats, the RabbitMQ server seems to be running fine and messages are received and processed successfully.
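The warnings above arrive 15 seconds apart, which fits a heartbeat thread that wakes up several times per timeout window. A minimal sketch of that arithmetic, assuming the driver waits heartbeat_timeout_threshold / heartbeat_rate / 2 seconds between checks (the formula is an assumption about impl_rabbit and should be checked against the installed version; the function name is hypothetical):

```python
def heartbeat_check_interval(timeout_threshold, rate):
    """Seconds between heartbeat checks (assumed formula, see note above)."""
    return float(timeout_threshold) / float(rate) / 2.0


# Values reported in this bug: 60s threshold, rate 2.
print(heartbeat_check_interval(60, 2))
```

With the reported settings this gives 15.0, matching the spacing of the log timestamps.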
***************************************************
SRU Details
-----------
[Impact]
AMQP messages are sometimes dropped, resulting in resource creation errors (this happened twice in one week on one environment).
Catching the ConnectionForced AMQP exception and re-establishing the connection immediately remediates the issue.
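The difference between the two behaviours can be sketched as follows. This is an illustration only, not the actual patch: ConnectionForced here is a local stand-in for amqp.exceptions.ConnectionForced, and FakeConnection, tick_without_fix and tick_with_fix are hypothetical names.

```python
class ConnectionForced(Exception):
    """Local stand-in for amqp.exceptions.ConnectionForced."""


class FakeConnection:
    """Hypothetical connection that stays broken until it is re-established."""

    def __init__(self):
        self.broken = True
        self.reconnects = 0

    def heartbeat_check(self):
        if self.broken:
            raise ConnectionForced("Too many heartbeats missed")

    def ensure_connection(self):
        # A real driver would close and re-open the AMQP connection here.
        self.broken = False
        self.reconnects += 1


def tick_without_fix(conn, log):
    """Log a warning and retry later; the connection is left as it is."""
    try:
        conn.heartbeat_check()
    except Exception as exc:
        log.append("WARNING retrying...: %s" % exc)


def tick_with_fix(conn, log):
    """Treat ConnectionForced as recoverable and reconnect immediately."""
    try:
        conn.heartbeat_check()
    except ConnectionForced as exc:
        log.append("INFO trying to reconnect: %s" % exc)
        conn.ensure_connection()


old_conn, old_log = FakeConnection(), []
new_conn, new_log = FakeConnection(), []
for _ in range(4):
    tick_without_fix(old_conn, old_log)
    tick_with_fix(new_conn, new_log)

print(len(old_log), old_conn.reconnects)
print(len(new_log), new_conn.reconnects)
```

In this model the unfixed loop warns on every tick and never reconnects, while the fixed loop logs once and recovers on the first failure.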
[Test Case]
Reproducing the issue is tricky. The following steps might help reproduce it.
1. Deploy OpenStack
(If stsstack-bundles project is used, run command ./generate-bundle.sh -s bionic -r stein -n ddmi:stsstack --run)
2. Change heartbeat_timeout_threshold to 20s in nova.conf and restart nova-api
On nova-cloud-controller,
[oslo_messaging_rabbit]
heartbeat_timeout_threshold = 20
systemctl restart apache2.service
3. Create and delete instances continuously
./tools/instance_launch.sh 10 cirros # command on stsstack-bundles
openstack server list -c ID -f value | xargs openstack server delete
4. On rabbitmq server, drop packets from nova-api -> rabbitmq and allow them randomly
sudo iptables -A INPUT -p tcp --dport 5672 -s 10.5.1.55 -j DROP
sudo iptables -D INPUT 1
5. Perform steps 3,4 until you see the following message in nova-api log
WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: amqp.exceptions.ConnectionForced: Too many heartbeats missed
6. Install the fixed python-oslo.messaging package on nova-cloud-controller
and restart the apache2 service.
7. Perform steps 3,4 and verify nova-api log for the following INFO message.
INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Too many heartbeats missed
Because the above test case reproduces the issue only at random, as an additional measure, continuous integration tests for nova-cloud-controller will be run against the packages that are in -proposed.
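Steps 5 and 7 come down to checking which of the two messages appears in the nova-api log. A small helper for that check might look like the sketch below; the function name classify and the sample list are hypothetical, and the match strings are copied from the messages quoted in the steps above (including the "heartbeart" spelling, which is how the warning is actually logged).

```python
WARNING_MARK = "Unexpected error during heartbeart thread processing"
INFO_MARK = "A recoverable connection/channel error occurred, trying to reconnect"


def classify(lines):
    """Count missed-heartbeat messages from unfixed and fixed packages."""
    counts = {"unfixed": 0, "fixed": 0}
    for line in lines:
        if "Too many heartbeats missed" not in line:
            continue
        if WARNING_MARK in line:
            counts["unfixed"] += 1
        elif INFO_MARK in line:
            counts["fixed"] += 1
    return counts


sample = [
    "WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during "
    "heartbeart thread processing, retrying...: "
    "amqp.exceptions.ConnectionForced: Too many heartbeats missed",
    "INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable "
    "connection/channel error occurred, trying to reconnect: "
    "Too many heartbeats missed",
]
print(classify(sample))
```

To use it on a real log, pass the lines of the nova-api log file instead of the sample list.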
[Regression Potential]
I do not foresee any regression potential, as the patch only catches one additional exception and reconnects to the AMQP server immediately. |
|
2022-01-12 10:36:52 |
Hemanth Nakkina |
bug task deleted |
oslo.messaging (Ubuntu) |
|
|
2022-01-12 10:36:55 |
Hemanth Nakkina |
bug task deleted |
oslo.messaging (Ubuntu Bionic) |
|
|
2022-01-12 10:37:04 |
Hemanth Nakkina |
bug task deleted |
cloud-archive/queens |
|
|
2022-01-12 10:37:09 |
Hemanth Nakkina |
bug task deleted |
cloud-archive/rocky |
|
|
2022-01-12 10:37:46 |
Hemanth Nakkina |
attachment added |
|
Debdiff for UCA train https://bugs.launchpad.net/cloud-archive/+bug/1883038/+attachment/5553547/+files/lp1883038_train.debdiff |
|
2022-01-12 10:38:10 |
Hemanth Nakkina |
attachment added |
|
Debdiff for UCA stein https://bugs.launchpad.net/cloud-archive/+bug/1883038/+attachment/5553548/+files/lp1883038_stein.debdiff |
|
2022-01-18 15:50:46 |
Corey Bryant |
cloud-archive/train: status |
New |
Fix Committed |
|
2022-01-18 15:50:47 |
Corey Bryant |
tags |
in-stable-stein in-stable-train in-stable-ussuri sts |
in-stable-stein in-stable-train in-stable-ussuri sts verification-train-needed |
|
2022-01-18 15:50:48 |
Corey Bryant |
cloud-archive/stein: status |
New |
Fix Committed |
|
2022-01-18 15:51:51 |
Corey Bryant |
cloud-archive: status |
New |
Invalid |
|
2022-01-20 05:34:06 |
Hemanth Nakkina |
tags |
in-stable-stein in-stable-train in-stable-ussuri sts verification-train-needed |
in-stable-stein in-stable-train in-stable-ussuri sts verification-stein-done verification-train-done |
|
2022-01-27 09:27:47 |
Hemanth Nakkina |
tags |
in-stable-stein in-stable-train in-stable-ussuri sts verification-stein-done verification-train-done |
in-stable-stein in-stable-train in-stable-ussuri sts verification-done verification-stein-done verification-train-done |
|
2022-01-27 13:04:14 |
Corey Bryant |
cloud-archive/train: status |
Fix Committed |
Fix Released |
|
2022-01-27 13:04:23 |
Corey Bryant |
cloud-archive/stein: status |
Fix Committed |
Fix Released |
|