Activity log for bug #1883038

Date Who What changed Old value New value Message
2020-06-11 02:16:31 norman shen bug added bug
2020-06-30 01:01:30 OpenStack Infra oslo.messaging: status New In Progress
2020-06-30 01:01:30 OpenStack Infra oslo.messaging: assignee norman shen (jshen28)
2020-07-28 16:25:37 OpenStack Infra oslo.messaging: status In Progress Fix Released
2020-07-31 20:29:16 OpenStack Infra tags in-stable-ussuri
2020-12-02 17:27:12 Adam Vinsh bug added subscriber Adam Vinsh
2021-07-20 14:11:35 Christian Rohmann bug added subscriber Christian Rohmann
2021-09-02 19:59:12 OpenStack Infra tags in-stable-ussuri in-stable-train in-stable-ussuri
2021-09-08 15:26:57 OpenStack Infra tags in-stable-train in-stable-ussuri in-stable-stein in-stable-train in-stable-ussuri
2022-01-11 11:49:20 Hemanth Nakkina tags in-stable-stein in-stable-train in-stable-ussuri in-stable-stein in-stable-train in-stable-ussuri sts
2022-01-12 08:58:44 Hemanth Nakkina bug task added oslo.messaging (Ubuntu)
2022-01-12 08:59:02 Hemanth Nakkina nominated for series Ubuntu Bionic
2022-01-12 08:59:02 Hemanth Nakkina bug task added oslo.messaging (Ubuntu Bionic)
2022-01-12 08:59:31 Hemanth Nakkina bug task added cloud-archive
2022-01-12 08:59:51 Hemanth Nakkina nominated for series cloud-archive/train
2022-01-12 08:59:51 Hemanth Nakkina bug task added cloud-archive/train
2022-01-12 08:59:51 Hemanth Nakkina nominated for series cloud-archive/stein
2022-01-12 08:59:51 Hemanth Nakkina bug task added cloud-archive/stein
2022-01-12 08:59:51 Hemanth Nakkina nominated for series cloud-archive/queens
2022-01-12 08:59:51 Hemanth Nakkina bug task added cloud-archive/queens
2022-01-12 08:59:51 Hemanth Nakkina nominated for series cloud-archive/rocky
2022-01-12 08:59:51 Hemanth Nakkina bug task added cloud-archive/rocky
2022-01-12 09:46:51 Hemanth Nakkina description

Old value:

We are using OpenStack Rocky and RabbitMQ 3.7.4 in production. Occasionally I see many lines like the following in the log:

    2020-06-11 02:03:06.753 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:03:21.754 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:03:36.755 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:03:51.756 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:04:06.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:04:21.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:04:36.758 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:04:51.759 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed

The heartbeat interval is 60s and the rate is 2. Although the log complains about missed heartbeats, the RabbitMQ server seems to be running fine, and messages are received and processed successfully.

New value:

We are using OpenStack Rocky and RabbitMQ 3.7.4 in production. Occasionally I see many lines like the following in the log:

    2020-06-11 02:03:06.753 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:03:21.754 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:03:36.755 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:03:51.756 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:04:06.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:04:21.757 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:04:36.758 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed
    2020-06-11 02:04:51.759 3877409 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed

The heartbeat interval is 60s and the rate is 2. Although the log complains about missed heartbeats, the RabbitMQ server seems to be running fine, and messages are received and processed successfully.

***************************************************

SRU Details
-----------

[Impact]

AMQP messages are sometimes dropped, resulting in resource creation errors (this happened twice in a week on one environment). Catching the ConnectionForced AMQP exception and reestablishing the connection immediately remediates the issue.

[Test Case]

Reproducing the issue is tricky. The following steps might help reproduce it.

1. Deploy OpenStack. (If the stsstack-bundles project is used, run the command
   ./generate-bundle.sh -s bionic -r stein -n ddmi:stsstack --run)

2. Change heartbeat_timeout_threshold to 20s in nova.conf and restart nova-api. On nova-cloud-controller:

    [oslo_messaging_rabbit]
    heartbeat_timeout_threshold = 20

    systemctl restart apache2.service

3. Create and delete instances continuously:

    ./tools/instance_launch.sh 10 cirros   # command on stsstack-bundles
    openstack server list -c ID -f value | xargs openstack server delete

4. On the rabbitmq server, drop packets from nova-api -> rabbitmq and allow them randomly:

    sudo iptables -A INPUT -p tcp --dport 5672 -s 10.5.1.55 -j DROP
    sudo iptables -D INPUT 1

5. Repeat steps 3 and 4 until you see the following message in the nova-api log:

    WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: amqp.exceptions.ConnectionForced: Too many heartbeats missed

6. Install the fixed python-oslo.messaging package on nova-cloud-controller and restart the apache service.

7. Repeat steps 3 and 4 and check the nova-api log for the following INFO message:

    INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: Too many heartbeats missed

Because the test case above does not reproduce deterministically, continuous integration tests for nova-cloud-controller will additionally be run against the packages that are in -proposed.

[Regression Potential]

I do not foresee any regression potential, as the patch only catches one additional exception and reconnects to the AMQP server immediately.
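
The remediation described under [Impact] can be illustrated with a minimal sketch. It is not the oslo.messaging patch itself: ConnectionForced here is a local stand-in for amqp.exceptions.ConnectionForced, and FakeConnection, RecoverableConnectionError and heartbeat_tick are hypothetical names introduced only for illustration. The sketch assumes the heartbeat loop already reconnects on a set of "recoverable" errors and that the fix amounts to adding ConnectionForced to that set.

```python
import logging

LOG = logging.getLogger("heartbeat_sketch")


class RecoverableConnectionError(Exception):
    """Hypothetical stand-in for errors already treated as recoverable."""


class ConnectionForced(Exception):
    """Local stand-in for amqp.exceptions.ConnectionForced (assumption)."""


class FakeConnection:
    """Hypothetical connection that the broker has already force-closed."""

    def __init__(self):
        self.forced_closed = True
        self.reconnects = 0

    def heartbeat_check(self):
        # Keeps failing until the connection is re-established.
        if self.forced_closed:
            raise ConnectionForced("Too many heartbeats missed")

    def ensure_connection(self):
        self.forced_closed = False
        self.reconnects += 1


def heartbeat_tick(conn,
                   recoverable=(RecoverableConnectionError, ConnectionForced)):
    """Run one heartbeat check and report what happened."""
    try:
        conn.heartbeat_check()
        return "ok"
    except recoverable as exc:
        # Behaviour with the fix: reconnect immediately.
        LOG.info("A recoverable connection/channel error occurred, "
                 "trying to reconnect: %s", exc)
        conn.ensure_connection()
        return "reconnected"
    except Exception as exc:
        # Behaviour without the fix: log and retry on the next tick.
        # (Message spelling follows the log lines quoted in the report.)
        LOG.warning("Unexpected error during heartbeart thread processing, "
                    "retrying...: %s", exc)
        return "retry"
```

Calling heartbeat_tick(conn, recoverable=(RecoverableConnectionError,)) models the pre-fix behaviour: every tick returns "retry" and the connection is never re-established, which matches the repeated WARNING lines in the report. With the default tuple, the first tick returns "reconnected" and later ticks return "ok".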
2022-01-12 10:36:52 Hemanth Nakkina bug task deleted oslo.messaging (Ubuntu)
2022-01-12 10:36:55 Hemanth Nakkina bug task deleted oslo.messaging (Ubuntu Bionic)
2022-01-12 10:37:04 Hemanth Nakkina bug task deleted cloud-archive/queens
2022-01-12 10:37:09 Hemanth Nakkina bug task deleted cloud-archive/rocky
2022-01-12 10:37:46 Hemanth Nakkina attachment added Debdiff for UCA train https://bugs.launchpad.net/cloud-archive/+bug/1883038/+attachment/5553547/+files/lp1883038_train.debdiff
2022-01-12 10:38:10 Hemanth Nakkina attachment added Debdiff for UCA stein https://bugs.launchpad.net/cloud-archive/+bug/1883038/+attachment/5553548/+files/lp1883038_stein.debdiff
2022-01-18 15:50:46 Corey Bryant cloud-archive/train: status New Fix Committed
2022-01-18 15:50:47 Corey Bryant tags in-stable-stein in-stable-train in-stable-ussuri sts in-stable-stein in-stable-train in-stable-ussuri sts verification-train-needed
2022-01-18 15:50:48 Corey Bryant cloud-archive/stein: status New Fix Committed
2022-01-18 15:51:51 Corey Bryant cloud-archive: status New Invalid
2022-01-20 05:34:06 Hemanth Nakkina tags in-stable-stein in-stable-train in-stable-ussuri sts verification-train-needed in-stable-stein in-stable-train in-stable-ussuri sts verification-stein-done verification-train-done
2022-01-27 09:27:47 Hemanth Nakkina tags in-stable-stein in-stable-train in-stable-ussuri sts verification-stein-done verification-train-done in-stable-stein in-stable-train in-stable-ussuri sts verification-done verification-stein-done verification-train-done
2022-01-27 13:04:14 Corey Bryant cloud-archive/train: status Fix Committed Fix Released
2022-01-27 13:04:23 Corey Bryant cloud-archive/stein: status Fix Committed Fix Released