Activity log for bug #1789177

Date Who What changed Old value New value Message
2018-08-27 08:16:34 Oleg Bondarev bug added bug
2018-08-27 08:16:39 Oleg Bondarev oslo.messaging: assignee Oleg Bondarev (obondarev)
2018-08-27 08:22:08 OpenStack Infra oslo.messaging: status New In Progress
2018-11-01 20:03:11 OpenStack Infra oslo.messaging: status In Progress Fix Released
2020-03-25 05:05:13 OpenStack Infra tags in-stable-rocky
2020-05-18 06:56:43 norman shen attachment added exchanges.png https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5373233/+files/exchanges.png
2020-07-02 02:32:42 norman shen attachment added pics.zip https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5388877/+files/pics.zip
2020-08-25 11:58:33 OpenStack Infra tags in-stable-rocky in-stable-rocky in-stable-ussuri
2020-12-16 07:59:17 Seyeong Kim bug task added python-oslo.messaging (Ubuntu)
2020-12-16 07:59:38 Seyeong Kim python-oslo.messaging (Ubuntu): assignee Seyeong Kim (seyeongkim)
2020-12-17 03:30:25 Seyeong Kim tags in-stable-rocky in-stable-ussuri in-stable-rocky in-stable-ussuri sts
2020-12-17 03:33:02 Seyeong Kim description
  Old value:
    Input:
    - OpenStack Pike cluster with ~500 nodes
    - DVR enabled in neutron
    - Lots of messages
    Scenario: failover of one rabbit node in a cluster
    Issue: after the failed rabbit node comes back online, some RPC communications appear broken.
    Logs from rabbit:
      =ERROR REPORT==== 10-Aug-2018::17:24:37 ===
      Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
      operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
    Investigation: after the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the cluster had ~1600 exchanges; on that node the count stays low and does not increase).
    Workaround: let the recovered node synchronize all exchanges first - forbid new connections with iptables rules for some time (30 sec) after the failed node comes online.
    Proposal: do not create new exchanges (use the default) for direct messages - this also fixes the issue. Is there a good reason for creating new exchanges for direct messages?
  New value:
    [Impact]
    Affected: Bionic
    Not affected: Focal
    [Test Case]
    TBD
    [Where problems could occur]
    TBD
    [Others]
    // original description (repeated verbatim, as above)
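The 30-second iptables workaround recorded in the entry above can be sketched as a small script. The port (5672), the 30 s grace period, and the REJECT rule are taken from the description; the function name and the dry-run wrapper are illustrative assumptions, so the commands can be inspected without root on a live node:

```shell
#!/bin/bash
# Sketch of the workaround from the description: reject new AMQP connections
# while the recovered RabbitMQ node resynchronizes its exchanges.
# By default the commands are echoed (dry run); pass an empty first argument
# to execute them for real (requires root).
block_amqp_during_recovery() {
  local run="${1:-echo}" port=5672 grace=30
  $run iptables -I INPUT -p tcp --dport "$port" --syn -j REJECT
  $run sleep "$grace"
  $run iptables -D INPUT -p tcp --dport "$port" --syn -j REJECT
}
block_amqp_during_recovery
```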
2020-12-17 03:45:10 Seyeong Kim attachment added lp1789177_bionic.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444392/+files/lp1789177_bionic.debdiff
2020-12-17 04:25:53 Ubuntu Foundations Team Bug Bot tags in-stable-rocky in-stable-ussuri sts in-stable-rocky in-stable-ussuri patch sts
2020-12-17 04:25:58 Ubuntu Foundations Team Bug Bot bug added subscriber Ubuntu Sponsors Team
2020-12-17 05:43:07 Seyeong Kim attachment added lp1789177_queens.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444422/+files/lp1789177_queens.debdiff
2020-12-17 07:45:06 Nobuto Murata bug added subscriber Nobuto Murata
2020-12-18 00:59:18 Mathew Hodson bug task added cloud-archive
2020-12-18 01:00:09 Mathew Hodson nominated for series Ubuntu Bionic
2020-12-18 01:00:09 Mathew Hodson bug task added python-oslo.messaging (Ubuntu Bionic)
2020-12-18 02:10:18 Seyeong Kim description
  Old value: the 2020-12-17 description ([Test Case] and [Where problems could occur] still TBD), followed by the original description.
  New value:
    [Impact]
    If there are many exchanges and queues, rabbitmq-server reports "exchange not found" errors after a failover.
    Affected: Bionic (Queens)
    Not affected: Focal
    [Test Case]
    1. Deploy a simple rabbitmq cluster - https://pastebin.ubuntu.com/p/MR76VbMwY5/
    2. juju ssh neutron-gateway/0
       - for i in {1..1000}; do systemctl restart neutron-metering-agent; sleep 2; done
    3. It is better to add more exchanges, queues and bindings:
       - rabbitmq-plugins enable rabbitmq_management
       - rabbitmqctl add_user test password
       - rabbitmqctl set_user_tags test administrator
       - rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
       - https://pastebin.ubuntu.com/p/brw7rSXD7q/ (save this as create.sh)
       - for i in {1..2000}; do ./create.sh test_$i; done
    4. Restart the rabbitmq-server service, or shut the machine down and power it back on, several times.
    5. Observe the "exchange not found" error.
    [Where problems could occur]
    1. Every service that uses oslo.messaging needs to be restarted.
    2. Message transfer could be an issue.
    [Others]
    // original description (repeated verbatim, as above)
2020-12-18 02:56:28 Seyeong Kim attachment removed lp1789177_bionic.debdiff https://bugs.launchpad.net/cloud-archive/+bug/1789177/+attachment/5444392/+files/lp1789177_bionic.debdiff
2020-12-18 02:56:36 Seyeong Kim attachment removed lp1789177_queens.debdiff https://bugs.launchpad.net/cloud-archive/+bug/1789177/+attachment/5444422/+files/lp1789177_queens.debdiff
2020-12-18 03:22:29 Seyeong Kim attachment added lp1789177_bionic.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444720/+files/lp1789177_bionic.debdiff
2020-12-18 03:22:39 Seyeong Kim attachment added lp1789177_queens.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444721/+files/lp1789177_queens.debdiff
2020-12-18 03:23:28 Mathew Hodson python-oslo.messaging (Ubuntu): status New Fix Released
2020-12-18 03:31:26 Mathew Hodson python-oslo.messaging (Ubuntu): importance Undecided Medium
2020-12-18 03:31:29 Mathew Hodson python-oslo.messaging (Ubuntu Bionic): importance Undecided Medium
2020-12-18 04:02:20 Seyeong Kim attachment added lp1789177_xenial.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444730/+files/lp1789177_xenial.debdiff
2020-12-18 04:14:39 Seyeong Kim attachment added lp1789177_mitaka.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444740/+files/lp1789177_mitaka.debdiff
2020-12-18 05:07:06 Mathew Hodson nominated for series Ubuntu Xenial
2020-12-18 05:07:06 Mathew Hodson bug task added python-oslo.messaging (Ubuntu Xenial)
2020-12-18 05:07:14 Mathew Hodson python-oslo.messaging (Ubuntu Xenial): importance Undecided Medium
2020-12-18 05:49:25 Seyeong Kim nominated for series cloud-archive/mitaka
2020-12-18 05:49:25 Seyeong Kim bug task added cloud-archive/mitaka
2020-12-18 05:49:25 Seyeong Kim nominated for series cloud-archive/queens
2020-12-18 05:49:25 Seyeong Kim bug task added cloud-archive/queens
2020-12-18 06:35:22 Seyeong Kim attachment added lp1789177_stein.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444807/+files/lp1789177_stein.debdiff
2020-12-18 06:35:36 Seyeong Kim attachment added lp1789177_train.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444808/+files/lp1789177_train.debdiff
2020-12-21 04:01:40 Seyeong Kim bug added subscriber Ubuntu Stable Release Updates Team
2020-12-21 04:08:23 Seyeong Kim python-oslo.messaging (Ubuntu Xenial): status New In Progress
2020-12-21 04:08:29 Seyeong Kim python-oslo.messaging (Ubuntu Xenial): assignee Seyeong Kim (seyeongkim)
2020-12-21 04:08:36 Seyeong Kim python-oslo.messaging (Ubuntu Bionic): status New In Progress
2020-12-21 04:08:37 Seyeong Kim cloud-archive/queens: status New In Progress
2020-12-21 04:08:42 Seyeong Kim python-oslo.messaging (Ubuntu Bionic): assignee Seyeong Kim (seyeongkim)
2020-12-21 04:08:46 Seyeong Kim cloud-archive/queens: assignee Seyeong Kim (seyeongkim)
2020-12-21 04:08:48 Seyeong Kim python-oslo.messaging (Ubuntu): assignee Seyeong Kim (seyeongkim)
2020-12-21 04:08:51 Seyeong Kim cloud-archive/mitaka: assignee Seyeong Kim (seyeongkim)
2020-12-23 23:16:59 Dominique Poulain bug added subscriber Dominique Poulain
2021-01-04 15:34:35 Chris MacNaughton nominated for series cloud-archive/stein
2021-01-04 15:34:35 Chris MacNaughton bug task added cloud-archive/stein
2021-01-04 15:34:35 Chris MacNaughton nominated for series cloud-archive/train
2021-01-04 15:34:35 Chris MacNaughton bug task added cloud-archive/train
2021-01-06 07:41:07 Chris MacNaughton cloud-archive/train: status New Fix Committed
2021-01-06 07:41:09 Chris MacNaughton tags in-stable-rocky in-stable-ussuri patch sts in-stable-rocky in-stable-ussuri patch sts verification-train-needed
2021-01-06 07:51:48 Chris MacNaughton cloud-archive/stein: status New Fix Committed
2021-01-06 07:51:51 Chris MacNaughton tags in-stable-rocky in-stable-ussuri patch sts verification-train-needed in-stable-rocky in-stable-ussuri patch sts verification-stein-needed verification-train-needed
2021-01-07 02:53:03 Seyeong Kim tags in-stable-rocky in-stable-ussuri patch sts verification-stein-needed verification-train-needed in-stable-rocky in-stable-ussuri patch sts verification-stein-done verification-train-needed
2021-01-07 06:56:57 Seyeong Kim tags in-stable-rocky in-stable-ussuri patch sts verification-stein-done verification-train-needed in-stable-rocky in-stable-ussuri patch sts verification-stein-done verification-train-done
2021-01-13 15:00:55 Robie Basak python-oslo.messaging (Ubuntu Bionic): status In Progress Fix Committed
2021-01-13 15:00:57 Robie Basak bug added subscriber SRU Verification
2021-01-13 15:00:59 Robie Basak tags in-stable-rocky in-stable-ussuri patch sts verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-stein-done verification-train-done
2021-01-13 16:04:08 Chris MacNaughton cloud-archive/queens: status In Progress Fix Committed
2021-01-13 16:04:10 Chris MacNaughton tags in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-needed verification-stein-done verification-train-done
2021-01-14 04:45:46 Seyeong Kim tags in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-needed verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-needed verification-stein-done verification-train-done
2021-01-14 07:53:12 Seyeong Kim tags in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-needed verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-done verification-stein-done verification-train-done
2021-01-14 13:58:30 Chris MacNaughton cloud-archive/train: status Fix Committed Fix Released
2021-01-14 13:59:44 Chris MacNaughton nominated for series cloud-archive/rocky
2021-01-14 13:59:44 Chris MacNaughton bug task added cloud-archive/rocky
2021-01-14 14:00:16 Chris MacNaughton cloud-archive/stein: status Fix Committed Fix Released
2021-01-14 14:00:23 Chris MacNaughton cloud-archive/rocky: assignee Chris MacNaughton (chris.macnaughton)
2021-01-16 23:23:57 Mathew Hodson tags in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-done verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-done verification-stein-done verification-train-done
2021-01-28 15:12:02 Launchpad Janitor python-oslo.messaging (Ubuntu Bionic): status Fix Committed Fix Released
2021-02-02 13:02:23 Edward Hope-Morley tags in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-done verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-stein-done verification-train-done
2021-02-02 13:35:14 Corey Bryant description
  Old value: the 2020-12-18 description (as above).
  New value: the same description, with the create.sh pastebin content inlined as footnote [1] ("pasting here because pastebins don't last forever"):
    #!/bin/bash
    rabbitmqadmin declare exchange -V openstack name=$1 type=direct -u test -p password
    rabbitmqadmin declare queue -V openstack name=$1 durable=false -u test -p password 'arguments={"x-expires":1800000}'
    rabbitmqadmin -V openstack declare binding source=$1 destination_type="queue" destination=$1 routing_key="" -u test -p password
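The create.sh inlined in the entry above can be sketched as a reusable function. The rabbitmqadmin invocations mirror the script from the description; the create_entities helper, the echo-based dry-run mode, and the three-iteration loop are illustrative assumptions so the generated commands can be inspected without a live RabbitMQ broker:

```shell
#!/bin/bash
# Dry-run sketch of the create.sh load-generation loop from the test case.
# Each call declares one direct exchange, one expiring queue, and a binding
# between them in the 'openstack' vhost. By default the commands are echoed;
# pass an empty second argument to execute them against a real broker.
create_entities() {
  local name="$1" run="${2:-echo}"
  $run rabbitmqadmin declare exchange -V openstack name="$name" type=direct -u test -p password
  $run rabbitmqadmin declare queue -V openstack name="$name" durable=false -u test -p password 'arguments={"x-expires":1800000}'
  $run rabbitmqadmin -V openstack declare binding source="$name" destination_type=queue destination="$name" routing_key="" -u test -p password
}
# Generate a few entities, as the test case does with {1..2000}.
for i in 1 2 3; do create_entities "test_$i"; done
```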
2021-02-04 13:49:04 Corey Bryant cloud-archive/queens: status Fix Committed New
2021-02-04 13:49:15 Corey Bryant cloud-archive/stein: status Fix Released New
2021-02-04 13:49:26 Corey Bryant python-oslo.messaging (Ubuntu Bionic): status Fix Released New
2021-02-04 13:49:46 Corey Bryant cloud-archive: status New Invalid
2021-02-23 15:42:32 Corey Bryant cloud-archive/stein: importance Undecided High
2021-02-23 15:42:32 Corey Bryant cloud-archive/stein: status New Triaged
2021-02-23 16:52:37 Corey Bryant cloud-archive/rocky: importance Undecided Medium
2021-02-23 16:52:37 Corey Bryant cloud-archive/rocky: status New Triaged
2021-02-23 16:52:51 Corey Bryant cloud-archive/stein: importance High Medium
2021-02-23 16:53:06 Corey Bryant cloud-archive/queens: importance Undecided Medium
2021-02-23 16:53:06 Corey Bryant cloud-archive/queens: status New Triaged
2021-02-23 16:53:18 Corey Bryant cloud-archive/mitaka: importance Undecided Medium
2021-02-23 16:53:18 Corey Bryant cloud-archive/mitaka: status New Triaged
2021-02-23 16:53:32 Corey Bryant python-oslo.messaging (Ubuntu Bionic): status New Triaged
2021-02-23 17:02:15 Corey Bryant cloud-archive/stein: status Triaged Fix Released
2021-02-23 17:02:25 Corey Bryant cloud-archive/rocky: status Triaged Fix Committed
2021-02-23 17:02:27 Corey Bryant tags in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-rocky-needed verification-stein-done verification-train-done
2021-02-25 04:55:44 Seyeong Kim attachment removed lp1789177_bionic.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444720/+files/lp1789177_bionic.debdiff
2021-02-25 05:01:29 Seyeong Kim attachment removed lp1789177_queens.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444721/+files/lp1789177_queens.debdiff
2021-02-25 05:02:07 Seyeong Kim attachment removed lp1789177_xenial.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444730/+files/lp1789177_xenial.debdiff
2021-02-25 05:02:56 Seyeong Kim attachment removed lp1789177_mitaka.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444740/+files/lp1789177_mitaka.debdiff
2021-02-25 05:04:13 Seyeong Kim attachment removed lp1789177_stein.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444807/+files/lp1789177_stein.debdiff
2021-02-25 05:04:34 Seyeong Kim attachment removed lp1789177_train.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444808/+files/lp1789177_train.debdiff
2021-02-25 05:10:28 Seyeong Kim attachment added lp1789177_queens.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5466822/+files/lp1789177_queens.debdiff
2021-02-25 05:10:44 Seyeong Kim attachment added lp1789177_bionic.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5466823/+files/lp1789177_bionic.debdiff
2021-03-10 02:05:58 Seyeong Kim attachment removed lp1789177_bionic.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5466823/+files/lp1789177_bionic.debdiff
2021-03-10 02:06:09 Seyeong Kim attachment removed lp1789177_queens.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5466822/+files/lp1789177_queens.debdiff
2021-03-18 19:38:14 Liam Young tags in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-rocky-needed verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-rocky-done verification-stein-done verification-train-done
2021-03-23 06:56:06 Seyeong Kim description
  Old value: the 2021-02-02 description (as above).
  New value: the same description, with a "Possible Workaround" section added under [Others]:
    1. For the "exchange not found" issue:
       - create the exchange, queue and binding for the problematic name shown in the log
       - then restart rabbitmq-server one node at a time
    2. For a queue that crashed and failed to restart:
       - delete the specific queue named in the log
2021-04-06 13:47:30 Corey Bryant summary RabbitMQ fails to synchronize exchanges under high load RabbitMQ fails to synchronize exchanges under high load (Note for ubuntu: stein, rocky, queens(bionic) changes only fix compatibility with fully patched releases)
2021-04-06 13:50:21 Corey Bryant cloud-archive/rocky: status Fix Committed Fix Released
2021-04-21 19:04:29 OpenStack Infra cloud-archive/queens: status Triaged In Progress
2021-06-07 14:28:13 Łukasz Zemczak python-oslo.messaging (Ubuntu Bionic): status Triaged Fix Committed
2021-06-07 14:28:18 Łukasz Zemczak tags in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-rocky-done verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-failed verification-rocky-done verification-stein-done verification-train-done
2021-06-08 12:17:15 Corey Bryant cloud-archive/queens: status In Progress Fix Committed
2021-06-08 12:17:17 Corey Bryant tags in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-failed verification-rocky-done verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-needed verification-rocky-done verification-stein-done verification-train-done
2021-06-30 00:04:57 Seyeong Kim bug added subscriber Seyeong Kim
2021-06-30 00:38:25 Seyeong Kim tags in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-needed verification-rocky-done verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done
2021-06-30 05:29:58 Seyeong Kim tags in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-done verification-rocky-done verification-stein-done verification-train-done
2021-07-01 10:19:23 Launchpad Janitor python-oslo.messaging (Ubuntu Bionic): status Fix Committed Fix Released
2021-07-05 13:44:42 James Page tags in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-done verification-rocky-done verification-stein-done verification-train-done in-stable-rocky in-stable-ussuri patch sts verification-done verification-done-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done
2021-07-05 13:46:23 James Page cloud-archive/queens: status Fix Committed Fix Released
2021-07-05 13:47:16 James Page python-oslo.messaging (Ubuntu Xenial): status In Progress Invalid
2021-08-17 06:13:00 Brett Milford bug added subscriber Brett Milford
2022-07-08 13:49:13 OpenStack Infra tags in-stable-rocky in-stable-ussuri patch sts verification-done verification-done-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done in-stable-rocky in-stable-stein in-stable-ussuri patch sts verification-done verification-done-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done