2018-08-27 08:16:34 |
Oleg Bondarev |
bug |
|
|
added bug |
2018-08-27 08:16:39 |
Oleg Bondarev |
oslo.messaging: assignee |
|
Oleg Bondarev (obondarev) |
|
2018-08-27 08:22:08 |
OpenStack Infra |
oslo.messaging: status |
New |
In Progress |
|
2018-11-01 20:03:11 |
OpenStack Infra |
oslo.messaging: status |
In Progress |
Fix Released |
|
2020-03-25 05:05:13 |
OpenStack Infra |
tags |
|
in-stable-rocky |
|
2020-05-18 06:56:43 |
norman shen |
attachment added |
|
exchanges.png https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5373233/+files/exchanges.png |
|
2020-07-02 02:32:42 |
norman shen |
attachment added |
|
pics.zip https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5388877/+files/pics.zip |
|
2020-08-25 11:58:33 |
OpenStack Infra |
tags |
in-stable-rocky |
in-stable-rocky in-stable-ussuri |
|
2020-12-16 07:59:17 |
Seyeong Kim |
bug task added |
|
python-oslo.messaging (Ubuntu) |
|
2020-12-16 07:59:38 |
Seyeong Kim |
python-oslo.messaging (Ubuntu): assignee |
|
Seyeong Kim (seyeongkim) |
|
2020-12-17 03:30:25 |
Seyeong Kim |
tags |
in-stable-rocky in-stable-ussuri |
in-stable-rocky in-stable-ussuri sts |
|
2020-12-17 03:33:02 |
Seyeong Kim |
description |
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the number of exchanges in that cluster was ~1600; on the recovered node the count stays low and does not increase).
Workaround: let the recovered node synchronize all exchanges first - use iptables rules to forbid new connections for some time (30 sec) after the failed node comes back online.
Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages? |
[Impact]
Affected
Bionic
Not affected
Focal
[Test Case]
TBD
[Where problems could occur]
TBD
[Others]
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the number of exchanges in that cluster was ~1600; on the recovered node the count stays low and does not increase).
Workaround: let the recovered node synchronize all exchanges first - use iptables rules to forbid new connections for some time (30 sec) after the failed node comes back online.
Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages? |
|
2020-12-17 03:45:10 |
Seyeong Kim |
attachment added |
|
lp1789177_bionic.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444392/+files/lp1789177_bionic.debdiff |
|
2020-12-17 04:25:53 |
Ubuntu Foundations Team Bug Bot |
tags |
in-stable-rocky in-stable-ussuri sts |
in-stable-rocky in-stable-ussuri patch sts |
|
2020-12-17 04:25:58 |
Ubuntu Foundations Team Bug Bot |
bug |
|
|
added subscriber Ubuntu Sponsors Team |
2020-12-17 05:43:07 |
Seyeong Kim |
attachment added |
|
lp1789177_queens.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444422/+files/lp1789177_queens.debdiff |
|
2020-12-17 07:45:06 |
Nobuto Murata |
bug |
|
|
added subscriber Nobuto Murata |
2020-12-18 00:59:18 |
Mathew Hodson |
bug task added |
|
cloud-archive |
|
2020-12-18 01:00:09 |
Mathew Hodson |
nominated for series |
|
Ubuntu Bionic |
|
2020-12-18 01:00:09 |
Mathew Hodson |
bug task added |
|
python-oslo.messaging (Ubuntu Bionic) |
|
2020-12-18 02:10:18 |
Seyeong Kim |
description |
[Impact]
Affected
Bionic
Not affected
Focal
[Test Case]
TBD
[Where problems could occur]
TBD
[Others]
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the number of exchanges in that cluster was ~1600; on the recovered node the count stays low and does not increase).
Workaround: let the recovered node synchronize all exchanges first - use iptables rules to forbid new connections for some time (30 sec) after the failed node comes back online.
Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages? |
[Impact]
If there are many exchanges and queues, then after a failover rabbitmq-server logs errors saying that exchanges cannot be found.
Affected
Bionic (Queens)
Not affected
Focal
[Test Case]
1. deploy simple rabbitmq cluster
- https://pastebin.ubuntu.com/p/MR76VbMwY5/
2. juju ssh neutron-gateway/0
- for i in {1..1000}; do systemctl restart neutron-metering-agent; sleep 2; done
3. reproduction is more reliable if you add more exchanges, queues, and bindings
- rabbitmq-plugins enable rabbitmq_management
- rabbitmqctl add_user test password
- rabbitmqctl set_user_tags test administrator
- rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
- https://pastebin.ubuntu.com/p/brw7rSXD7q/ (save this as create.sh)
- for i in {1..2000}; do ./create.sh test_$i; done
4. restart the rabbitmq-server service, or shut the machine down and turn it back on, several times.
5. you can see the 'exchange not found' error
[Where problems could occur]
1. every service which uses oslo.messaging needs to be restarted.
2. Message transfer could be affected
[Others]
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the number of exchanges in that cluster was ~1600; on the recovered node the count stays low and does not increase).
Workaround: let the recovered node synchronize all exchanges first - use iptables rules to forbid new connections for some time (30 sec) after the failed node comes back online.
Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages? |
|
2020-12-18 02:56:28 |
Seyeong Kim |
attachment removed |
lp1789177_bionic.debdiff https://bugs.launchpad.net/cloud-archive/+bug/1789177/+attachment/5444392/+files/lp1789177_bionic.debdiff |
|
|
2020-12-18 02:56:36 |
Seyeong Kim |
attachment removed |
lp1789177_queens.debdiff https://bugs.launchpad.net/cloud-archive/+bug/1789177/+attachment/5444422/+files/lp1789177_queens.debdiff |
|
|
2020-12-18 03:22:29 |
Seyeong Kim |
attachment added |
|
lp1789177_bionic.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444720/+files/lp1789177_bionic.debdiff |
|
2020-12-18 03:22:39 |
Seyeong Kim |
attachment added |
|
lp1789177_queens.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444721/+files/lp1789177_queens.debdiff |
|
2020-12-18 03:23:28 |
Mathew Hodson |
python-oslo.messaging (Ubuntu): status |
New |
Fix Released |
|
2020-12-18 03:31:26 |
Mathew Hodson |
python-oslo.messaging (Ubuntu): importance |
Undecided |
Medium |
|
2020-12-18 03:31:29 |
Mathew Hodson |
python-oslo.messaging (Ubuntu Bionic): importance |
Undecided |
Medium |
|
2020-12-18 04:02:20 |
Seyeong Kim |
attachment added |
|
lp1789177_xenial.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444730/+files/lp1789177_xenial.debdiff |
|
2020-12-18 04:14:39 |
Seyeong Kim |
attachment added |
|
lp1789177_mitaka.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444740/+files/lp1789177_mitaka.debdiff |
|
2020-12-18 05:07:06 |
Mathew Hodson |
nominated for series |
|
Ubuntu Xenial |
|
2020-12-18 05:07:06 |
Mathew Hodson |
bug task added |
|
python-oslo.messaging (Ubuntu Xenial) |
|
2020-12-18 05:07:14 |
Mathew Hodson |
python-oslo.messaging (Ubuntu Xenial): importance |
Undecided |
Medium |
|
2020-12-18 05:49:25 |
Seyeong Kim |
nominated for series |
|
cloud-archive/mitaka |
|
2020-12-18 05:49:25 |
Seyeong Kim |
bug task added |
|
cloud-archive/mitaka |
|
2020-12-18 05:49:25 |
Seyeong Kim |
nominated for series |
|
cloud-archive/queens |
|
2020-12-18 05:49:25 |
Seyeong Kim |
bug task added |
|
cloud-archive/queens |
|
2020-12-18 06:35:22 |
Seyeong Kim |
attachment added |
|
lp1789177_stein.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444807/+files/lp1789177_stein.debdiff |
|
2020-12-18 06:35:36 |
Seyeong Kim |
attachment added |
|
lp1789177_train.debdiff https://bugs.launchpad.net/ubuntu/+source/python-oslo.messaging/+bug/1789177/+attachment/5444808/+files/lp1789177_train.debdiff |
|
2020-12-21 04:01:40 |
Seyeong Kim |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2020-12-21 04:08:23 |
Seyeong Kim |
python-oslo.messaging (Ubuntu Xenial): status |
New |
In Progress |
|
2020-12-21 04:08:29 |
Seyeong Kim |
python-oslo.messaging (Ubuntu Xenial): assignee |
|
Seyeong Kim (seyeongkim) |
|
2020-12-21 04:08:36 |
Seyeong Kim |
python-oslo.messaging (Ubuntu Bionic): status |
New |
In Progress |
|
2020-12-21 04:08:37 |
Seyeong Kim |
cloud-archive/queens: status |
New |
In Progress |
|
2020-12-21 04:08:42 |
Seyeong Kim |
python-oslo.messaging (Ubuntu Bionic): assignee |
|
Seyeong Kim (seyeongkim) |
|
2020-12-21 04:08:46 |
Seyeong Kim |
cloud-archive/queens: assignee |
|
Seyeong Kim (seyeongkim) |
|
2020-12-21 04:08:48 |
Seyeong Kim |
python-oslo.messaging (Ubuntu): assignee |
Seyeong Kim (seyeongkim) |
|
|
2020-12-21 04:08:51 |
Seyeong Kim |
cloud-archive/mitaka: assignee |
|
Seyeong Kim (seyeongkim) |
|
2020-12-23 23:16:59 |
Dominique Poulain |
bug |
|
|
added subscriber Dominique Poulain |
2021-01-04 15:34:35 |
Chris MacNaughton |
nominated for series |
|
cloud-archive/stein |
|
2021-01-04 15:34:35 |
Chris MacNaughton |
bug task added |
|
cloud-archive/stein |
|
2021-01-04 15:34:35 |
Chris MacNaughton |
nominated for series |
|
cloud-archive/train |
|
2021-01-04 15:34:35 |
Chris MacNaughton |
bug task added |
|
cloud-archive/train |
|
2021-01-06 07:41:07 |
Chris MacNaughton |
cloud-archive/train: status |
New |
Fix Committed |
|
2021-01-06 07:41:09 |
Chris MacNaughton |
tags |
in-stable-rocky in-stable-ussuri patch sts |
in-stable-rocky in-stable-ussuri patch sts verification-train-needed |
|
2021-01-06 07:51:48 |
Chris MacNaughton |
cloud-archive/stein: status |
New |
Fix Committed |
|
2021-01-06 07:51:51 |
Chris MacNaughton |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-train-needed |
in-stable-rocky in-stable-ussuri patch sts verification-stein-needed verification-train-needed |
|
2021-01-07 02:53:03 |
Seyeong Kim |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-stein-needed verification-train-needed |
in-stable-rocky in-stable-ussuri patch sts verification-stein-done verification-train-needed |
|
2021-01-07 06:56:57 |
Seyeong Kim |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-stein-done verification-train-needed |
in-stable-rocky in-stable-ussuri patch sts verification-stein-done verification-train-done |
|
2021-01-13 15:00:55 |
Robie Basak |
python-oslo.messaging (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2021-01-13 15:00:57 |
Robie Basak |
bug |
|
|
added subscriber SRU Verification |
2021-01-13 15:00:59 |
Robie Basak |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-stein-done verification-train-done |
|
2021-01-13 16:04:08 |
Chris MacNaughton |
cloud-archive/queens: status |
In Progress |
Fix Committed |
|
2021-01-13 16:04:10 |
Chris MacNaughton |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-needed verification-stein-done verification-train-done |
|
2021-01-14 04:45:46 |
Seyeong Kim |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-needed verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-needed verification-stein-done verification-train-done |
|
2021-01-14 07:53:12 |
Seyeong Kim |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-needed verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-done verification-stein-done verification-train-done |
|
2021-01-14 13:58:30 |
Chris MacNaughton |
cloud-archive/train: status |
Fix Committed |
Fix Released |
|
2021-01-14 13:59:44 |
Chris MacNaughton |
nominated for series |
|
cloud-archive/rocky |
|
2021-01-14 13:59:44 |
Chris MacNaughton |
bug task added |
|
cloud-archive/rocky |
|
2021-01-14 14:00:16 |
Chris MacNaughton |
cloud-archive/stein: status |
Fix Committed |
Fix Released |
|
2021-01-14 14:00:23 |
Chris MacNaughton |
cloud-archive/rocky: assignee |
|
Chris MacNaughton (chris.macnaughton) |
|
2021-01-16 23:23:57 |
Mathew Hodson |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-done verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-done verification-stein-done verification-train-done |
|
2021-01-28 15:12:02 |
Launchpad Janitor |
python-oslo.messaging (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2021-02-02 13:02:23 |
Edward Hope-Morley |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-done verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-stein-done verification-train-done |
|
2021-02-02 13:35:14 |
Corey Bryant |
description |
[Impact]
If there are many exchanges and queues, then after a failover rabbitmq-server logs errors saying that exchanges cannot be found.
Affected
Bionic (Queens)
Not affected
Focal
[Test Case]
1. deploy simple rabbitmq cluster
- https://pastebin.ubuntu.com/p/MR76VbMwY5/
2. juju ssh neutron-gateway/0
- for i in {1..1000}; do systemctl restart neutron-metering-agent; sleep 2; done
3. reproduction is more reliable if you add more exchanges, queues, and bindings
- rabbitmq-plugins enable rabbitmq_management
- rabbitmqctl add_user test password
- rabbitmqctl set_user_tags test administrator
- rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
- https://pastebin.ubuntu.com/p/brw7rSXD7q/ (save this as create.sh)
- for i in {1..2000}; do ./create.sh test_$i; done
4. restart the rabbitmq-server service, or shut the machine down and turn it back on, several times.
5. you can see the 'exchange not found' error
[Where problems could occur]
1. every service which uses oslo.messaging needs to be restarted.
2. Message transfer could be affected
[Others]
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the number of exchanges in that cluster was ~1600; on the recovered node the count stays low and does not increase).
Workaround: let the recovered node synchronize all exchanges first - use iptables rules to forbid new connections for some time (30 sec) after the failed node comes back online.
Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages? |
[Impact]
If there are many exchanges and queues, then after a failover rabbitmq-server logs errors saying that exchanges cannot be found.
Affected
Bionic (Queens)
Not affected
Focal
[Test Case]
1. deploy simple rabbitmq cluster
- https://pastebin.ubuntu.com/p/MR76VbMwY5/
2. juju ssh neutron-gateway/0
- for i in {1..1000}; do systemctl restart neutron-metering-agent; sleep 2; done
3. reproduction is more reliable if you add more exchanges, queues, and bindings
- rabbitmq-plugins enable rabbitmq_management
- rabbitmqctl add_user test password
- rabbitmqctl set_user_tags test administrator
- rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
- https://pastebin.ubuntu.com/p/brw7rSXD7q/ (save this as create.sh) [1]
- for i in {1..2000}; do ./create.sh test_$i; done
4. restart the rabbitmq-server service, or shut the machine down and turn it back on, several times.
5. you can see the 'exchange not found' error
[1] create.sh (pasting here because pastebins don't last forever)
#!/bin/bash
rabbitmqadmin declare exchange -V openstack name=$1 type=direct -u test -p password
rabbitmqadmin declare queue -V openstack name=$1 durable=false -u test -p password 'arguments={"x-expires":1800000}'
rabbitmqadmin -V openstack declare binding source=$1 destination_type="queue" destination=$1 routing_key="" -u test -p password
[Where problems could occur]
1. every service which uses oslo.messaging needs to be restarted.
2. Message transfer could be affected
[Others]
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the number of exchanges in that cluster was ~1600; on the recovered node the count stays low and does not increase).
Workaround: let the recovered node synchronize all exchanges first - use iptables rules to forbid new connections for some time (30 sec) after the failed node comes back online.
Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages? |
|
2021-02-04 13:49:04 |
Corey Bryant |
cloud-archive/queens: status |
Fix Committed |
New |
|
2021-02-04 13:49:15 |
Corey Bryant |
cloud-archive/stein: status |
Fix Released |
New |
|
2021-02-04 13:49:26 |
Corey Bryant |
python-oslo.messaging (Ubuntu Bionic): status |
Fix Released |
New |
|
2021-02-04 13:49:46 |
Corey Bryant |
cloud-archive: status |
New |
Invalid |
|
2021-02-23 15:42:32 |
Corey Bryant |
cloud-archive/stein: importance |
Undecided |
High |
|
2021-02-23 15:42:32 |
Corey Bryant |
cloud-archive/stein: status |
New |
Triaged |
|
2021-02-23 16:52:37 |
Corey Bryant |
cloud-archive/rocky: importance |
Undecided |
Medium |
|
2021-02-23 16:52:37 |
Corey Bryant |
cloud-archive/rocky: status |
New |
Triaged |
|
2021-02-23 16:52:51 |
Corey Bryant |
cloud-archive/stein: importance |
High |
Medium |
|
2021-02-23 16:53:06 |
Corey Bryant |
cloud-archive/queens: importance |
Undecided |
Medium |
|
2021-02-23 16:53:06 |
Corey Bryant |
cloud-archive/queens: status |
New |
Triaged |
|
2021-02-23 16:53:18 |
Corey Bryant |
cloud-archive/mitaka: importance |
Undecided |
Medium |
|
2021-02-23 16:53:18 |
Corey Bryant |
cloud-archive/mitaka: status |
New |
Triaged |
|
2021-02-23 16:53:32 |
Corey Bryant |
python-oslo.messaging (Ubuntu Bionic): status |
New |
Triaged |
|
2021-02-23 17:02:15 |
Corey Bryant |
cloud-archive/stein: status |
Triaged |
Fix Released |
|
2021-02-23 17:02:25 |
Corey Bryant |
cloud-archive/rocky: status |
Triaged |
Fix Committed |
|
2021-02-23 17:02:27 |
Corey Bryant |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-rocky-needed verification-stein-done verification-train-done |
|
2021-02-25 04:55:44 |
Seyeong Kim |
attachment removed |
lp1789177_bionic.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444720/+files/lp1789177_bionic.debdiff |
|
|
2021-02-25 05:01:29 |
Seyeong Kim |
attachment removed |
lp1789177_queens.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444721/+files/lp1789177_queens.debdiff |
|
|
2021-02-25 05:02:07 |
Seyeong Kim |
attachment removed |
lp1789177_xenial.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444730/+files/lp1789177_xenial.debdiff |
|
|
2021-02-25 05:02:56 |
Seyeong Kim |
attachment removed |
lp1789177_mitaka.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444740/+files/lp1789177_mitaka.debdiff |
|
|
2021-02-25 05:04:13 |
Seyeong Kim |
attachment removed |
lp1789177_stein.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444807/+files/lp1789177_stein.debdiff |
|
|
2021-02-25 05:04:34 |
Seyeong Kim |
attachment removed |
lp1789177_train.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5444808/+files/lp1789177_train.debdiff |
|
|
2021-02-25 05:10:28 |
Seyeong Kim |
attachment added |
|
lp1789177_queens.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5466822/+files/lp1789177_queens.debdiff |
|
2021-02-25 05:10:44 |
Seyeong Kim |
attachment added |
|
lp1789177_bionic.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5466823/+files/lp1789177_bionic.debdiff |
|
2021-03-10 02:05:58 |
Seyeong Kim |
attachment removed |
lp1789177_bionic.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5466823/+files/lp1789177_bionic.debdiff |
|
|
2021-03-10 02:06:09 |
Seyeong Kim |
attachment removed |
lp1789177_queens.debdiff https://bugs.launchpad.net/oslo.messaging/+bug/1789177/+attachment/5466822/+files/lp1789177_queens.debdiff |
|
|
2021-03-18 19:38:14 |
Liam Young |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-rocky-needed verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-rocky-done verification-stein-done verification-train-done |
|
2021-03-23 06:56:06 |
Seyeong Kim |
description |
[Impact]
If there are many exchanges and queues, then after a failover rabbitmq-server logs errors saying that exchanges cannot be found.
Affected
Bionic (Queens)
Not affected
Focal
[Test Case]
1. deploy simple rabbitmq cluster
- https://pastebin.ubuntu.com/p/MR76VbMwY5/
2. juju ssh neutron-gateway/0
- for i in {1..1000}; do systemctl restart neutron-metering-agent; sleep 2; done
3. reproduction is more reliable if you add more exchanges, queues, and bindings
- rabbitmq-plugins enable rabbitmq_management
- rabbitmqctl add_user test password
- rabbitmqctl set_user_tags test administrator
- rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
- https://pastebin.ubuntu.com/p/brw7rSXD7q/ (save this as create.sh) [1]
- for i in {1..2000}; do ./create.sh test_$i; done
4. restart the rabbitmq-server service, or shut the machine down and turn it back on, several times.
5. you can see the 'exchange not found' error
[1] create.sh (pasting here because pastebins don't last forever)
#!/bin/bash
rabbitmqadmin declare exchange -V openstack name=$1 type=direct -u test -p password
rabbitmqadmin declare queue -V openstack name=$1 durable=false -u test -p password 'arguments={"x-expires":1800000}'
rabbitmqadmin -V openstack declare binding source=$1 destination_type="queue" destination=$1 routing_key="" -u test -p password
[Where problems could occur]
1. every service which uses oslo.messaging needs to be restarted.
2. Message transfer could be affected
[Others]
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the number of exchanges in that cluster was ~1600; on the recovered node the count stays low and does not increase).
Workaround: let the recovered node synchronize all exchanges first - use iptables rules to forbid new connections for some time (30 sec) after the failed node comes back online.
Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages? |
[Impact]
If there are many exchanges and queues, then after a failover rabbitmq-server logs errors saying that exchanges cannot be found.
Affected
Bionic (Queens)
Not affected
Focal
[Test Case]
1. deploy simple rabbitmq cluster
- https://pastebin.ubuntu.com/p/MR76VbMwY5/
2. juju ssh neutron-gateway/0
- for i in {1..1000}; do systemctl restart neutron-metering-agent; sleep 2; done
3. reproduction is more reliable if you add more exchanges, queues, and bindings
- rabbitmq-plugins enable rabbitmq_management
- rabbitmqctl add_user test password
- rabbitmqctl set_user_tags test administrator
- rabbitmqctl set_permissions -p openstack test ".*" ".*" ".*"
- https://pastebin.ubuntu.com/p/brw7rSXD7q/ (save this as create.sh) [1]
- for i in {1..2000}; do ./create.sh test_$i; done
4. restart the rabbitmq-server service, or shut the machine down and turn it back on, several times.
5. you can see the 'exchange not found' error
[1] create.sh (pasting here because pastebins don't last forever)
#!/bin/bash
rabbitmqadmin declare exchange -V openstack name=$1 type=direct -u test -p password
rabbitmqadmin declare queue -V openstack name=$1 durable=false -u test -p password 'arguments={"x-expires":1800000}'
rabbitmqadmin -V openstack declare binding source=$1 destination_type="queue" destination=$1 routing_key="" -u test -p password
[Where problems could occur]
1. every service which uses oslo.messaging needs to be restarted.
2. Message transfer could be affected
[Others]
Possible Workaround
1. for the 'exchange not found' issue,
- create the exchange, queue, and binding for the problematic name from the log
- then restart the rabbitmq-server nodes one by one
2. for a queue that crashed and fails to restart
- delete the specific queue named in the log
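A minimal sketch of workaround 1, reusing the rabbitmqadmin tool and test credentials from the test case above. The exchange name is the one from the original log; substitute the problematic name your own log reports. DRY_RUN=1 (the default) only prints the commands, since the real ones need a reachable rabbitmq-server.

```shell
#!/bin/sh
# Recreate the missing exchange, queue, and binding for a name taken from
# the rabbitmq error log; the rabbitmq-server nodes can then be restarted
# one by one. With DRY_RUN=1 the commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

NAME=reply_5675d7991b4a4fb7af5d239f4decb19f   # problematic name from the log
run rabbitmqadmin declare exchange -V openstack name=$NAME type=direct -u test -p password
run rabbitmqadmin declare queue -V openstack name=$NAME durable=false -u test -p password
run rabbitmqadmin -V openstack declare binding source=$NAME destination_type=queue destination=$NAME routing_key="" -u test -p password
```

Run once with DRY_RUN=1 to review the commands, then with DRY_RUN=0 to apply them.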
// original description
Input:
- OpenStack Pike cluster with ~500 nodes
- DVR enabled in neutron
- Lots of messages
Scenario: failover of one rabbit node in a cluster
Issue: after the failed rabbit node comes back online, some RPC communications appear broken
Logs from rabbit:
=ERROR REPORT==== 10-Aug-2018::17:24:37 ===
Channel error on connection <0.14839.1> (10.200.0.24:55834 -> 10.200.0.31:5672, vhost: '/openstack', user: 'openstack'), channel 1:
operation basic.publish caused a channel exception not_found: no exchange 'reply_5675d7991b4a4fb7af5d239f4decb19f' in vhost '/openstack'
Investigation:
After the rabbit node comes back online it immediately receives many new connections and, for some reason, fails to synchronize exchanges (the number of exchanges in that cluster was ~1600; on the recovered node the count stays low and does not increase).
Workaround: let the recovered node synchronize all exchanges first - use iptables rules to forbid new connections for some time (30 sec) after the failed node comes back online.
Proposal: do not create new exchanges (use default) for all direct messages - this also fixes the issue.
Is there a good reason for creating new exchanges for direct messages? |
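The 30-second iptables workaround in the description above can be sketched as follows. This is a sketch under assumptions: the exact rule form is illustrative, port 5672 is the AMQP port seen in the log, and DRY_RUN=1 (the default) prints the commands instead of executing them, since the real rules require root on the recovering rabbit node.

```shell
#!/bin/sh
# Reject new AMQP connections to the recovering rabbit node for 30 seconds
# so it can synchronize exchanges before clients reconnect. Matching only
# SYN packets leaves already-established connections untouched.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

RULE="INPUT -p tcp --dport 5672 --syn -j REJECT"
run iptables -I $RULE   # block new connections
run sleep 30            # let the recovered node synchronize all exchanges
run iptables -D $RULE   # allow new connections again
```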
|
2021-04-06 13:47:30 |
Corey Bryant |
summary |
RabbitMQ fails to synchronize exchanges under high load |
RabbitMQ fails to synchronize exchanges under high load (Note for ubuntu: stein, rocky, queens(bionic) changes only fix compatibility with fully patched releases) |
|
2021-04-06 13:50:21 |
Corey Bryant |
cloud-archive/rocky: status |
Fix Committed |
Fix Released |
|
2021-04-21 19:04:29 |
OpenStack Infra |
cloud-archive/queens: status |
Triaged |
In Progress |
|
2021-06-07 14:28:13 |
Łukasz Zemczak |
python-oslo.messaging (Ubuntu Bionic): status |
Triaged |
Fix Committed |
|
2021-06-07 14:28:18 |
Łukasz Zemczak |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-queens-failed verification-rocky-done verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-failed verification-rocky-done verification-stein-done verification-train-done |
|
2021-06-08 12:17:15 |
Corey Bryant |
cloud-archive/queens: status |
In Progress |
Fix Committed |
|
2021-06-08 12:17:17 |
Corey Bryant |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-failed verification-rocky-done verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-needed verification-rocky-done verification-stein-done verification-train-done |
|
2021-06-30 00:04:57 |
Seyeong Kim |
bug |
|
|
added subscriber Seyeong Kim |
2021-06-30 00:38:25 |
Seyeong Kim |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-needed verification-rocky-done verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done |
|
2021-06-30 05:29:58 |
Seyeong Kim |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-needed verification-needed-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-done verification-rocky-done verification-stein-done verification-train-done |
|
2021-07-01 10:19:23 |
Launchpad Janitor |
python-oslo.messaging (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2021-07-05 13:44:42 |
James Page |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-done-bionic verification-needed verification-queens-done verification-rocky-done verification-stein-done verification-train-done |
in-stable-rocky in-stable-ussuri patch sts verification-done verification-done-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done |
|
2021-07-05 13:46:23 |
James Page |
cloud-archive/queens: status |
Fix Committed |
Fix Released |
|
2021-07-05 13:47:16 |
James Page |
python-oslo.messaging (Ubuntu Xenial): status |
In Progress |
Invalid |
|
2021-08-17 06:13:00 |
Brett Milford |
bug |
|
|
added subscriber Brett Milford |
2022-07-08 13:49:13 |
OpenStack Infra |
tags |
in-stable-rocky in-stable-ussuri patch sts verification-done verification-done-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done |
in-stable-rocky in-stable-stein in-stable-ussuri patch sts verification-done verification-done-bionic verification-queens-done verification-rocky-done verification-stein-done verification-train-done |
|