Uncontrolled RabbitMQ channel creation after rabbitmq-server failure
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack-Ansible | Fix Released | High | Kevin Carter |
Icehouse | Fix Released | High | Jesse Pretorius |
Juno | Fix Released | High | Jesse Pretorius |
Trunk | Fix Released | High | Kevin Carter |
oslo.messaging | Fix Released | High | Mehdi Abaakouk |
Bug Description
We noticed a pretty serious issue in oslo.messaging version 1.5.1: whenever one out of several configured rabbitmq servers dies or crashes, we suddenly see an influx of rabbitmq channels per connection. In a normal setup we see one rabbitmq channel per connection, but in this situation the number of channels per connection keeps growing, for instance:
#rabbitmqctl list_connections name channels | awk '{ if ($4 >1) print }'
172.29.
172.29.238.56:43869 -> 172.29.236.219:5672 452
172.29.238.56:43870 -> 172.29.236.219:5672 452
172.29.
172.29.
172.29.
172.29.
This output should be empty under normal conditions.
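For monitoring, the same check can be scripted against the RabbitMQ management API. This is only a rough sketch, assuming the management plugin is enabled and the requests library is available; the endpoint, credentials and threshold below are placeholders, not our real values:

import requests

# Placeholder endpoint and credentials for the RabbitMQ management API.
MGMT_URL = "http://rabbit-host.example.com:15672/api/connections"
AUTH = ("guest", "guest")

resp = requests.get(MGMT_URL, auth=AUTH)
resp.raise_for_status()
for conn in resp.json():
    # 'channels' is the number of channels open on the connection;
    # anything above 1 corresponds to the rabbitmqctl/awk filter above.
    if conn.get("channels", 0) > 1:
        print(conn.get("name"), conn.get("channels"))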
At this point I also noticed that the amqp connection layer had connected to all remaining rabbitmq servers; TCP sessions are established.
Before failure of rabbitmq-server (cluster of 3 nodes):
-------
The amqp connection layer has only one TCP connection open to the first configured rabbit_host
root@compute1_
172.29.
After failure of the first configured rabbitmq-server:
-------
The amqp connection layer has now a TCP connection open to each remaining rabbitmq server
root@compute1_
172.29.
172.29.
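As a cross-check from the compute node itself, here is a small Python sketch that lists established TCP connections to the AMQP port, similar to the connection check above. It assumes psutil is installed and is run as root so that other processes' sockets are visible:

import psutil

AMQP_PORT = 5672  # rabbit_port from the configuration below

for conn in psutil.net_connections(kind='tcp'):
    # Keep only established connections whose remote port is the AMQP port.
    if conn.status == psutil.CONN_ESTABLISHED and conn.raddr and conn.raddr[1] == AMQP_PORT:
        print("%s:%s -> %s:%s" % (conn.laddr[0], conn.laddr[1],
                                  conn.raddr[0], conn.raddr[1]))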
Stack trace:
2014-12-30 17:40:28.197 1574 ERROR oslo.messaging.
2014-12-30 17:40:28.198 1574 ERROR oslo.messaging.
2014-12-30 17:40:28.198 1574 TRACE oslo.messaging. (traceback truncated)
Configured rabbitmq settings (user/password omitted):
rabbit_hosts = 172.29.
rabbit_port = 5672
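For context, here is a minimal sketch of how a service could build its transport for a multi-node cluster like this, assuming oslo.messaging's comma-separated multi-host transport URL form; the user, password and host names are placeholders rather than the real deployment values:

from oslo import messaging
from oslo.config import cfg

# Placeholder credentials and hosts; one entry per RabbitMQ cluster node,
# matching the rabbit_hosts setting above.
url = ("rabbit://user:pass@rabbit1.example.com:5672,"
       "user:pass@rabbit2.example.com:5672,"
       "user:pass@rabbit3.example.com:5672/")
transport = messaging.get_transport(cfg.CONF, url)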
Oslo versions:
oslo.config==1.5.0
oslo.db==1.3.0
oslo.i18n==1.1.0
oslo.messaging=
oslo.middleware
oslo.rootwrap=
oslo.serializat
oslo.utils==1.1.0
Rabbitmq version : 3.4.2-1 (from http://
Currently we can only stop this behavior by restarting all OpenStack services that use rabbitmq.
Changed in oslo.messaging:
status: New → Confirmed

Changed in oslo.messaging:
importance: Undecided → High
milestone: none → next-kilo

Changed in openstack-ansible:
milestone: none → next

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → High
status: Confirmed → Triaged

Changed in oslo.messaging:
milestone: 1.6.0 → next-kilo

Changed in oslo.messaging:
status: Fix Committed → Fix Released
This issue lets the channels grow until the Erlang VM runs out of processes and the whole OpenStack setup comes to a halt, at least from the provisioning perspective.