L3 agent hangs when rabbitmq cluster fails often
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Mirantis OpenStack | Invalid | Medium | Oleg Bondarev | | |
Bug Description
The L3 agent hangs when the RabbitMQ cluster repeatedly falls apart and recovers.
The last log lines from the L3 agent:
2015-04-24 10:07:14.205 30273 ERROR oslo.messaging.
2015-04-24 10:07:14.230 30273 INFO oslo.messaging.
2015-04-24 10:07:14.244 30273 ERROR oslo.messaging.
2015-04-24 10:07:14.245 30273 INFO oslo.messaging.
2015-04-24 10:07:14.256 30273 ERROR oslo.messaging.
2015-04-24 10:07:14.737 30273 INFO oslo.messaging.
2015-04-24 10:07:14.752 30273 ERROR oslo.messaging.
2015-04-24 10:07:28.215 30273 DEBUG neutron.
2015-04-24 10:07:28.216 30273 DEBUG neutron.
2015-04-24 10:07:28.906 30273 INFO oslo.messaging.
This could be an issue in oslo.messaging, which may be hanging on a reconnect attempt.
Further analysis shows that the process is hanging in:
epoll_wait(4, {}, 1023, 0) = 0
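The call above has a timeout of 0 and returns 0, meaning it returned immediately with no ready events. If the process repeats such a call in a loop, it spins instead of sleeping. The following is a minimal sketch of that difference, not the agent's actual code: it uses Python's standard `selectors` module (backed by epoll on Linux), and the helper name `count_empty_polls` and its parameters are hypothetical, introduced only for illustration.

```python
import selectors
import socket
import time


def count_empty_polls(duration=0.05, timeout=0):
    """Count polls that return no events within `duration` seconds.

    Hypothetical helper: registers an idle socket, so every poll is empty.
    """
    a, b = socket.socketpair()  # nothing is ever written, so `a` is never readable
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ)
    empty = 0
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            # timeout=0 makes the poll non-blocking; a positive timeout sleeps
            if not sel.select(timeout=timeout):
                empty += 1
    finally:
        sel.close()
        a.close()
        b.close()
    return empty


if __name__ == "__main__":
    print("zero timeout:", count_empty_polls(timeout=0))
    print("10 ms timeout:", count_empty_polls(timeout=0.01))
```

With a zero timeout the loop performs far more empty polls in the same interval than with a blocking timeout, which is the pattern that would show up as a process pinned at high CPU while making no progress.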
Changed in mos: | |
milestone: | none → 6.1 |
description: | updated |
Changed in mos: | |
assignee: | nobody → MOS Neutron (mos-neutron) |
status: | New → Confirmed |
Changed in mos: | |
status: | Confirmed → In Progress |
Changed in mos: | |
assignee: | MOS Neutron (mos-neutron) → Oleg Bondarev (obondarev) |
One important (but as yet unconfirmed) condition that may hint at the cause: the rsyslog server is also restarting constantly for some reason.
Some time ago we fixed an issue with eventlet which caused services to hang (spinning in a busy wait) and consume 100% CPU on rsyslog restart.
We may be facing the same issue here, so it is worth checking.