nova-compute SSL connections make rabbitmq pods OOM

Bug #1936574 reported by peiran wei
This bug affects 1 person
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid  Undecided   Unassigned
RabbitMQ                  New      Undecided   Unassigned
oslo.messaging            New      Undecided   Unassigned

Bug Description

We have a Rocky OpenStack deployment with 3 controllers and 500 compute nodes. At 15:58, nova-compute detected that its RabbitMQ connection was broken and then reconnected.

2021-07-05 15:58:28.633 8 ERROR oslo.messaging._drivers.impl_rabbit [req-a09d4a8b-c24b-4b30-b433-64fe4f6bace5 - - - - -] [8ed1f425-ad67-4b98-874c-e4516aaf3134] AMQP server on 145.247.103.16:5671 is unreachable: . Trying again in 1 seconds.: timeout
2021-07-05 15:58:29.656 8 INFO oslo.messaging._drivers.impl_rabbit [req-a09d4a8b-c24b-4b30-b433-64fe4f6bace5 - - - - -] [8ed1f425-ad67-4b98-874c-e4516aaf3134] Reconnected to AMQP server on 145.247.103.16:5671 via [amqp] client with port 28205.

RabbitMQ then reported that a huge number of connections had been closed by clients.

=WARNING REPORT==== 5-Jul-2021::15:57:59 ===
closing AMQP connection <0.6345.754> (20.16.36.44:2451 -> 145.247.103.14:5671 - nova-compute:8:b4ce7b09-b9b5-4db1-983b-a071dc031c64, vhost: '/', user: 'openstack'):
client unexpectedly closed TCP connection
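
To gauge how many clients dropped at the same time, warnings of this shape can be tallied per service and client host. The following is only a minimal sketch, assuming the log format shown in the sample above. The regular expression and the function name `count_closed_connections` are illustrative and are not part of RabbitMQ or oslo.messaging.

```python
import re
from collections import Counter

# Matches the header line of a RabbitMQ "closing AMQP connection" warning.
# Assumption: the pattern is derived from the single sample in this report.
CLOSE_RE = re.compile(
    r"closing AMQP connection <[^>]+> "
    r"\((?P<client>[\d.]+):\d+ -> (?P<broker>[\d.]+):\d+"
    r"(?: - (?P<name>[^,]+))?"
)


def count_closed_connections(lines):
    """Return a Counter keyed by (service, client host)."""
    counts = Counter()
    for line in lines:
        match = CLOSE_RE.search(line)
        if match is None:
            continue  # not a connection-close header line
        # Connection names look like "nova-compute:8:<uuid>"; keep the service part.
        service = (match.group("name") or "unknown").split(":", 1)[0]
        counts[(service, match.group("client"))] += 1
    return counts


sample = [
    "=WARNING REPORT==== 5-Jul-2021::15:57:59 ===",
    "closing AMQP connection <0.6345.754> (20.16.36.44:2451 -> 145.247.103.14:5671"
    " - nova-compute:8:b4ce7b09-b9b5-4db1-983b-a071dc031c64, vhost: '/', user: 'openstack'):",
    "client unexpectedly closed TCP connection",
]

for (service, client), n in count_closed_connections(sample).items():
    print(service, client, n)
```

Running this over the full broker log for the minutes around 15:58 would show whether the closures come from all compute nodes or from a subset.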

About 10 minutes later, the cluster blocked publishers after reaching the memory high watermark, which is set to 0.4.

=INFO REPORT==== 5-Jul-2021::16:19:29 ===
vm_memory_high_watermark set. Memory used:111358541824 allowed:107949065830

**********************************************************
*** Publishers will be blocked until this alarm clears ***
**********************************************************
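
The figures in the alarm can be put into perspective with a little arithmetic. This is a rough sketch, assuming the allowed value equals the 0.4 relative watermark multiplied by the total memory that RabbitMQ detected. The variable names are illustrative.

```python
# Figures taken from the vm_memory_high_watermark alarm in this report.
used_bytes = 111358541824
allowed_bytes = 107949065830
watermark = 0.4  # relative watermark stated in the report

GIB = 2 ** 30

# Assumption: allowed = watermark * total memory detected by RabbitMQ.
total_bytes = allowed_bytes / watermark
overshoot_bytes = used_bytes - allowed_bytes
used_fraction = used_bytes / total_bytes

print(f"total detected : {total_bytes / GIB:.1f} GiB")
print(f"allowed        : {allowed_bytes / GIB:.1f} GiB")
print(f"used           : {used_bytes / GIB:.1f} GiB")
print(f"over the limit : {overshoot_bytes / GIB:.1f} GiB")
print(f"used fraction  : {used_fraction:.3f}")
```

Under that assumption the broker detected roughly 251 GiB of memory and was using about 103.7 GiB (a fraction of about 0.413) when the alarm fired, roughly 3.2 GiB above the allowed 100.5 GiB.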

However, after the publishers were blocked, the RabbitMQ pod's memory usage kept growing as if it were leaking. In the end the node ran out of memory (OOM) and the system forced the pod to restart.

amqp release: 2.5.2
oslo.messaging release: 8.1.4
OpenStack: Rocky

peiran wei (james940928)
Changed in oslo.messaging:
assignee: nobody → peiran wei (james940928)
assignee: peiran wei (james940928) → nobody
Lee Yarwood (lyarwood)
Changed in nova:
status: New → Invalid
peiran wei (james940928) wrote:

This issue looks related to the bug "Upgrading to pike version causes rabbit timeouts with ssl". We had already noticed that bug and upgraded amqp and oslo.messaging to 2.5.2 and 8.1.4 respectively, but the problem still exists.
