nova-api logs are spammed with oslo.messaging errors

Bug #1943793 reported by Kristina Jasser
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Expired | Undecided | Unassigned |

Bug Description

After upgrading from Ussuri to Victoria, nova-api.log is spammed with the following error and info messages:

nova-api.log:
2021-09-16 05:34:23.402 19 ERROR oslo.messaging._drivers.impl_rabbit [-] [ac436422-607b-4ca5-941a-b70dbfe6be3d] AMQP server on xxx.xxx.xxx.xxx:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: OSError: Server unexpectedly closed connection

2021-09-16 05:35:24.696 22 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer

<email address hidden>:
2021-09-16 05:09:26.821 [error] <0.31945.6> closing AMQP connection <0.31945.6> (xxx.xxx.xxx.xxx:38108 -> xxx.xxx.xxx.xxx:5672 - mod_wsgi:22:835db11a-efbd-4005-ad89-3d7a891dfa27):
missed heartbeats from client, timeout: 60s

After restarting nova-api, the errors are gone for about 20 to 30 minutes. After that they become steadily more frequent, from 1-3 errors per 5 minutes up to 50-60 errors per 5 minutes after one day.
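To put numbers on a trend like this, the error lines can be counted per 5-minute window. The following is a minimal sketch, assuming the log lines follow the layout shown in the excerpt above (timestamp, process ID, level, logger name); the function name `count_rabbit_errors` is hypothetical and not part of nova or oslo.messaging.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches the start of a log line such as:
# "2021-09-16 05:34:23.402 19 ERROR oslo.messaging._drivers.impl_rabbit [-] ..."
# Groups: timestamp (to the second), log level, logger name.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ \d+ (\w+) (\S+)"
)

RABBIT_LOGGER = "oslo.messaging._drivers.impl_rabbit"


def count_rabbit_errors(lines, window_minutes=5):
    """Count ERROR lines from the rabbit driver per time window.

    Hypothetical helper for illustration. window_minutes is assumed to
    divide 60 evenly so that windows align with the clock hour.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # continuation lines, tracebacks, other formats
        stamp, level, logger = match.groups()
        if level != "ERROR" or logger != RABBIT_LOGGER:
            continue
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Round the timestamp down to the start of its window.
        bucket = ts - timedelta(
            minutes=ts.minute % window_minutes, seconds=ts.second
        )
        counts[bucket] += 1
    return counts


# Example use (the path is an assumption, adjust to your deployment):
# with open("nova-api.log") as fh:
#     for start, n in sorted(count_rabbit_errors(fh).items()):
#         print(start, n)
```

Comparing the per-window counts shortly after a restart with those a day later makes the growth described above easy to attach to a report.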

We tried the following settings, with no effect:
rpc_response_timeout = 180

and
heartbeat_rate = 4
heartbeat_timeout_threshold = 120

Before upgrading to Victoria we had no errors.

Tags: oslo
Kristina Jasser (marvin01) wrote :

nova 22.2.1.dev15
rabbitmq 3.8.16

Kristina Jasser (marvin01) wrote :

same problem with updated versions:
nova 22.2.3.dev16
rabbitmq 3.8.22

sean mooney (sean-k-mooney) wrote :

This seems similar to https://bugs.launchpad.net/nova/+bug/1825584
but it could also just be an intermittent connection issue to RabbitMQ in your environment.

To me this is not really pointing to an issue with nova, so if there is a bug it should likely be filed against oslo.messaging.

Have you tried setting
[oslo_messaging_rabbit]
heartbeat_in_pthread=true
in the nova.conf used by nova-api?
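Before relying on a default, it can help to confirm which heartbeat options are explicitly set in the file that nova-api actually reads. The sketch below uses only Python's standard `configparser` rather than oslo.config, so it is an approximation: it does not see defaults, config directories, or command-line overrides. The helper name `explicit_heartbeat_settings` is hypothetical.

```python
import configparser

# Option names taken from this report; all belong in [oslo_messaging_rabbit].
HEARTBEAT_OPTIONS = (
    "heartbeat_in_pthread",
    "heartbeat_rate",
    "heartbeat_timeout_threshold",
)


def explicit_heartbeat_settings(conf_text):
    """Return the heartbeat options explicitly set in [oslo_messaging_rabbit].

    Hypothetical helper for illustration. Values are returned as the raw
    strings found in the file; options that are absent are left out.
    """
    parser = configparser.ConfigParser(
        # Treat [DEFAULT] as an ordinary section so its options do not
        # leak into [oslo_messaging_rabbit].
        default_section="__unused__",
        # nova.conf values may contain '%', so disable interpolation.
        interpolation=None,
        # Tolerate repeated sections or options instead of raising.
        strict=False,
    )
    parser.read_string(conf_text)
    section = "oslo_messaging_rabbit"
    if not parser.has_section(section):
        return {}
    return {
        opt: parser.get(section, opt)
        for opt in HEARTBEAT_OPTIONS
        if parser.has_option(section, opt)
    }


# Example use (the path is an assumption, adjust to your deployment):
# with open("/etc/nova/nova.conf") as fh:
#     print(explicit_heartbeat_settings(fh.read()))
```

An empty result means none of these options are set in that section, so the service falls back to whatever defaults the installed oslo.messaging release ships.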

tags: added: oslo
Changed in nova:
status: New → Incomplete
Kristina Jasser (marvin01) wrote :

Thanks for this hint. We have had no errors for about an hour so far.

I did not think about this option because of the documentation:

heartbeat_in_pthread
Type: boolean
Default: True

We did not set this option to false, so maybe the default value isn't true anymore?

Also from the documentation: "This option is deprecated for removal. Its value may be silently ignored in the future." I hope there will be another way to run the heartbeat in a pthread in the future.

Kristina Jasser (marvin01) wrote :

After two days with

[oslo_messaging_rabbit]
heartbeat_in_pthread=true

we got fewer errors.

The first error occurs 1-2 hours after restarting nova-api. One day later there are 8 to 10 errors per 5 minutes; without the config change there were 50 to 60 errors per 5 minutes.

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired