Shutting down RabbitMQ causes nova-compute.service to go down
This bug report will be marked for expiration in 47 days if no further activity occurs.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Invalid | Undecided | Unassigned |
oslo.messaging | Incomplete | Undecided | Unassigned |
Bug Description
Description
===========
We have an OpenStack deployment with a 3-node RabbitMQ cluster and dozens of nova-compute nodes.
When we shut down 1 of the 3 RabbitMQ nodes, Nagios alerted that nova-compute was down.
Upon checking, however, systemctl still showed nova-compute.service as active (running):
Loaded: loaded (/lib/systemd/
Active: active (running) since Fri 2024-02-16 00:42:47 UTC; 4 days ago
Main PID: 10130 (nova-compute)
Tasks: 32 (limit: 463517)
Memory: 248.2M
CPU: 55min 5.217s
CGroup: /system.
Feb 16 00:42:53 node002 sudo[11540]: pam_unix(
Feb 16 00:42:54 node002 sudo[11540]: pam_unix(
Feb 20 04:55:31 node002 nova-compute[
My guess is that when a RabbitMQ node shuts down, nova-compute runs into contention or state inconsistencies while processing connection recovery. Restarting nova-compute brings the service back.
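The connection-recovery path I suspect can be approximated as a retry-with-backoff loop. This is a minimal, hypothetical sketch of that kind of loop, not oslo.messaging's actual code; the function name and parameters are my own:

```python
import time

def connect_with_backoff(connect, retry_interval=1.0, backoff=2.0,
                         max_interval=30.0, max_tries=5):
    """Call `connect` until it succeeds, sleeping with exponential
    backoff between failures. Hypothetical sketch of the kind of
    recovery loop an AMQP driver runs after a broker node goes away."""
    interval = retry_interval
    for attempt in range(1, max_tries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_tries:
                raise
            time.sleep(interval)
            interval = min(interval * backoff, max_interval)
```

If many consumer threads run a loop like this against the surviving brokers at the same time, serialized or contended recovery could explain a stall like the one observed above.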
Logs & Configs
==============
The nova-compute.log:
2024-02-20 04:55:28.675 10130 ERROR oslo.messaging.
2024-02-20 04:55:29.677 10130 ERROR oslo.messaging.
2024-02-20 04:55:30.682 10130 INFO oslo.messaging.
2024-02-20 04:55:31.361 10130 INFO oslo.messaging.
Then systemctl status nova-compute shows:
Feb 20 04:55:31 node002 nova-compute[
Jammy + nova-compute(
nova.conf:
[oslo_messaging
[oslo_messaging
driver = messagingv2
transport_url = *********
[notifications]
notification_format = unversioned
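For reference, oslo.messaging exposes several [oslo_messaging_rabbit] options that govern how a client reconnects after a broker node disappears. The values below are illustrative only and are not taken from this deployment's nova.conf:

```ini
[oslo_messaging_rabbit]
# Seconds to wait before the first reconnection attempt, and the
# backoff added on each subsequent failure (illustrative values).
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
# Delay before reconnecting after an AMQP consumer-cancel notification.
kombu_reconnect_delay = 1.0
# Heartbeat so dead connections to a stopped broker are detected promptly.
heartbeat_timeout_threshold = 60
```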
Changed in oslo.messaging:
status: New → Incomplete
This isn't a Nova bug; it may be an oslo.messaging problem. In any case, once the nova-compute service is reported down, the servicegroup API will stop offering it to the scheduler, so this shouldn't cause a problem on the Nova side.