Instance evacuation is stuck and completes after a long time

Bug #2044703 reported by Sai Siddhartha P V K
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Stack:
3-node cluster on CentOS 8 Stream, Libvirt + KVM
OpenStack Xena
Nova package version: 24.1.1-1
SAN-based shared storage connected through iSCSI
MariaDB Galera Cluster configured in active-backup mode behind HAProxy

We are observing an issue with instance evacuation when the host running the active database node is powered off.
The evacuation task is triggered, but it takes more than 15 minutes to complete. At the same time, placement and other OpenStack services report 504 Gateway Timeout errors when talking to Keystone. The following log line shows that most of the time is spent in the nova scheduler:

Request filter 'map_az_to_placement_aggregate' took 972.3 seconds wrapper /usr/lib/python3.6/site-packages/nova/scheduler/request_filter.py:47
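
For anyone trying to confirm the same symptom, here is a minimal Python sketch that scans scheduler log lines for slow request filters. It assumes the log lines look like the one quoted above; the function name `slow_request_filters` and the 1-second default threshold are made up for illustration, and other nova versions may word the message differently.

```python
import re

# Matches the debug line format quoted in this report (an assumption:
# other nova releases may format this message differently).
FILTER_RE = re.compile(
    r"Request filter '(?P<name>[^']+)' took (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def slow_request_filters(lines, threshold=1.0):
    """Return (filter_name, seconds) for each filter at or above threshold."""
    hits = []
    for line in lines:
        match = FILTER_RE.search(line)
        if match is None:
            continue
        seconds = float(match.group("seconds"))
        if seconds >= threshold:
            hits.append((match.group("name"), seconds))
    return hits


# The log line from this report, used as sample input.
sample = [
    "Request filter 'map_az_to_placement_aggregate' took 972.3 seconds "
    "wrapper /usr/lib/python3.6/site-packages/nova/scheduler/request_filter.py:47",
]
print(slow_request_filters(sample))
# prints [('map_az_to_placement_aggregate', 972.3)]
```

To run it against a real log, pass an open file object instead of `sample`; the path of the scheduler log depends on the deployment.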

Sometimes the rebuild task is stuck indefinitely until we restart the nova services and force the VM into an error state.
Please let me know if any other configuration details or logs are required to understand this issue.
