[SRU] nova scheduler should ignore removed groups
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | sean mooney |
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Yoga | Triaged | High | Unassigned |
Zed | Fix Released | Undecided | Unassigned |
nova (Ubuntu) | Fix Released | Undecided | Unassigned |
Jammy | Triaged | High | Unassigned |
Kinetic | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
Fixes nova so that scheduling no longer fails for a VM that previously belonged to a server group that has since been deleted.
[Test Plan]
* deploy OpenStack Yoga with shared storage, e.g. Ceph
* create a server group (e.g. affinity) and boot one or more instances within that group:
openstack server group create --policy affinity sg1
openstack server create --image jammy --flavor m1.small --key-name testkey --nic net-id=private vm1 --hint group=e29105e1-
* delete the server group:
openstack server group delete sg1
* power off the compute host where the VMs are running
* evacuate the VMs from the powered-down host:
nova evacuate vm1 juju-882778-
* check that the VMs are ACTIVE and reachable on the new compute host(s)
[Regression Potential]
No regressions are expected as a result of this fix.
-------
Description
===========
We created a server group and started some instances in it.
Later we removed the server group.
Some time later, we had to evacuate these instances, but this failed, because the
scheduler removed all available hosts during filtering.
Steps to reproduce
==================
* create a server group
* start some instances in this group
* delete the server group
* ( hard poweroff your hypervisor )
* evacuate the instances
Expected result
===============
The instances are evacuated
Actual result
=============
The instances go into ERROR state because the server group is not found.
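The failure mode described above can be illustrated with a small, self-contained Python sketch. This is not nova's code: `GroupNotFound`, `GROUPS`, `get_group`, `schedule`, and the host names are hypothetical stand-ins, and the sketch assumes that the scheduling request still carries the UUID of a group that no longer exists.

```python
class GroupNotFound(Exception):
    """Hypothetical stand-in for a 404-style 'server group not found' error."""
    code = 404


# Hypothetical in-memory group store: group UUID -> policy.
GROUPS = {"sg1-uuid": "affinity"}


def get_group(group_uuid):
    """Unguarded lookup: raises if the group has been deleted."""
    try:
        return GROUPS[group_uuid]
    except KeyError:
        raise GroupNotFound(f"server group {group_uuid} not found") from None


def schedule(request, hosts):
    """Return candidate hosts for a request that may reference a group."""
    group_uuid = request.get("group_uuid")
    if group_uuid is not None:
        get_group(group_uuid)  # fails for a deleted group
    return list(hosts)


request = {"instance": "vm1", "group_uuid": "sg1-uuid"}
hosts_before = schedule(request, ["host-a", "host-b"])

del GROUPS["sg1-uuid"]  # the group is deleted while vm1 keeps running

try:
    schedule(request, ["host-b"])
    outcome = "scheduled"
except GroupNotFound as exc:
    outcome = f"error {exc.code}"
```

In this toy model the first call succeeds, while the call made after the group is deleted ends in the 404-style error, which mirrors the fault shown on the instance.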
Environment
===========
* Kolla deployed OpenStack Train
* Ubuntu 18.04 / KVM + Libvirt
Logs & Configs
==============
The scheduler log says:
Filtering removed all hosts for the request with instance ID 'adddf2c9-
instance show:
| fault | {'code': 404, 'created': '2020-08-
Changed in nova:
assignee: nobody → sean mooney (sean-k-mooney)
no longer affects: nova (Ubuntu Lunar)
Changed in nova (Ubuntu Kinetic):
status: New → Fix Released
no longer affects: cloud-archive/ussuri
no longer affects: cloud-archive/victoria
no longer affects: cloud-archive/wallaby
no longer affects: cloud-archive/xena
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in cloud-archive:
status: New → Fix Released
Interesting. I tried to reproduce it on the latest master with a two-node devstack, and evacuate worked with an affinity or anti-affinity group after the group was deleted.
We do not store the group UUID on the instance; we store the instance_uuid in the group, so once the group is deleted there should be no way to find that group based on the instance_uuid.
This is the place where the scheduler gets the group, if any, for the instance being scheduled [1].
We also store the group UUID in the RequestSpec, but loading a RequestSpec back from the DB is also guarded against deleted groups [2].
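As a rough illustration of the two points above, here is a minimal Python sketch. It is not the code behind [1] or [2]; `groups`, `stored_specs`, `group_for_instance`, and `load_request_spec` are hypothetical names, and the dictionaries only stand in for the database.

```python
# Hypothetical in-memory stand-ins; none of these names come from nova.
groups = {"sg1-uuid": {"policy": "affinity", "members": {"vm1-uuid"}}}
stored_specs = {
    "vm1-uuid": {"instance_uuid": "vm1-uuid", "group_uuid": "sg1-uuid"},
}


def group_for_instance(instance_uuid):
    """Find the group by membership; the instance itself holds no group UUID."""
    for group_uuid, group in groups.items():
        if instance_uuid in group["members"]:
            return group_uuid
    return None


def load_request_spec(instance_uuid):
    """Load a stored spec, dropping a reference to a group that no longer exists."""
    spec = dict(stored_specs[instance_uuid])
    if spec.get("group_uuid") not in groups:
        spec["group_uuid"] = None  # treat a deleted group as "no group"
    return spec


found_before = group_for_instance("vm1-uuid")
del groups["sg1-uuid"]  # delete the group while vm1 still exists
found_after = group_for_instance("vm1-uuid")
spec_after = load_request_spec("vm1-uuid")
```

With both guards in place, a deleted group simply disappears from the scheduler's view: the membership lookup returns nothing, and the reloaded spec no longer points at the stale UUID.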
Could you execute
$ openstack server group show <grp>
before you delete the group?
Could you also grep the nova logs for the group UUID to see if any interesting log shows up?
[1] https://github.com/openstack/nova/blob/8ecc29bfccc64e6036d068f9bcbeb0d8e0748776/nova/scheduler/utils.py#L1076
[2] https://github.com/openstack/nova/blob/8ecc29bfccc64e6036d068f9bcbeb0d8e0748776/nova/objects/request_spec.py#L595