OpenStack Compute (nova)

Bug #1952745
Comment #3

Konrad Cempura (kcem) wrote on 2021-12-09:

> "TBH, I don't really see a problem with what you say : if you recreate a nova-compute service, you need to restart it for removing the evacuated instances, but I could maybe misunderstand your concerns."

1. It is possible to cold migrate evacuated instances to a new compute host that has the same name as the source compute host from which the instances were evacuated earlier. Those instances will then be removed from libvirt on the first restart of that compute host (and you definitely don't want that).

2. Evacuations from compute-1 are not completed on the first run of nova_compute on the new compute-1, which has the same name as the broken compute-1, and this makes scenario 1 possible. nova_compute tries to complete them, but errors appear in the logs; completion succeeds only after a restart.

3. It is possible to cold migrate evacuated instances to a compute host whose name differs only in letter case (e.g. compute-1 vs. COMPUTE-1), and the result will be the same as in point 1, except that the evacuations from compute-1 will never be completed. Instances evacuated from compute-1 will be deleted if they are later moved to COMPUTE-1, but at an unexpected moment: after a restart of COMPUTE-1 (that may never happen, but when it does you will lose some or all instances).

Scenarios for points 1, 2, 3:

Scenario 1:
- remove compute-1 physically by formatting its disk
- evacuate the instances from the removed compute-1 to compute-2
- remove the compute-1 service from the OpenStack cloud
- remove the orphaned resource provider for compute-1
- configure a new compute host named compute-1 and add it to the OpenStack cloud
- cold migrate the evacuated instances from compute-2 to compute-1
- accept the migrations
- restart compute-1
- the instances on compute-1 are GONE from libvirt (they still exist in OpenStack)

Scenario 2:
- remove compute-1 physically by formatting its disk
- evacuate the instances from the removed compute-1
- remove the compute-1 service from OpenStack
- remove the orphaned resource provider for compute-1
- add a new compute host with the same name: compute-1
- the evacuations are not completed: errors appear in the logs during the first start; a subsequent restart completes the evacuations

Scenario 3:
- remove compute-1 physically by formatting its disk
- evacuate the instances from the removed compute-1 to compute-2
- remove the compute-1 service from the OpenStack cloud
- remove the orphaned resource provider for compute-1
- configure a new compute host with a slightly different name, COMPUTE-1, and add it to the OpenStack cloud
- restart COMPUTE-1, or the nova_compute service on COMPUTE-1, as many times as you like
- cold migrate the evacuated instances from compute-2 to COMPUTE-1
- accept the migrations
- restart COMPUTE-1
- the instances on COMPUTE-1 are GONE from libvirt (they still exist in OpenStack)
- the evacuations from compute-1 will never be completed (until the instances land on COMPUTE-1 and COMPUTE-1 is restarted)
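
To make the risk in the scenarios above easier to reason about, here is a minimal Python sketch of a pre-flight check an operator might run before reusing a hostname. It is not nova code. The Evacuation record, the status names in UNFINISHED, and the case-insensitive host comparison are all assumptions chosen to mirror the behaviour reported here; the function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: statuses meaning "evacuation not yet cleaned up on the source host".
UNFINISHED = frozenset({"accepted", "pre-migrating", "done"})


@dataclass(frozen=True)
class Evacuation:
    """Hypothetical, simplified stand-in for an evacuation migration record."""
    instance: str
    source_host: str
    status: str


def pending_evacuations(host, records):
    """Return unfinished evacuations whose source host matches `host`.

    Assumption: hosts are compared case-insensitively, which is how
    compute-1 and COMPUTE-1 end up treated as the same host in scenario 3.
    """
    key = host.casefold()
    return [
        r for r in records
        if r.status in UNFINISHED and r.source_host.casefold() == key
    ]


def guests_at_risk(host, local_guests, records):
    """Guests running on `host` that still have an unfinished evacuation
    recorded against a matching source host name."""
    pending = {r.instance for r in pending_evacuations(host, records)}
    return sorted(pending & set(local_guests))


if __name__ == "__main__":
    records = [
        Evacuation("vm-a", "compute-1", "done"),
        Evacuation("vm-b", "compute-1", "completed"),
    ]
    # Both instances were cold migrated back after the evacuation.
    for host in ("compute-1", "COMPUTE-1", "compute-3"):
        print(host, guests_at_risk(host, ["vm-a", "vm-b"], records))
```

Under these assumptions, any instance reported by guests_at_risk for the new host is one that could disappear from libvirt on the next restart of that host, so the safer order would be to let the old evacuation records complete (or pick a name that does not collide) before migrating instances back.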
