Cannot delete / confirm / revert resize an instance if nova-compute crashes after VERIFY_RESIZE
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | Invalid | Undecided | Unassigned | |
Bug Description
How to reproduce the bug:
nova boot ... vm1
nova migrate vm1 (or resize)
wait for the vm status to reach VERIFY_RESIZE
stop nova-compute on the host where vm1 is running
nova delete vm1
Error: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-be1379bc-
quickly restart the nova-compute service, before its state becomes "XXX" in:
nova-manage service list
Note: the vm is still running on the hypervisor.
nova show vm1
VM status is still: VERIFY_RESIZE
nova resize-confirm vm1
ERROR: Cannot 'confirmResize' while instance is in task_state deleting (HTTP 409) (Request-ID: req-9660c776-
nova resize-revert vm1
ERROR: Cannot 'revertResize' while instance is in task_state deleting (HTTP 409) (Request-ID: req-3cf0141b-
nova delete vm1
The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-2cb17333-
nova-api log when running nova delete:
http://
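The "wait for the vm status to reach VERIFY_RESIZE" step can be scripted. The following is a minimal sketch, assuming that "nova show" prints its usual two-column table with a "status" row. The helper names (parse_status, wait_for_status) and the timeout values are hypothetical and are not part of the nova client.

```python
import subprocess
import time


def parse_status(show_output):
    """Return the value of the 'status' row from table-style output, or None."""
    for line in show_output.splitlines():
        cells = [c.strip() for c in line.split("|")]
        # A data row splits into: ['', 'status', 'VERIFY_RESIZE', '']
        if len(cells) >= 3 and cells[1] == "status":
            return cells[2]
    return None


def wait_for_status(server, target="VERIFY_RESIZE", timeout=600, interval=5):
    """Poll 'nova show <server>' until the status matches or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.check_output(["nova", "show", server], text=True)
        if parse_status(out) == target:
            return True
        time.sleep(interval)
    return False
```

Calling wait_for_status("vm1") after "nova migrate vm1" would block until the instance reports VERIFY_RESIZE, at which point nova-compute can be stopped.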
Notes:
Tests have been performed using the Hyper-V driver, but the issue seems to be unrelated to the driver.
If, after stopping nova-compute, you wait long enough for the service to be marked as XXX in "nova-manage service list", then "nova delete vm1" succeeds.
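One way to read the timing dependence above is that a service counts as up while its last heartbeat is recent, and is shown as XXX once the heartbeat is older than some threshold. The following is a minimal sketch of that model, not nova's actual code. The 60-second threshold and the function name service_is_up are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed threshold in seconds; the value used by a real deployment may differ.
SERVICE_DOWN_TIME = 60


def service_is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Treat the service as up if its last heartbeat is within down_time seconds."""
    elapsed = abs((now - last_heartbeat).total_seconds())
    return elapsed <= down_time
```

Under this model, a delete issued shortly after nova-compute stops is still routed to a service that looks alive, while a delete issued after the threshold sees the service as down.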
Changed in nova: | |
assignee: | nobody → Dan Smith (danms) |
importance: | Undecided → High |
status: | New → Triaged |
description: | updated |
I have been attempting to reproduce this issue without success. I'm not sure what state results from the specific nova-compute crash you described above, but I can't reach it.
Could you try logging what information is inserted into the db by the "self._record_action_start(context, instance, instance_actions.CONFIRM_RESIZE)" call right above the confirm_resize in _delete() in the compute api? Then compare that against the request id returned by the exception.
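For the comparison step, the request id can be pulled out of the client error text and checked against whatever was logged. This is a minimal sketch, assuming the error message keeps the "(Request-ID: req-...)" form shown in the description. The helper names are hypothetical, and the recorded id has to come from your own logging.

```python
import re

# Matches the "Request-ID: req-..." fragment seen in the client error messages.
_REQUEST_ID_RE = re.compile(r"Request-ID:\s*(req-[0-9a-fA-F-]+)")


def extract_request_id(error_text):
    """Return the request id found in a client error message, or None."""
    match = _REQUEST_ID_RE.search(error_text)
    return match.group(1) if match else None


def same_request(error_text, recorded_request_id):
    """True if the id in the error message equals the id recorded in the db/log."""
    found = extract_request_id(error_text)
    return found is not None and found == recorded_request_id
```

If same_request() returns False for the failing "nova delete", the action record was written under a different request than the one that raised the exception.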