Cannot delete / confirm / revert resize an instance if nova-compute crashes after VERIFY_RESIZE

Bug #1155800 reported by Alessandro Pilotti
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

How to reproduce the bug:

nova boot ... vm1
nova migrate vm1 (or resize)

wait for the vm status to reach VERIFY_RESIZE

stop nova-compute on the host where vm1 is running

nova delete vm1

Error: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-be1379bc-6a5b-41f5-a554-60e02acfdb79)

quickly restart the nova-compute service, before its status becomes "XXX" in:
nova-manage service list

Note: the vm is still running on the hypervisor.

nova show vm1
VM status is still: VERIFY_RESIZE

nova resize-confirm vm1

ERROR: Cannot 'confirmResize' while instance is in task_state deleting (HTTP 409) (Request-ID: req-9660c776-ebc3-4397-a8e2-7ad83e8b6a0f)

nova resize-revert vm1

ERROR: Cannot 'revertResize' while instance is in task_state deleting (HTTP 409) (Request-ID: req-3cf0141b-ee3d-478f-8aa0-89091028a227)

nova delete vm1

The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-2cb17333-6cc9-42ca-baaa-da88ec90153f)

nova-api log when running nova delete:
http://paste.openstack.org/show/33783/

Notes:

Tests have been performed using the Hyper-V driver, but the issue seems to be unrelated to the driver.

If, after stopping nova-compute, you wait long enough for the service to be marked as XXX in "nova-manage service list", then "nova delete vm1" succeeds.

Dan Smith (danms)
Changed in nova:
assignee: nobody → Dan Smith (danms)
importance: Undecided → High
status: New → Triaged
description: updated
Andrew Laski (alaski) wrote :

I have been attempting to reproduce this issue without success. I'm not sure what state occurs with the specific nova-compute crash you detailed above but I can't reach it.

Could you try logging what information is inserted into the db by the "self._record_action_start(context, instance, instance_actions.CONFIRM_RESIZE)" call right above the confirm_resize in _delete() in the compute api? Then compare that against the request ID returned with the exception.

Tiantian Gao (gtt116) wrote :

I reproduced the bug.

When terminating an instance, the code first checks whether it needs to confirm_resize() the instance, and only then checks whether the host is alive.

Since confirm_resize is an RPC call, it may raise an exception if the host is down, and the delete then returns a 500.
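
The ordering described above can be sketched roughly as follows. This is an illustration only, not nova's actual code: FakeComputeRPC, HostDownError, delete_instance and the host_is_up callback are hypothetical names. It also assumes that task_state is set to "deleting" before the RPC call, which would be consistent with the 409 errors in the report.

```python
class HostDownError(Exception):
    """Hypothetical stand-in for an RPC failure when the compute host does not answer."""


class FakeComputeRPC:
    """Hypothetical stand-in for the RPC layer; not nova's real API."""

    def __init__(self, host_running):
        self.host_running = host_running

    def confirm_resize(self, instance):
        # Blocking call: fails when nothing is listening on the host.
        if not self.host_running:
            raise HostDownError("no reply from compute host")
        instance["vm_state"] = "active"


def delete_instance(instance, rpc, host_is_up):
    """Delete path in the order the comment describes."""
    # Assumption: task_state is set before any RPC is attempted.
    instance["task_state"] = "deleting"

    # Step 1: confirm a pending resize first (blocking RPC, may raise).
    if instance["vm_state"] == "resized":
        rpc.confirm_resize(instance)

    # Step 2: the liveness check is only reached if step 1 did not raise.
    if host_is_up():
        return "delete sent to compute host"
    return "deleted locally"


if __name__ == "__main__":
    vm1 = {"vm_state": "resized", "task_state": None}
    try:
        delete_instance(vm1, FakeComputeRPC(host_running=False), lambda: True)
    except HostDownError as exc:
        # The API layer would surface this as an HTTP 500.
        print("delete failed:", exc)
    print("task_state after failure:", vm1["task_state"])
```

In this sketch the exception escapes before the liveness check runs, and the instance is left with task_state "deleting" while still in the resized state.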

haruka tanizawa (h-tanizawa) wrote :

I couldn't reproduce this issue, either.
(nova ver.: 308996e00a50201547c0ac74d4f1e1710736d472)

Does this issue still occur?

Dan Smith (danms) wrote :

This is super old, lots has changed since then, and several folks have not been able to reproduce. Please re-open if this is still valid.

Changed in nova:
assignee: Dan Smith (danms) → nobody
importance: High → Undecided
status: Triaged → Invalid