OpenStack Compute (nova)

vm_state ERROR vm undeletable if first delete attempt does not succeed.

Bug #1281324 reported by Robert Collins on 2014-02-17

This bug affects 7 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	Medium	Tiago Mello	OpenStack Compute (nova) 2014.1 "icehouse"

Bug Description

We had a neutron failure in our cloud, which led to a bunch of VMs in state ERROR. We've repaired neutron, but now we can't delete them:

ERROR: Cannot 'forceDelete' while instance is in vm_state error (HTTP 409) (Request-ID: req-1c4a88c3-4ea1-45a8-b987-629a69b4af06)

or stop/start
nova stop 01199ed9-b3c3-4ee9-a482-bdfdc7347ce1
ERROR: Instance 01199ed9-b3c3-4ee9-a482-bdfdc7347ce1 in task_state deleting. Cannot stop while the instance is in this state. (HTTP 400) (Request-ID: req-18d58b8d-b360-4b37-b671-34624a6dade4)
(ci-overcloud)robertc@lifelesshp:~/work$ nova start 01199ed9-b3c3-4ee9-a482-bdfdc7347ce1
ERROR: Instance 01199ed9-b3c3-4ee9-a482-bdfdc7347ce1 in vm_state error. Cannot start while the instance is in this state. (HTTP 400) (Request-ID: req-b46c0ee6-8ed8-41c3-b400-72f76429209a)

A normal 'delete' doesn't error, but doesn't delete the VM either.

The problem is that nothing is cancelling the task state, so the VMs are staying stuck indefinitely.

Robert Collins (lifeless) wrote on 2014-02-18:

I can't see anything in the logs for nova-api or nova-compute w.r.t.

Robert Collins (lifeless) wrote on 2014-02-18:

Ok, so it hits this:
LOG.info(_('Instance is already in deleting state, '
           'ignoring this request'), instance=instance)

but the nova-compute process for that VM has been restarted and the VM isn't being deleted. Also, the info level for that message is wrong: default logging won't show this, and this is IMO an unusual situation where admins will be scratching their heads.

Robert Collins (lifeless) on 2014-02-18

summary:	- vm_state ERROR vm undeletable
	+ vm_state ERROR vm undeletable if first delete attempt does not succeed.
description:	updated

Robert Collins (lifeless) wrote on 2014-02-18:

AHHA, and so here's how the problem happened in the first place:
- the compute node wasn't reachable from the api when the delete was submitted: so when the API calls delete, task_state=deleting is set.
- but the compute node never got the message from rabbit, so task_state=None is never set.
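
The sequence above can be sketched as a small simulation. This is only an illustration, not nova code: `Instance`, `api_delete` and `lost_cast` are hypothetical stand-ins, and the state strings are assumed values.

```python
class Instance:
    """Hypothetical stand-in for a nova instance record."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.vm_state = "error"
        self.task_state = None


def api_delete(instance, cast):
    """Hypothetical API-side delete: mark the instance, then cast to compute."""
    if instance.task_state == "deleting":
        # Mirrors the "already in deleting state, ignoring" check quoted above.
        return "ignored"
    instance.task_state = "deleting"
    cast(instance)  # fire-and-forget: the API does not wait for a reply
    return "accepted"


def lost_cast(instance):
    """Simulates a message that never reaches the compute node."""
    # Nothing runs on the compute side, so task_state is never reset.


vm = Instance("example-vm")
first = api_delete(vm, lost_cast)   # sets task_state to "deleting"
second = api_delete(vm, lost_cast)  # hits the "already deleting" check
print(first, second, vm.task_state)
```

Once the first request is lost, every later delete is ignored because nothing ever clears task_state.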

Robert Collins (lifeless) wrote on 2014-02-18:

^ example vm. Note the STATUS ERROR power state = RUNNING

Robert Collins (lifeless) wrote on 2014-02-18:

And this if block -

if (instance.vm_state == vm_states.SOFT_DELETED or
    (instance.vm_state == vm_states.ERROR and
     instance.task_state != task_states.RESIZE_MIGRATING)):
    LOG.debug(_("Instance is in %s state."),
              instance.vm_state, instance=instance)

is the one that fails to delete these on startup, because they are in vm_state ERROR with a task_state other than RESIZE_MIGRATING.

Tiago Mello (timello) on 2014-02-19

Changed in nova:
assignee:	nobody → Tiago Rodrigues de Mello (timello)

Tiago Mello (timello) wrote on 2014-02-19:

The code below in the same _init_instance function is supposed to handle the case where task_state is 'DELETING', but as you pointed out, the first 'if' block stops the process.
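
To make the ordering problem concrete, here is a minimal sketch of the control flow as described in this thread. It is not the real _init_instance: the function name, return strings and state constants are stand-ins, and the assumption that the first block returns early is taken from the comments above.

```python
# Stand-in constants; the string values are assumptions for illustration.
ERROR = "error"
SOFT_DELETED = "soft-delete"
RESIZE_MIGRATING = "resize_migrating"
DELETING = "deleting"


def init_instance_before_fix(vm_state, task_state):
    """Hypothetical model of the startup checks discussed in this bug."""
    # First block: ERROR instances are skipped unless a resize was in flight.
    if (vm_state == SOFT_DELETED or
            (vm_state == ERROR and task_state != RESIZE_MIGRATING)):
        return "skipped"
    # Later block: meant to retry deletes that never completed.
    if task_state == DELETING:
        return "delete retried"
    return "other handling"


# An ERROR instance stuck in DELETING never reaches the retry branch.
print(init_instance_before_fix(ERROR, DELETING))
print(init_instance_before_fix("active", DELETING))
```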

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Medium

OpenStack Infra (hudson-openstack) wrote on 2014-02-20: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/75047

Changed in nova:
status:	Confirmed → In Progress

Steve Kowalik (stevenk) wrote on 2014-02-24:

https://review.openstack.org/#/c/74240/ pre-dates your change, but I'm not certain why the bot did not update this bug.

OpenStack Infra (hudson-openstack) wrote on 2014-03-11: Fix merged to nova (master)

Reviewed: https://review.openstack.org/74240
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=556ab844c823dd364032d59ab1b61780243cbfd1
Submitter: Jenkins
Branch: master

commit 556ab844c823dd364032d59ab1b61780243cbfd1
Author: Robert Collins <email address hidden>
Date: Tue Feb 18 16:03:23 2014 +1300

Delete ERROR+DELETING VMs during compute startup.

    We should perhaps do this check during message bus reconnection as
    well.. Anyhow, if a compute node is offline during a nova API call
    to delete an instance, and the rabbit message is lost for some
    reason (or alternatively if the delete method throws an error)
    then the task state is not cleared and won't be cleared on compute
    restart, leaving it wedged forever.

    Change-Id: Ie0a47958eb0fb58307902437a95634d5f54f74f3
    Fixes-bug: #1281324
    Co-Authored-By: Steve Kowalik <email address hidden>
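
One way to read the commit title is that the startup guard should stop skipping ERROR instances whose task_state is DELETING. The sketch below works under that assumption and is not the merged patch: `skip_on_startup`, `init_instance` and the state strings are hypothetical.

```python
# Stand-in constants; the string values are assumptions for illustration.
ERROR = "error"
SOFT_DELETED = "soft-delete"
RESIZE_MIGRATING = "resize_migrating"
DELETING = "deleting"


def skip_on_startup(vm_state, task_state):
    """Return True if startup handling should leave the instance alone."""
    if vm_state == SOFT_DELETED:
        return True
    # Assumption: ERROR instances are still skipped, except when a resize
    # migration or a delete was in flight when the instance got stuck.
    return vm_state == ERROR and task_state not in (RESIZE_MIGRATING, DELETING)


def init_instance(vm_state, task_state):
    """Hypothetical startup handling with the widened guard."""
    if skip_on_startup(vm_state, task_state):
        return "skipped"
    if task_state == DELETING:
        return "delete retried"
    return "other handling"


# The wedged ERROR + DELETING case now reaches the retry branch.
print(init_instance(ERROR, DELETING))
```

Plain ERROR instances with no pending task are still skipped, so only the wedged case changes behaviour in this sketch.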

Changed in nova:
status:	In Progress → Fix Committed

Russell Bryant (russellb) on 2014-03-17

Changed in nova:
milestone:	none → icehouse-rc1

Thierry Carrez (ttx) on 2014-03-31

Changed in nova:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-04-17

Changed in nova:
milestone:	icehouse-rc1 → 2014.1

Sacha Yunusic (sacha-m) wrote on 2015-03-12:

Is there any update on this? I have a similar problem, although I don't want to delete the instance; I want to turn it on.
This is my instance state:
[_ID_] | [_Name_] | ACTIVE | - | Shutdown | admin_net=10.10.0.13, 10.222.221.6 |
When I try to start it from the cli, this is what I get:
ERROR (Conflict): Instance [_ID_] in vm_state active. Cannot start while the instance is in this state. (HTTP 409)
Can I save my instance?
