Can't delete instance stuck in deleting task

Bug #1461055 reported by Tzach Shefi
This bug affects 7 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

Description of problem: On a Juno HA deployment, with nova over shared NFS storage, I deleted an instance and the compute log reported the deletion as complete:

2015-06-02 11:57:36.273 3505 INFO nova.virt.libvirt.driver [req-4cc54412-a449-4c7a-bbe1-b21d202bcfe7 None] [instance: 7b6c8ad5-7633-4d53-9f84-93b12a701cd3] Deletion of /var/lib/nova/instances/7b6c8ad5-7633-4d53-9f84-93b12a701cd3_del complete

Also, the instance wasn't found with virsh list --all.
Yet nova list and Horizon both still show this instance as stuck in task deleting, more than two hours after I deleted it.

Version-Release number of selected component (if applicable):
rhel 7.1
python-nova-2014.2.2-19.el7ost.noarch
openstack-nova-compute-2014.2.2-19.el7ost.noarch
python-novaclient-2.20.0-1.el7ost.noarch
openstack-nova-common-2014.2.2-19.el7ost.noarch

How reproducible:
Unsure. It doesn't happen with every instance deletion, but it has happened more than this one time.

Steps to Reproduce:
1. Boot an instance
2. Delete instance
3. Instance is stuck in deleting task on nova/Horizon.

Actual results:
Stuck with a phantom "deleting" instance, which is already gone from virsh's point of view.

Expected results:
The instance should be deleted and should also disappear from nova list/Horizon.

Additional info:

Workaround: doing an openstack-service restart for nova on the compute node fixed my problem. The instance is now completely gone from Nova/Horizon.

instance virsh id instance-00000d4d.log
instanceID 7b6c8ad5-7633-4d53-9f84-93b12a701cd3

| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | deleting |
| OS-EXT-STS:vm_state | deleted |
| OS-SRV-USG:launched_at | 2015-05-28T11:06:33.000000 |
| OS-SRV-USG:terminated_at | 2015-06-02T08:57:37.000000 |
| ...... |
| status | DELETED |
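
The inconsistency in the output above (vm_state already "deleted" while task_state is still "deleting") can be checked mechanically. The following is only a minimal sketch, assuming the pasted `nova show` rows keep the `| key | value |` layout; the helper names `parse_nova_show` and `looks_phantom` are hypothetical and not part of nova or novaclient.

```python
def parse_nova_show(text):
    """Parse '| key | value |' rows of pasted `nova show` output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Drop the outer pipes, then split the remaining "key | value" pair.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0]:
            fields[cells[0]] = cells[1]
    return fields


def looks_phantom(fields):
    """Hypothetical check: already deleted, yet a delete task is still pending."""
    return (
        fields.get("OS-EXT-STS:vm_state") == "deleted"
        and fields.get("OS-EXT-STS:task_state") == "deleting"
    )


# Rows copied from the report; elided rows are simply left out.
SAMPLE = """\
| OS-EXT-STS:power_state | 1 |
| OS-EXT-STS:task_state | deleting |
| OS-EXT-STS:vm_state | deleted |
| status | DELETED |
"""

fields = parse_nova_show(SAMPLE)
print(looks_phantom(fields))
```

Border lines and truncated rows do not split into exactly two cells, so they are skipped rather than guessed at.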

Attached nova log from compute and controller.

Tags: conductor db
Tzach Shefi (tshefi) wrote :
  • logs (776.7 KiB, application/x-tar)
Markus Zoeller (markus_z) (mzoeller) wrote :

@Tzach Shefi:

I observed this too, but was never able to create steps for a reproduction.
Another workaround would be to delete the db entries manually [1].

[1] http://stackoverflow.com/questions/22194965/openstack-can-not-delete-instance-from-dashboard

tags: added: conductor db
Sylvain Bauza (sylvain-bauza) wrote :

In general, we recommend using "noc

Sylvain Bauza (sylvain-bauza) wrote :

Sorry, hit tab too fast.

As I was saying, in general we recommend running 'nova reset-state <vm>' to set the instance to the desired state and then deleting it again.

Could you please try it?

Changed in nova:
status: New → Invalid
Tzach Shefi (tshefi) wrote :

The stuck instance is gone; once it happens again I'll give it a try.

Tzach Shefi (tshefi) wrote :

Just ran into this again.
1. # nova reset-state <instanceID> --active
2. Instance state changed to active.
3. # nova delete <instanceID> still doesn't work: vm_state = active, task_state = deleting.
Waited 10 minutes; nothing happened.

4. Same test, this time using # nova force-delete; the end result is the same, the instance is still stuck in deleting.

virsh list --all on the compute node shows that no such "instance_name" remains.
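
The "wait and see whether the delete goes through" step above can be written as a bounded polling loop instead of a manual wait. This is only a sketch under stated assumptions: `wait_for_delete` is a hypothetical helper, and `get_task_state` stands in for whatever lookup is available (for example, something that reads task_state from `nova show`), returning None once the instance can no longer be found. The 600-second default mirrors the 10 minutes mentioned above.

```python
import time


def wait_for_delete(get_task_state, timeout=600.0, interval=10.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the instance is gone or the timeout expires.

    get_task_state is a caller-supplied callable (hypothetical here) that
    returns the current task_state string, or None once the instance no
    longer exists. Returns True if the instance disappeared in time.
    """
    deadline = clock() + timeout
    while True:
        if get_task_state() is None:
            return True
        if clock() >= deadline:
            # Still present (e.g. stuck in "deleting") after the timeout.
            return False
        sleep(interval)


# Usage with a fake lookup: two polls report "deleting", then the instance is gone.
states = iter(["deleting", "deleting", None])
print(wait_for_delete(lambda: next(states), sleep=lambda seconds: None))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; a False result corresponds to the stuck case described in this comment.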

Tzach Shefi (tshefi) wrote :

One more clue: my two compute nodes host a total of 66 instances.
4 instances are stuck in task_state deleting.
6 have a status of deleted, yet remain with task_state "-".
All of them reside on the same compute node.

Restarting the nova service on the affected compute node removed all phantom instances.
But this isn't a solution, only a temporary workaround.
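
To confirm that phantom instances cluster on one compute node, the instance records can be tallied per host. The sketch below is illustrative only: `stuck_by_host` and the record keys (`host`, `status`, `task_state`) are hypothetical names, and the sample hosts and rows are made up for the example rather than taken from the attached logs.

```python
from collections import Counter


def stuck_by_host(instances):
    """Count phantom instances per compute host.

    An instance counts as phantom if its task_state is still "deleting",
    or if its status is DELETED while it is still listed with task_state "-".
    """
    counts = Counter()
    for inst in instances:
        stuck_deleting = inst["task_state"] == "deleting"
        lingering = inst["status"] == "DELETED" and inst["task_state"] == "-"
        if stuck_deleting or lingering:
            counts[inst["host"]] += 1
    return counts


# Made-up sample records (hypothetical host names).
sample = [
    {"host": "compute-1", "status": "ACTIVE", "task_state": "-"},
    {"host": "compute-2", "status": "ACTIVE", "task_state": "deleting"},
    {"host": "compute-2", "status": "DELETED", "task_state": "-"},
    {"host": "compute-2", "status": "ACTIVE", "task_state": "-"},
]

print(stuck_by_host(sample))
```

If every phantom lands under a single host key, that points at the nova-compute service on that node rather than at the controller.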

Oleksandr Savatieiev (osavatieiev) wrote :

Hit this issue on Queens. The behavior is the same as above.

Already tried, with no success:
- Reset state
- Force delete

Observations:
- The logs show that nova keeps "checking" those instances for something.
- Once the nova-compute service is restarted, it starts the instance update task and deletes those fake instances.

My personal best guess is that something is wrong with the communication channel while nova-compute has its hands full running instances (degraded performance).

Changed in nova:
status: Invalid → Confirmed
Changed in nova:
status: Confirmed → Fix Released
status: Fix Released → Confirmed