OpenStack Compute (nova)

VMs cannot be terminated if compute host is dead

Bug #872899 reported by Gavin B on 2011-10-12

This bug affects 6 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	Medium	Joe Gordon	OpenStack Compute (nova) 2012.2 "folsom"

Bug Description

We have seen this issue in a Diablo-2 setup. If a compute server is down (nova-compute not running / node crashed), the VMs hosted on that server cannot be terminated, and hence keep consuming instance/memory/floating_ip/ ... quota. A temporary crash or halt can be fixed easily enough by a host reboot, but a permanent host death is not so easy to fix.

We need to have some way of updating the DB to wipe an instance even if the appropriate host is not contactable - and of having nodes check on boot if "their" VMs are still all there.

Version = 2011.3-d2 + some bug fixes.

Tags:

Gavin B (gavin-brebner-orange) on 2011-10-18

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Low

Joe Gordon (jogo) wrote on 2012-03-01:

What do you expect to happen when:

a) nova-compute stops working, but the physical machine is up

b) the nova-compute server dies

And we can't necessarily differentiate the two.

semy (semyazz) wrote on 2012-05-04:

Why does this bug have "Low" priority? I think it's critical for users. Instances hang in Rebooting/Deleting. OpenStack should remove instances from dead hosts and show a proper warning to users, or even start snapshots of the deleted instances on other hosts. Or something like that.

Thierry Carrez (ttx) on 2012-05-21

Changed in nova:
importance:	Low → Medium

Tiago Mello (timello) on 2012-08-09

Changed in nova:
assignee:	nobody → Tiago Rodrigues de Mello (tmello)

Tiago Mello (timello) on 2012-08-09

Changed in nova:
assignee:	Tiago Rodrigues de Mello (tmello) → nobody

Tong Li (litong01) wrote on 2012-08-10:

At the moment, the only way I can figure out is to remove the records related to the dead VM from the Nova DB.
I remember there was a thread a while back discussing this issue. The problem seems to be that there is no way to distinguish whether the VM itself died or the host died. There were also some problems with VM status; it was a long discussion. My proposal is to add a command to nova-manage that actually removes the DB records, so that removing a VM is entirely up to the person who runs the command, who is then responsible for determining the real cause.

Sam Stoelinga (sammiestoel) wrote on 2012-08-28:

Couldn't we do a check using nova.utils.service_is_up(service)? If it's not up, remove the record from the DB.

service = db.service_get_by_host(instance['host'])
service_is_up(service)
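To illustrate the kind of liveness check suggested here, the following is a minimal, self-contained sketch. It does not use nova itself: the `SERVICE_DOWN_TIME` threshold, the dictionary layout of `service`, and the heartbeat fields are assumptions made for the example, and nova's real `service_is_up` may differ in its details.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold; a real deployment would read this from configuration.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def service_is_up(service, now=None):
    """Return True if the service reported a heartbeat recently enough."""
    now = now or datetime.now(timezone.utc)
    # Fall back to the creation time if the service has never checked in.
    last_heartbeat = service.get("updated_at") or service["created_at"]
    return abs(now - last_heartbeat) <= SERVICE_DOWN_TIME


# Example records (assumed shape): one fresh heartbeat, one stale heartbeat.
now = datetime(2012, 8, 28, 12, 0, 0, tzinfo=timezone.utc)
fresh = {"created_at": now - timedelta(hours=1),
         "updated_at": now - timedelta(seconds=10)}
stale = {"created_at": now - timedelta(hours=1),
         "updated_at": now - timedelta(minutes=5)}

print(service_is_up(fresh, now))
print(service_is_up(stale, now))
```

Note that a stale heartbeat only says the service has stopped reporting. As pointed out earlier in this thread, it cannot tell a stopped nova-compute process apart from a dead physical machine.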

Joe Gordon (jogo) wrote on 2012-08-28:

Sam, if a compute node goes down for a finite period of time, we want to leave the record in the DB so that the VMs can potentially be recovered when the compute node powers up.

Tong, adding a command to nova-manage to remove records sounds like a good compromise.

Sam Stoelinga (sammiestoel) wrote on 2012-08-29:

Hi Joe,

Thanks for replying! I'm still a newbie with OpenStack and learning a lot about it, but looking at the terminate_instance functionality, I can see that the record of the instance is destroyed anyway:

self.db.instance_destroy(context, instance_uuid) is called in ComputeManager._delete_instance after the instance has been shut down, its volumes cleaned up, etc.

_delete_instance is called from terminate_instance, so as far as I can tell the record is not left in the DB. I think we should also call self.db.instance_destroy if the host is not up; it just means we don't have to shut down the instance, because it wasn't functioning anyway.

My reasoning: the user is terminating the instance, so why not remove the record from the database? That is what happens anyway when you terminate an instance; the only difference is that we don't have to shut it down.

A flaw in this approach: if the host comes up again, the resources (volumes) should still be cleaned up, I guess?
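The proposal above can be sketched as follows. This is only an illustration under stated assumptions: `FakeDB`, `delete_instance`, `host_is_up`, and `shutdown_on_host` are hypothetical names invented for the example, and the real nova code path (RPC to the compute manager, volume cleanup, and so on) is considerably more involved.

```python
class FakeDB:
    """In-memory stand-in for the instance table (hypothetical)."""

    def __init__(self, instances):
        self.instances = dict(instances)

    def instance_destroy(self, uuid):
        # Remove the record; ignore instances that are already gone.
        self.instances.pop(uuid, None)


def delete_instance(db, instance, host_is_up, shutdown_on_host):
    """Delete an instance, skipping host-side cleanup when the host is down.

    Returns True if host-side cleanup ran, False if only the record was removed.
    """
    cleaned = False
    if host_is_up(instance["host"]):
        # Normal path: shut the instance down on its host first.
        shutdown_on_host(instance)
        cleaned = True
    # In both cases the database record is removed, which frees the quota.
    db.instance_destroy(instance["uuid"])
    return cleaned


db = FakeDB({
    "vm-1": {"uuid": "vm-1", "host": "node-a"},
    "vm-2": {"uuid": "vm-2", "host": "node-b"},
})
up_hosts = {"node-a"}   # node-b is assumed to be dead
shutdowns = []          # records which instances were shut down on a host

first = delete_instance(db, db.instances["vm-1"],
                        up_hosts.__contains__, shutdowns.append)
second = delete_instance(db, db.instances["vm-2"],
                         up_hosts.__contains__, shutdowns.append)
print(first, second, sorted(db.instances))
```

The flaw raised above still applies to this sketch: when the record is removed without host-side cleanup, anything left on the host has to be reconciled if that host ever comes back.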

Joe Gordon (jogo) on 2012-08-30

Changed in nova:
assignee:	nobody → Joe Gordon (joe-gordon0)

OpenStack Infra (hudson-openstack) wrote on 2012-08-31: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12231

Changed in nova:
status:	Confirmed → In Progress

Joe Gordon (jogo) wrote on 2012-08-31:

Sam, I am working on a patch to do exactly what you outlined.

Sam Stoelinga (sammiestoel) wrote on 2012-08-31:

Hehe nice job, I also wanna get started on contributing :) Just looked at your patch.

Joe Gordon (jogo) on 2012-09-13

tags:

added: folsom-rc1

OpenStack Infra (hudson-openstack) wrote on 2012-09-14: Fix merged to nova (master)

Reviewed: https://review.openstack.org/12231
Committed: http://github.com/openstack/nova/commit/77dd6a0b37652bc163d4ad3083e29af55f2b9a5d
Submitter: Jenkins
Branch: master

commit 77dd6a0b37652bc163d4ad3083e29af55f2b9a5d
Author: Joe Gordon <email address hidden>
Date: Fri Aug 31 00:04:33 2012 +0000

Allow for deleting VMs from down compute nodes.

Fix bug 872899

    If compute node service_is_up returns false, just delete the VM from
    the database. If compute node recovers, setting
    running_deleted_instance_action=reap will clean up the node.

Change-Id: Ibb5f1e22c2e482d304c59a485a04b882ead0c67d
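The recovery step mentioned in the commit message (a recovered node cleaning up VMs that were deleted while it was down) can be pictured with the sketch below. It is an assumption-laden illustration, not the code from the commit: the function name, its arguments, and the `log` and `noop` modes are chosen for the example; only the `reap` value of running_deleted_instance_action is taken from the message above.

```python
VALID_ACTIONS = ("noop", "log", "reap")


def cleanup_running_deleted(running_on_host, db_instances, action, destroy):
    """Handle instances that still run on a host but no longer exist in the DB.

    Returns the list of instance IDs that were logged or reaped.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # Instances the hypervisor still runs although their record was deleted.
    orphans = sorted(set(running_on_host) - set(db_instances))
    if action == "noop":
        return []
    for uuid in orphans:
        if action == "log":
            print(f"instance {uuid} is still running but was deleted from the database")
        else:
            # action == "reap": destroy the leftover instance on the host.
            destroy(uuid)
    return orphans


running = ["vm-1", "vm-2", "vm-3"]   # what the recovered host still runs
in_db = ["vm-2"]                     # what the database still knows about
reaped = []

handled = cleanup_running_deleted(running, in_db, "reap", reaped.append)
print(handled)
```

In this sketch the reconciliation is driven purely by comparing two lists; in a real deployment the comparison would run periodically on the compute node once it is back up.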

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2012-09-19

Changed in nova:
milestone:	none → folsom-rc1
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2012-09-27

Changed in nova:
milestone:	folsom-rc1 → 2012.2
