When a node dies, its instances should be marked !running

Bug #661214 reported by Soren Hansen
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: justinsb
Milestone: 2011.2

Bug Description

It's the owning node's responsibility to change the state of instances, but if the node dies, this obviously doesn't happen.

There are multiple scenarios here:

1) Nova on the host has crashed, but the VMs are still alive.
2) The machine has died and taken nova and the VMs with it to the grave.
3) Nothing is wrong with either nova or the VMs, but the network connection has been severed, so we can't tell.

For 1) we need a notification mechanism of sorts. A really simple agent (simple, to minimise its own potential for crashes) should monitor the components and either raise an alert in case of failure or just try restarting nova.
For 2) we need to at the very least mark the instances as not running anymore. To make this happen, something must look through the list of registered compute nodes, see if any of them have failed to provide a heartbeat recently, and mark their VMs accordingly.
3) is more involved. We'll need a big discussion about network partitioning (in CAP parlance) and such at some point, and the outcome of that will likely make this pretty straightforward. Here's hoping.
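
The heartbeat sweep described for 2) could look roughly like the sketch below. This is only an illustration, not nova's actual code: the `ComputeNode` and `Instance` classes, the `mark_dead_nodes` function, the `"not_running"` state name and the 60-second timeout are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Instance:
    # Hypothetical instance record; state names are illustrative only.
    name: str
    state: str = "running"


@dataclass
class ComputeNode:
    # Hypothetical compute node record with the time of its last heartbeat.
    host: str
    last_heartbeat: datetime
    instances: list = field(default_factory=list)


def mark_dead_nodes(nodes, now, timeout=timedelta(seconds=60)):
    """Mark instances on nodes with a stale heartbeat as not running.

    Returns the hosts whose heartbeat is older than `timeout`.
    """
    dead_hosts = []
    for node in nodes:
        if now - node.last_heartbeat > timeout:
            dead_hosts.append(node.host)
            for inst in node.instances:
                if inst.state == "running":
                    inst.state = "not_running"
    return dead_hosts


if __name__ == "__main__":
    now = datetime(2011, 1, 1, 12, 0, 0)
    nodes = [
        ComputeNode("alive", now - timedelta(seconds=5), [Instance("vm-a")]),
        ComputeNode("dead", now - timedelta(minutes=10), [Instance("vm-b")]),
    ]
    print(mark_dead_nodes(nodes, now))
    print([(i.name, i.state) for n in nodes for i in n.instances])
```

Note that a stale heartbeat alone cannot tell scenario 2) apart from scenario 3): a partitioned but healthy node looks exactly the same to this sweep, which is why 3) needs the separate discussion.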

Eric Day (eday)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Thierry Carrez (ttx) wrote :

Created spec so that we look into it for Cactus. Downgrading importance, since fixing bug 661262 should allow us to cover the most obvious use case (nova-compute or the whole system crashes, but restarting it will update VM status).

Changed in nova:
importance: High → Medium
Changed in nova:
assignee: nobody → justinsb (justin-fathomdb)
Thierry Carrez (ttx)
Changed in nova:
status: Confirmed → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released