When a node dies, its instances should be marked !running
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | justinsb |
Bug Description
It's the owning node's responsibility to change the state of instances, but if the node dies, this obviously doesn't happen.
There are multiple scenarios here:
1) Nova on the host has crashed, but the VMs are still alive.
2) The machine has died and taken nova and the VMs with it to the grave.
3) Nothing is wrong with either nova or the VMs, but the network connection has been severed, so we can't tell.
For 1) we need a notification mechanism of sorts. A really simple agent (simple, to minimise its own potential for crashes) should monitor the components and either raise an alert on failure or just try restarting nova.
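A minimal sketch of such an agent, assuming it only has to run one command and react to its exit code. The function name `supervise`, its parameters, and the `alert` callback are hypothetical and not part of nova:

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, backoff_seconds=1.0, alert=print):
    """Run cmd and restart it whenever it exits with a non-zero code.

    Hypothetical helper: gives up after max_restarts restarts and
    returns the number of restarts that were performed.
    """
    restarts = 0
    while True:
        # Blocks until the monitored process exits.
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            # Clean shutdown: nothing to report, stop supervising.
            return restarts
        alert("process %r exited with code %d" % (cmd, exit_code))
        if restarts >= max_restarts:
            alert("giving up after %d restarts" % restarts)
            return restarts
        restarts += 1
        # Short pause so a crash loop does not spin the CPU.
        time.sleep(backoff_seconds)
```

The agent deliberately does nothing beyond spawning, waiting, and alerting, so there is very little in it that can crash. In practice an existing process supervisor would probably do the same job.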
For 2) we need to at the very least mark the instances as not running anymore. To make this happen, something must look through the list of registered compute nodes, check whether any of them has failed to provide a heartbeat recently, and mark the VMs on those nodes accordingly.
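The sweep for scenario 2 could look roughly like the sketch below. It assumes heartbeats are available as a mapping of node name to last-seen timestamp and instances as plain dicts; the function name, the 60-second timeout, and the state strings are all hypothetical and do not reflect nova's actual schema or the fix that was merged:

```python
from datetime import datetime, timedelta


def mark_instances_on_dead_nodes(last_heartbeat, instances, now,
                                 timeout=timedelta(seconds=60)):
    """Mark running instances on nodes with a stale heartbeat as not running.

    last_heartbeat: dict of node name -> datetime of the last heartbeat.
    instances: list of dicts with "id", "host" and "state" keys.
    Returns the ids of the instances whose state was changed.
    """
    # A node counts as dead once its heartbeat is older than the timeout.
    dead_nodes = {node for node, seen in last_heartbeat.items()
                  if now - seen > timeout}
    changed = []
    for instance in instances:
        if instance["host"] in dead_nodes and instance["state"] == "running":
            instance["state"] = "not_running"  # hypothetical state name
            changed.append(instance["id"])
    return changed
```

Note that a sweep like this cannot tell scenario 2 apart from scenario 3: a partitioned node also stops sending heartbeats, so its VMs would be marked as not running even though they are still alive.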
3) is more involved. We'll need a big discussion about network partitioning (in CAP parlance) and such at some point, and the outcome of that will likely make this pretty straightforward. Here's hoping.
Related branches
- Vish Ishaya (community): Approve
- Matt Dietz (community): Approve
- Thierry Carrez (community): Approve
Diff: 729 lines (+442/-23), 11 files modified
nova/compute/manager.py (+61/-1)
nova/compute/power_state.py (+14/-4)
nova/tests/test_compute.py (+21/-0)
nova/utils.py (+9/-0)
nova/virt/connection.py (+3/-1)
nova/virt/driver.py (+234/-0)
nova/virt/fake.py (+29/-10)
nova/virt/hyperv.py (+17/-2)
nova/virt/libvirt_conn.py (+28/-3)
nova/virt/xenapi/vmops.py (+20/-1)
nova/virt/xenapi_conn.py (+6/-1)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Changed in nova:
assignee: nobody → justinsb (justin-fathomdb)
Changed in nova:
status: Confirmed → Fix Committed
Changed in nova:
milestone: none → 2011.2
status: Fix Committed → Fix Released
Created spec so that we look into it for Cactus. Downgrading importance, since fixing bug 661262 should allow us to cover the most obvious use case (nova-compute or the whole system crashes, but restarting it will update VM status).