Comment 3 for bug 1272623

Robert Collins (lifeless) wrote :

On startup, nova-compute attempts to restore the state of the node to match its internal model: e.g. start VMs that are meant to be running, fully delete VMs that are meant to be purged from disk, etc.

We also try to start VMs in the 'ERROR' state here, which, AFAICT, doesn't happen under any other circumstances. This is conceptually problematic because ERROR indicates that nova has given up on the VM, not that the VM is in the middle of an operation that needs resuming.
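To make the distinction concrete, here is a minimal sketch of a startup reconciliation loop that resumes or finishes in-progress work but leaves ERROR instances alone. It is not nova's actual code: the `VMState` enum, `Instance` class, `restore_instance` and `init_host` functions are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class VMState(Enum):
    # Hypothetical subset of states; nova's real vm_states module differs.
    ACTIVE = "active"
    STOPPED = "stopped"
    DELETED = "deleted"
    ERROR = "error"


@dataclass
class Instance:
    uuid: str
    vm_state: VMState
    powered_on: bool = False


def restore_instance(instance: Instance) -> str:
    """Reconcile one instance on startup and return the action taken."""
    if instance.vm_state is VMState.ERROR:
        # ERROR means nova has given up on this instance; its invariants
        # (networking, node association, ...) may not hold, so leave it alone.
        return "skipped"
    if instance.vm_state is VMState.DELETED:
        # Finish purging whatever is still left on disk.
        return "purged"
    if instance.vm_state is VMState.ACTIVE and not instance.powered_on:
        # The model says it should be running, so start it.
        instance.powered_on = True
        return "started"
    return "unchanged"


def init_host(instances: list[Instance]) -> dict[str, str]:
    # Map each instance uuid to the action taken during reconciliation.
    return {inst.uuid: restore_instance(inst) for inst in instances}


actions = init_host([
    Instance("a", VMState.ACTIVE),
    Instance("b", VMState.ERROR),
    Instance("c", VMState.DELETED),
])
print(actions)  # {'a': 'started', 'b': 'skipped', 'c': 'purged'}
```

The ERROR check comes first so that no other branch can act on an instance whose state nova no longer vouches for.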

In particular, once a VM is in the ERROR state, there is no guarantee that its invariants hold: it might not have had networking allocated, for instance.

The backtrace here was caused by exactly such a case: nova-compute put the VM into ERROR before writing the instance id to the bm_nodes table (which is what records the association of instance to node). This happened quite legitimately: the scheduler was trying to schedule onto an already-used node (due to a different issue, but the scheduler is intrinsically racy, so this should be expected in general). Then, when restarted, nova-compute attempted to restart the ERROR-state VM and threw an exception (rightly so: attempting to power on nothing is an error).
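The failure mode can be sketched as follows, assuming power-on has to look up the node through the bm_nodes association. The in-memory `bm_nodes` dict, `NodeNotFound` exception, and `node_for_instance`/`power_on` functions are hypothetical and only model the lookup, not the real baremetal driver.

```python
from typing import Optional


class NodeNotFound(Exception):
    """Raised when no bare-metal node is associated with an instance."""


# Hypothetical in-memory stand-in for the bm_nodes table:
# node id -> instance uuid (None when the node is free).
bm_nodes: dict[int, Optional[str]] = {1: "inst-ok", 2: None}


def node_for_instance(instance_uuid: str) -> int:
    # Scan for the row that associates this instance with a node.
    for node_id, owner in bm_nodes.items():
        if owner == instance_uuid:
            return node_id
    raise NodeNotFound(f"no bm_nodes row for instance {instance_uuid}")


def power_on(instance_uuid: str) -> int:
    # Powering on needs the instance->node association; an instance that
    # went to ERROR before its row was written has none, so this raises.
    node_id = node_for_instance(instance_uuid)
    return node_id


print(power_on("inst-ok"))  # 1
try:
    power_on("inst-error")
except NodeNotFound as exc:
    print(exc)  # no bm_nodes row for instance inst-error
```

Here the exception is the correct outcome; the problem is that startup asked for the power-on at all.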