Comment 0 for bug 1161657

Joshua Harlow (harlowja) wrote :

As documented at https://review.openstack.org/#/c/25075/2/nova/compute/manager.py, there are cases in the compute manager that leave the database, the network, or the instances themselves in an inconsistent (or entirely wrong) state. It would be useful to verify that every plugin call goes through a defined interface with a known set of errors that the interface can raise, and that there is a known way to roll back from each of those errors. The top-level manager code must correctly roll back state (as needed) so that the compute node is left in a pristine state when an underlying driver does not behave correctly (or just doesn't work).
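To make the "known set of errors" idea concrete, here is a minimal Python 3 sketch. All names in it (DriverError, SpawnFailed, NetworkSetupFailed, call_driver) are hypothetical and are not part of nova; it only illustrates how the manager could narrow whatever a third-party driver raises down to one documented family of exceptions.

```python
class DriverError(Exception):
    """Hypothetical base class for every error a driver may raise."""


class SpawnFailed(DriverError):
    """Hypothetical: the driver could not start the instance."""


class NetworkSetupFailed(DriverError):
    """Hypothetical: the driver could not set up networking."""


def call_driver(func, *args, **kwargs):
    """Call a driver method and enforce the error contract.

    Errors already in the DriverError family pass through unchanged.
    Anything else is wrapped, so the caller only ever has to handle
    DriverError and its subclasses.
    """
    try:
        return func(*args, **kwargs)
    except DriverError:
        raise
    except Exception as exc:
        # Keep the original exception reachable via __cause__.
        raise DriverError("unexpected driver failure: %r" % (exc,)) from exc
```

With a wrapper like this, the manager has one place to decide how each allowed error maps to a rollback, instead of guessing what arbitrary driver code might throw.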

Let's first attack one function, a critical-path one, _run_instance(), and its direct callees _spawn() and _prep_block_device().

Certain calls noted:

- Deallocating networks/volumes (not always done) -> _setup_block_device_mapping is never rolled back...
- Un-preparing a block device (on later failure)
- A driver can affect the MACs for an instance (self.driver.macs_for_instance), and since this is third-party driver code, if the driver 'locks' those MACs (via whatever mechanism), then nothing currently rolls those MACs back.
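
One way to cover the cases listed above is to register an undo action for each step as soon as that step succeeds, and to run the registered actions in reverse order if a later step fails. The following is a minimal sketch of that pattern, not the actual nova code; RollbackStack and run_instance are hypothetical names, and the (do, undo) pairs stand in for things like allocating networks, preparing block devices, and reserving MACs.

```python
class RollbackStack:
    """Hypothetical helper: collects undo actions, runs them on failure."""

    def __init__(self):
        self._undo = []

    def push(self, undo, *args):
        # Register how to reverse a step that has just succeeded.
        self._undo.append((undo, args))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            return False  # success: nothing to undo
        # Failure: undo completed steps, most recent first.
        while self._undo:
            undo, args = self._undo.pop()
            try:
                undo(*args)
            except Exception:
                # Best effort; a real manager would log this and keep
                # going so one bad undo does not block the others.
                pass
        return False  # let the original exception propagate


def run_instance(instance, steps):
    """Run (do, undo) pairs in order, rolling back if any 'do' fails."""
    with RollbackStack() as rollback:
        for do, undo in steps:
            do(instance)
            rollback.push(undo, instance)
```

If the third of three steps raises, the undo actions for the second and then the first step run before the exception reaches the caller. Note that in this sketch a step that fails partway is itself responsible for cleaning up its own partial work, since its undo is only registered after it succeeds.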