needs better rollbacks

Bug #1161657 reported by Joshua Harlow
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)

Bug Description

As documented at there are cases in the compute manager that leave the database, network, or instances themselves in an inconsistent (or entirely wrong) state. It would be useful to verify that every plugin call goes through a defined interface with a known set of errors that the interface can throw, and that there is a defined way to roll back from each of those allowed errors. The top-level manager code must correctly roll back state (as needed) so that the compute node is left in a pristine state when an underlying driver does not behave correctly (or just doesn't work).
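
One way to picture a "known set of errors" is a small exception hierarchy plus a wrapper around every driver call. This is only a sketch: DriverError, SpawnFailed, UnexpectedDriverError and call_driver are hypothetical names made up for illustration, not existing nova code.

```python
class DriverError(Exception):
    """Hypothetical base class for errors a driver is allowed to raise."""


class SpawnFailed(DriverError):
    """Hypothetical declared error: the driver could not start the instance."""


class UnexpectedDriverError(DriverError):
    """Raised by the wrapper when a driver raises something undeclared."""


def call_driver(fn, *args, **kwargs):
    """Call a driver function and normalize whatever it raises.

    Declared errors (DriverError subclasses) pass through unchanged, so the
    manager can pick a specific rollback for each. Anything else is wrapped,
    so the manager only ever has to handle DriverError.
    """
    try:
        return fn(*args, **kwargs)
    except DriverError:
        raise
    except Exception as exc:
        raise UnexpectedDriverError(
            "driver raised undeclared %s" % type(exc).__name__) from exc
```

With something like this in place, the manager can map each declared error to a rollback action and treat UnexpectedDriverError as "undo everything".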

Let's first attack one function on the critical path, _run_instance(), and the functions it calls directly, _spawn() and _prep_block_device().

Certain calls noted:

- Deallocating networks/volumes (not always done) -> _setup_block_device_mapping is never rolled back...
- Un-preparing a block device (on later failure)
- A driver can affect the MACs for an instance (self.driver.macs_for_instance). Since this is third-party driver code, if the driver 'locks' those MACs (via whatever mechanism), nothing rolls them back on a later failure.
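
A rough sketch of how the calls above could register their own undo actions, so that a later failure unwinds everything already done in reverse order. The step names only mirror the items in the list; RollbackStack, rollback_on_error and run_instance are hypothetical stand-ins, not the real compute manager code.

```python
import contextlib


class RollbackStack:
    """Collects undo callbacks and runs them in reverse order."""

    def __init__(self):
        self._undo = []

    def push(self, description, undo_fn):
        self._undo.append((description, undo_fn))

    def rollback(self):
        # Keep unwinding even if one undo step fails; return the failures
        # so the caller can log them.
        errors = []
        while self._undo:
            description, undo_fn = self._undo.pop()
            try:
                undo_fn()
            except Exception as exc:
                errors.append((description, exc))
        return errors


@contextlib.contextmanager
def rollback_on_error():
    stack = RollbackStack()
    try:
        yield stack
    except Exception:
        stack.rollback()
        raise


# (do, undo) pairs; all names are illustrative assumptions.
STEPS = [
    ("reserve_macs", "release_macs"),
    ("allocate_network", "deallocate_network"),
    ("prep_block_device", "unprep_block_device"),
    ("spawn", "destroy"),
]


def run_instance(log, fail_at=None):
    """Hypothetical stand-in for _run_instance(); records steps in `log`."""
    with rollback_on_error() as stack:
        for do_name, undo_name in STEPS:
            if do_name == fail_at:
                raise RuntimeError("%s failed" % do_name)
            log.append(do_name)
            stack.push(undo_name, lambda name=undo_name: log.append(name))
```

The point of registering the undo right after each successful step is that nothing (MACs, networks, block devices) can be set up without a matching rollback being queued.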

Andrew Laski (alaski) wrote :

I am hugely in favor of cleaning this up. I also think we could do a better job of pre-emptively checking for certain things before starting any work in the manager/driver.

Changed in nova:
status: New → Confirmed
importance: Undecided → Wishlist
Joshua Harlow (harlowja)
description: updated
Sean Dague (sdague) wrote :

This isn't really a bug; it is something that should come in via the specs process.

Changed in nova:
status: Confirmed → Opinion