Comment 1 for bug 1871925

melanie witt (melwitt) wrote :

This issue can occur if the attempt to update the instance mapping with a cell_id fails due to a DBError.

There are three places where we update the instance mapping with a cell:

  * Putting an instance in cell0 due to a failure to schedule [1]
  * Successful schedule to a cell at the first schedule [2]
  * Cleaning up build artifacts when an instance is deleted in the middle of building [3]

To fix this bug, we need to figure out what we should do if an attempt to update the instance mapping record fails.

Some ideas:

  * Delete the instance record to prevent orphaning it. Note that this can also fail if it too hits a DBError. Can we fill in instance fault information in the build request? How will the user know what happened to their instance?
  * Retry the instance mapping cell_id update. How many times?
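
To make the retry idea concrete, here is a rough sketch of a bounded retry around the mapping update. It is not nova code: `update_mapping_with_retry`, `save_mapping`, `max_attempts` and `delay` are hypothetical names, and the local `DBError` class is only a stand-in for oslo.db's exception so the snippet runs on its own. The attempt limit of 3 is an arbitrary placeholder, since "how many times?" is still an open question.

```python
import time


class DBError(Exception):
    """Stand-in for oslo.db's DBError (assumption, keeps the sketch standalone)."""


def update_mapping_with_retry(save_mapping, max_attempts=3, delay=0.0):
    """Call save_mapping() until it succeeds or max_attempts is exhausted.

    save_mapping is a hypothetical zero-argument callable that sets the
    cell on the instance mapping and saves it.

    Returns the number of attempts used. If every attempt raises DBError,
    the last one is re-raised so the caller can fall back to cleanup.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            save_mapping()
            return attempt
        except DBError:
            if attempt == max_attempts:
                # Out of retries: let the caller decide what to do next
                # (for example, try to delete the instance record).
                raise
            # Linear backoff between attempts; delay=0 disables sleeping.
            time.sleep(delay * attempt)
```

If the retries are exhausted the DBError still propagates, so this would only narrow the window for the bug; the question of what to do on final failure (the first idea above) remains.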

[1] https://github.com/openstack/nova/blob/7a71408a79dc81f344ee6c7760fa881afb935dfc/nova/conductor/manager.py#L1424
[2] https://github.com/openstack/nova/blob/7a71408a79dc81f344ee6c7760fa881afb935dfc/nova/conductor/manager.py#L1686-L1713
[3] https://github.com/openstack/nova/blob/7a71408a79dc81f344ee6c7760fa881afb935dfc/nova/conductor/manager.py#L1757