Failure in resize_instance after cast to finish_resize still sets instance error state

Bug #1688228 reported by Matthew Booth
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

This is from code inspection only.

ComputeManager.resize_instance does:

  with self._error_out_instance_on_exception(context, instance,
                                             quotas=quotas):
      # ... stuff ...

      self.compute_rpcapi.finish_resize(context, instance,
                                        migration, image, disk_info,
                                        migration.dest_compute,
                                        reservations=quotas.reservations)

      # Responsibility for the instance has now been punted to the
      # destination, but...

      self._notify_about_instance_usage(context, instance, "resize.end",
                                        network_info=network_info)

      compute_utils.notify_about_instance_action(
          context, instance, self.host,
          action=fields.NotificationAction.RESIZE,
          phase=fields.NotificationPhase.END)
      self.instance_events.clear_events_for_instance(instance)

The problem is that a failure in anything after the cast to finish_resize will cause the instance to be put into an error state and its quotas to be rolled back. This is not correct, because any error at this point is transient and does not affect the resize itself. The resize operation continues on the destination regardless, so the source would almost certainly leave the instance in an inconsistent state: marked as errored, with quotas rolled back, while the destination carries on with the resize.
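The failure mode can be illustrated with a small, self-contained sketch. This is not nova code and not the fix proposed in the review: `Instance`, `error_out_instance_on_exception`, `cast_finish_resize`, and `failing_notify` are hypothetical stand-ins, and the restructured variant is only one possible approach, in which the guard covers work up to the cast and later failures are logged rather than propagated.

```python
import contextlib
import logging

LOG = logging.getLogger(__name__)


class Instance:
    """Hypothetical stand-in for an instance record."""

    def __init__(self):
        self.vm_state = "resizing"


@contextlib.contextmanager
def error_out_instance_on_exception(instance):
    """Simplified stand-in: mark the instance as errored, then re-raise."""
    try:
        yield
    except Exception:
        instance.vm_state = "error"
        raise


def cast_finish_resize(instance, casts):
    """Stand-in for the asynchronous cast; the destination now owns the resize."""
    casts.append("finish_resize")


def failing_notify(instance):
    """Stand-in for a post-cast notification that fails transiently."""
    raise RuntimeError("notification bus unavailable")


def resize_instance_current(instance, casts, notify):
    # Same shape as in the report: the notification runs inside the guard.
    with error_out_instance_on_exception(instance):
        cast_finish_resize(instance, casts)
        notify(instance)


def resize_instance_restructured(instance, casts, notify):
    # Assumed alternative: only the work up to the cast is guarded.
    with error_out_instance_on_exception(instance):
        cast_finish_resize(instance, casts)
    try:
        notify(instance)
    except Exception:
        LOG.exception("Post-cast step failed; instance state left untouched")


def run(resize):
    """Run one variant with a failing notification and report the outcome."""
    instance, casts = Instance(), []
    try:
        resize(instance, casts, failing_notify)
    except RuntimeError:
        pass
    return instance.vm_state, casts
```

With the current structure, `run(resize_instance_current)` leaves the instance in "error" even though the cast has already been sent. With the restructured variant, the instance stays in "resizing" and the destination is free to complete the operation.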

Tags: compute resize
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/462499

Changed in nova:
assignee: nobody → Matthew Booth (mbooth-9)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matthew Booth on branch: master
Review: https://review.openstack.org/462499
Reason: This came loose from my series fixing this.

Matt Riedemann (mriedem) wrote :

Putting back to New to re-triage this.

Changed in nova:
assignee: Matthew Booth (mbooth-9) → nobody
status: In Progress → New
tags: added: compute resize
Matt Riedemann (mriedem) wrote :

This is true, and I left comments in the patch (it would still need work). It's also true of a *ton* of post-processing type code all over the compute manager where things in post ops could still fail on unnecessary stuff and result in errors even though the guest might be OK.

Changed in nova:
status: New → Triaged
importance: Undecided → Low