Resize/migrate cannot reschedule

Bug #1263044 reported by hougangliu
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Opinion
Importance: Wishlist
Assigned to: Unassigned

Bug Description

In nova/compute/manager.py: def prep_resize()

The design intends to catch any exception raised on the host and then reschedule the resize/migrate, as shown below:

        with self._error_out_instance_on_exception(context, instance['uuid'],
                                                   reservations):
            self.conductor_api.notify_usage_exists(
                    context, instance, current_period=True)
            self._notify_about_instance_usage(
                    context, instance, "resize.prep.start")
            try:
                self._prep_resize(context, image, instance,
                                  instance_type, reservations,
                                  request_spec, filter_properties,
                                  node)
            except Exception:  # <-- intended to catch host exceptions and reschedule the resize/migrate
                # try to re-schedule the resize elsewhere:
                exc_info = sys.exc_info()
                self._reschedule_resize_or_reraise(context, image, instance,
                        exc_info, instance_type, reservations, request_spec,
                        filter_properties)

However, self._prep_resize() sends the 'resize_instance()' request as a cast, so self._prep_resize() returns before resize_instance() has finished. If resize_instance() then throws an exception and a reschedule is needed, the exception is not caught by prep_resize() and the reschedule never happens.
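
The failure mode can be illustrated with a minimal, self-contained sketch. It does not use nova's RPC layer; `FakeRpc`, `HostError` and the simplified `prep_resize()` / `resize_instance()` below are hypothetical stand-ins, and a worker thread plays the role of the remote compute host. It only shows that a try/except around a fire-and-forget cast never sees an exception raised later on the other side.

```python
import threading


class HostError(Exception):
    """Hypothetical stand-in for a failure raised on the destination host."""


def resize_instance():
    # Hypothetical stand-in for the remote resize_instance() step.
    raise HostError("resize failed on host")


class FakeRpc:
    """Toy RPC client (an assumption, not nova's API).

    cast() is fire-and-forget: the work runs on a worker thread and any
    error stays on the "remote" side.
    """

    def __init__(self):
        self.worker_errors = []
        self._threads = []

    def cast(self, func):
        def run():
            try:
                func()
            except Exception as exc:
                # The error is only visible on the worker side.
                self.worker_errors.append(exc)

        thread = threading.Thread(target=run)
        thread.start()
        self._threads.append(thread)

    def wait(self):
        # Test helper: block until all casted work has finished.
        for thread in self._threads:
            thread.join()


def prep_resize(rpc):
    """Simplified caller: returns True if the except branch was taken."""
    rescheduled = False
    try:
        rpc.cast(resize_instance)  # returns immediately
    except Exception:
        # Never reached for errors raised inside resize_instance().
        rescheduled = True
    return rescheduled


rpc = FakeRpc()
rescheduled = prep_resize(rpc)
rpc.wait()
print(rescheduled, rpc.worker_errors)
```

In this sketch the except branch is never taken, while the error is recorded only on the worker side, which mirrors the behaviour described in the report.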

Revision history for this message
Guangya Liu (Jay Lau) (jay-lau-513) wrote :

Perhaps we can add some retry logic in self._prep_resize

Changed in nova:
status: New → Confirmed
assignee: nobody → sahid (sahid-ferdjaoui)
Revision history for this message
Sahid Orentino (sahid-ferdjaoui) wrote :

We need to wait for the response in case an exception occurs. What about using call instead of cast in nova/compute/rpcapi.py?
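
As a rough illustration of this suggestion (not the patch proposed for nova), the sketch below uses Python's `concurrent.futures` to imitate a blocking call: the caller waits for the result, so an exception raised by the worker reaches the caller's except block and another host can be tried. The names `call`, `resize_instance`, `resize_with_reschedule`, `HostError` and the host names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class HostError(Exception):
    """Hypothetical stand-in for a failure raised on the destination host."""


def resize_instance(host):
    # Hypothetical remote step: pretend that "host-a" always fails.
    if host == "host-a":
        raise HostError(f"resize failed on {host}")
    return f"resized on {host}"


def call(executor, func, *args):
    # Blocking "call": Future.result() waits for the worker and
    # re-raises any exception the worker raised.
    return executor.submit(func, *args).result()


def resize_with_reschedule(hosts):
    """Try each candidate host in turn until one succeeds."""
    last_error = None
    with ThreadPoolExecutor(max_workers=1) as executor:
        for host in hosts:
            try:
                return call(executor, resize_instance, host)
            except HostError as exc:
                # The failure is visible here, so we can move on
                # to the next candidate host.
                last_error = exc
    if last_error is None:
        raise ValueError("no candidate hosts given")
    raise last_error


print(resize_with_reschedule(["host-a", "host-b"]))
```

The trade-off in this sketch is that the caller is blocked for as long as the resize takes, which may be one reason a plain switch from cast to call needs care.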

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/64284

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → nobody
importance: Undecided → Medium
tags: added: compute
Tracy Jones (tjones-i)
Changed in nova:
status: In Progress → Triaged
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

I have to check trunk to see whether this bug has been fixed. Some patches came in on this subject, so that needs to be verified.

Changed in nova:
assignee: nobody → Sylvain Bauza (sylvain-bauza)
Sean Dague (sdague)
Changed in nova:
status: Triaged → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/172044

Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → sahid (sahid-ferdjaoui)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by sahid (<email address hidden>) on branch: master
Review: https://review.openstack.org/172044

Revision history for this message
melanie witt (melwitt) wrote :

Resetting to New as the proposed patch was abandoned and this bug is also old and needs to be reverified

Changed in nova:
assignee: sahid (sahid-ferdjaoui) → nobody
importance: Medium → Undecided
status: In Progress → New
liunian (839274949-c)
Changed in nova:
assignee: nobody → liunian (839274949-c)
Revision history for this message
Alexis Lee (alexisl) wrote :

@melwitt: setting to Confirmed as sahid and sdague have previously confirmed this bug

Changed in nova:
status: New → Confirmed
Revision history for this message
Alexis Lee (alexisl) wrote :

OK, after conferring with markus_z: this is an old bug, so setting to Incomplete.

@hougangliu, can you please try to reproduce? @sahid, @sdague, also feel free to try.

Changed in nova:
status: Confirmed → Incomplete
Revision history for this message
Alexis Lee (alexisl) wrote :

This seems like a Tasks thing to me, specifically johnthetubaguy says:

"hmm, maybe, its the orchestration piece we want there I guess. The
original plan was to first do the move to conductor work for resize,
migrate and live migrate, the inter-compute stuff should be driven
centrally by the conductor, then we can wrap more structured handling
around that. If I am remembering correctly (I had a blueprint on this I
never had time to complete)"

So the long and short of it is that fixing this bug will require a lot of work (still worth trying) and a lot of care to fit in with project plans. We invented the tag requires-large-refactor to mark this.

tags: added: tasks
tags: added: requires-large-refactor
removed: tasks
Revision history for this message
Sean Dague (sdague) wrote :

This is a bigger feature / structural change, and it needs to come in through the specs process.

Changed in nova:
status: Incomplete → Opinion
importance: Undecided → Wishlist
assignee: liunian (839274949-c) → nobody