Reschedule of a virtual machine fails with 'NetworkInfo' object has no 'wait'

Bug #1636109 reported by ymadhavi@in.ibm.com
This bug affects 1 person
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   High         Esha Seth
Newton                     Fix Released   High         Lee Yarwood

Bug Description

When a virtual machine is rescheduled and the build fails with a storage exception, the error reported is
"'NetworkInfo' object has no attribute 'wait'" instead of the actual storage issue.

In nova/compute/manager.py:

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2088 raises the exception above because network_info is a 'NetworkInfoAsyncWrapper' object in the first-time deploy case but a 'NetworkInfo' object in the reschedule case, so during a reschedule the 'wait' attribute does not exist.

    def _build_networks_for_instance(self, context, instance,
            requested_networks, security_groups):

        # If we're here from a reschedule the network may already be allocated.
        if strutils.bool_from_string(
                instance.system_metadata.get('network_allocated', 'False')):
            # NOTE(alex_xu): The network_allocated is True means the network
            # resource already allocated at previous scheduling, and the
            # network setup is cleanup at previous. After rescheduling, the
            # network resource need setup on the new host.
            self.network_api.setup_instance_network_on_host(
                context, instance, instance.host)
            # Reschedule case: this branch runs and returns a plain
            # NetworkInfo object, which has no wait().
            return self.network_api.get_instance_nw_info(context, instance)

        if not self.is_neutron_security_groups:
            security_groups = []

        macs = self.driver.macs_for_instance(instance)
        dhcp_options = self.driver.dhcp_options_for_instance(instance)
        # First deploy case: this call returns a NetworkInfoAsyncWrapper,
        # which has wait().
        network_info = self._allocate_network(context, instance,
                requested_networks, macs, security_groups, dhcp_options)

        return network_info
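
The type mismatch described above can be reproduced outside nova with a small sketch. The two classes below are simplified stand-ins, not nova's real implementations; `build_networks` and `spawn` are hypothetical names, and a thread pool replaces whatever asynchronous mechanism the real wrapper uses.

```python
from concurrent.futures import ThreadPoolExecutor


class NetworkInfo(list):
    """Stand-in for nova's NetworkInfo: a plain list with no wait()."""


class NetworkInfoAsyncWrapper(NetworkInfo):
    """Stand-in for the async wrapper: allocation runs in the background."""

    def __init__(self, allocate_fn):
        super().__init__()
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._future = self._executor.submit(allocate_fn)

    def wait(self):
        # Block until allocation finishes, then expose the results.
        self[:] = self._future.result()
        self._executor.shutdown()


def build_networks(rescheduled):
    """Hypothetical simplification of _build_networks_for_instance."""
    if rescheduled:
        # Network already allocated by a previous attempt: plain object.
        return NetworkInfo([{"id": "vif-1"}])
    return NetworkInfoAsyncWrapper(lambda: [{"id": "vif-1"}])


def spawn(network_info):
    """Hypothetical caller that always waits before using the data."""
    network_info.wait()
    return len(network_info)


first_deploy = spawn(build_networks(rescheduled=False))

try:
    spawn(build_networks(rescheduled=True))
    reschedule_error = None
except AttributeError as exc:
    # This AttributeError is what hides the real storage failure.
    reschedule_error = str(exc)
```

In this sketch the first deploy succeeds, while the reschedule path fails on the `wait()` call before any later error could surface.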

Matthew Edmonds (edmondsw) wrote :
Matt Riedemann (mriedem) wrote :

I can't remember exactly why we needed to do:

https://github.com/openstack/nova/commit/61fc1b9ee11e416aecbf3a29e1d150a53fc890e8

But it probably had something to do with test failures.

See what breaks if you add the wait method back in from:

https://review.openstack.org/#/c/290780/6/nova/network/model.py

My guess is that unit tests might fail, but maybe you could stub out that wait() method in the unit tests and leave it in at runtime.
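
One way to stub the method in tests while keeping it at runtime is `mock.patch.object`. This is only a sketch: the `NetworkInfo` class here is a stand-in with an assumed `wait(self)` signature, and `code_under_test` is a hypothetical function, not anything from nova's test suite.

```python
from unittest import mock


class NetworkInfo(list):
    """Stand-in for nova's NetworkInfo with the wait method restored."""

    def wait(self):
        # No-op at runtime: a plain NetworkInfo is already populated.
        pass


def code_under_test(network_info):
    """Hypothetical code path that waits before reading network data."""
    network_info.wait()
    return list(network_info)


# Inside the test, replace wait() on the class; the patch is undone
# automatically when the with-block exits.
with mock.patch.object(NetworkInfo, "wait") as fake_wait:
    result = code_under_test(NetworkInfo(["vif-1"]))
    stubbed_calls = fake_wait.call_count

# Outside the with-block the real no-op method is back in place.
runtime_result = code_under_test(NetworkInfo(["vif-2"]))
```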

Matt Riedemann (mriedem) wrote :

Would be helpful if we had a test that shows the failure so we can work on testing fixes.

Changed in nova:
assignee: nobody → Esha Seth (eshaseth)
status: New → In Progress
Andrew Laski (alaski) wrote :

The quick fix would be to just re-add the no-op wait method to NetworkInfo. It was only removed as a cleanup, so adding it back should not negatively impact anything.

The better fix may be to modify the workflow a little bit so that NetworkInfo is wrapped in the async wrapper at a clear point that would ensure that the schedule and reschedule case can be handled the same way.
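
The second option, wrapping at one clear point, might look roughly like the sketch below. It assumes a wrapper that can be built from an already-completed result; `wrap_completed` and `get_network_info` are hypothetical helpers, the classes are stand-ins rather than nova's real ones, and `concurrent.futures.Future` substitutes for the real asynchronous mechanism.

```python
from concurrent.futures import Future


class NetworkInfo(list):
    """Stand-in for nova's NetworkInfo (no wait method in this variant)."""


class NetworkInfoAsyncWrapper(NetworkInfo):
    """Stand-in wrapper whose contents arrive through a future."""

    def __init__(self, future):
        super().__init__()
        self._future = future

    def wait(self):
        # Copy the finished result into the list itself.
        self[:] = self._future.result()


def wrap_completed(network_info):
    """Hypothetical helper: wrap already-fetched data in a finished future."""
    done = Future()
    done.set_result(list(network_info))
    return NetworkInfoAsyncWrapper(done)


def get_network_info(rescheduled, fetch_existing, allocate_async):
    """Single point where both paths are made to return the wrapper type."""
    if rescheduled:
        return wrap_completed(fetch_existing())
    return allocate_async()


info = get_network_info(
    rescheduled=True,
    fetch_existing=lambda: NetworkInfo(["vif-1"]),
    allocate_async=lambda: None,  # unused on the reschedule path
)
info.wait()  # safe: the reschedule path now returns the wrapper too
```

With this shape, callers never need to know which path produced the object, at the cost of touching the workflow rather than a single class.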

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/394854

Changed in nova:
assignee: Esha Seth (eshaseth) → Eric Fried (efried)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Esha Seth (<email address hidden>) on branch: master
Review: https://review.openstack.org/394854
Reason: The fix is checked in under https://review.openstack.org/#/c/394854

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Esha Seth (<email address hidden>) on branch: master
Review: https://review.openstack.org/394854
Reason: The fix is checked in under https://review.openstack.org/#/c/393669

Esha Seth (eshaseth)
Changed in nova:
assignee: Eric Fried (efried) → Esha Seth (eshaseth)
Changed in nova:
assignee: Esha Seth (eshaseth) → Matthew Edmonds (edmondsw)
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Esha Seth (eshaseth)
Changed in nova:
assignee: Matthew Edmonds (edmondsw) → Esha Seth (eshaseth)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/393669
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1b351ef5356e0472aef51221ff02376dd8b42954
Submitter: Jenkins
Branch: master

commit 1b351ef5356e0472aef51221ff02376dd8b42954
Author: Esha Seth <email address hidden>
Date: Fri Nov 4 05:48:32 2016 -0400

    Add a no-op wait method to NetworkInfo

    The normal deploy flow uses a NetworkInfoAsyncWrapper for network
    allocation, and because of that many places have to call that class's
    wait method to make sure it has completed. During a reschedule where
    the network was allocated by a previous build attempt, a NetworkInfo
    instance is retrieved instead, which does not have a wait method. This
    then results in an exception complaining the missing method when it is
    called. This fix addresses that by adding a no-op wait method to the
    NetworkInfo class. Alternatively could have used isinstance or hasattr
    to avoid making wait calls on NetworkInfo, but that could be
    problematic to maintain as more places need to make wait calls in the
    future and may not know to make the isinstance/hasattr check.

    This fixes a regression issue caused by
    61fc1b9ee11e416aecbf3a29e1d150a53fc890e8 ,
    which reverted the previous fix made under
    24a04c405ab2c98e52ea1edf8775489907526c6d

    Change-Id: Id7a71b2eb46ea7df19e7da0afbc0eafa87cac965
    Closes-Bug: 1636109
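
The trade-off described in the commit message can be sketched as follows. Both classes are simplified stand-ins for NetworkInfo before and after the fix, with an assumed `wait(self)` signature; `guarded_caller` and `plain_caller` are hypothetical call sites.

```python
class UnfixedNetworkInfo(list):
    """Stand-in for NetworkInfo before the fix: no wait method."""


class FixedNetworkInfo(list):
    """Stand-in for NetworkInfo after the fix."""

    def wait(self):
        # Nothing asynchronous to wait for; subclasses may override.
        pass


def guarded_caller(network_info):
    # Rejected alternative: every call site must remember this check.
    if hasattr(network_info, "wait"):
        network_info.wait()
    return len(network_info)


def plain_caller(network_info):
    # With the no-op in the base class, call sites need no special case.
    network_info.wait()
    return len(network_info)


guarded_count = guarded_caller(UnfixedNetworkInfo(["vif-1"]))
plain_count = plain_caller(FixedNetworkInfo(["vif-1", "vif-2"]))
```

A new call site written like `plain_caller` works against the fixed class without any extra knowledge, which is the maintainability argument the commit message makes.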

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/396151

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/396151
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=329cd1649703ac801b95fce28afa99b67f9f45aa
Submitter: Jenkins
Branch: stable/newton

commit 329cd1649703ac801b95fce28afa99b67f9f45aa
Author: Esha Seth <email address hidden>
Date: Fri Nov 4 05:48:32 2016 -0400

    Add a no-op wait method to NetworkInfo

    The normal deploy flow uses a NetworkInfoAsyncWrapper for network
    allocation, and because of that many places have to call that class's
    wait method to make sure it has completed. During a reschedule where
    the network was allocated by a previous build attempt, a NetworkInfo
    instance is retrieved instead, which does not have a wait method. This
    then results in an exception complaining the missing method when it is
    called. This fix addresses that by adding a no-op wait method to the
    NetworkInfo class. Alternatively could have used isinstance or hasattr
    to avoid making wait calls on NetworkInfo, but that could be
    problematic to maintain as more places need to make wait calls in the
    future and may not know to make the isinstance/hasattr check.

    This fixes a regression issue caused by
    61fc1b9ee11e416aecbf3a29e1d150a53fc890e8 ,
    which reverted the previous fix made under
    24a04c405ab2c98e52ea1edf8775489907526c6d

    Change-Id: Id7a71b2eb46ea7df19e7da0afbc0eafa87cac965
    Closes-Bug: 1636109
    (cherry picked from commit 1b351ef5356e0472aef51221ff02376dd8b42954)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b1

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.3

This issue was fixed in the openstack/nova 14.0.3 release.
