VMware: unable to spin up instance as network not created on host

Bug #1532750 reported by Gary Kotton
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: High
Assigned to: xjfl

Bug Description

When using Neutron, there are edge cases in which a network created by Neutron has not yet been created on the actual host. VM creation then fails because the network does not yet exist on the host:

2016-01-08 20:56:29.486 DEBUG oslo_vmware.exceptions [-] Fault InvalidDeviceSpec not matched. from (pid=28979) get_fault_class /usr/local/lib/python2.7/dist-packages/oslo_vmware/exceptions.py:295
2016-01-08 20:56:29.486 ERROR oslo_vmware.common.loopingcall [-] in fixed duration looping call
2016-01-08 20:56:29.486 TRACE oslo_vmware.common.loopingcall Traceback (most recent call last):
2016-01-08 20:56:29.486 TRACE oslo_vmware.common.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_vmware/common/loopingcall.py", line 76, in _inner
2016-01-08 20:56:29.486 TRACE oslo_vmware.common.loopingcall     self.f(*self.args, **self.kw)
2016-01-08 20:56:29.486 TRACE oslo_vmware.common.loopingcall   File "/usr/local/lib/python2.7/dist-packages/oslo_vmware/api.py", line 428, in _poll_task
2016-01-08 20:56:29.486 TRACE oslo_vmware.common.loopingcall     raise task_ex
2016-01-08 20:56:29.486 TRACE oslo_vmware.common.loopingcall VimFaultException: Invalid configuration for device '0'.
2016-01-08 20:56:29.486 TRACE oslo_vmware.common.loopingcall Faults: ['InvalidDeviceSpec']
2016-01-08 20:56:29.486 TRACE oslo_vmware.common.loopingcall

Adding a retry addresses this by giving the actual host time to create the network.
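For illustration only, the retry idea could look roughly like the sketch below. It is not the proposed nova patch: InvalidDeviceSpecFault is a hypothetical stand-in for the VimFaultException in the trace, and retry_on_fault, retries and delay are names assumed for this example.

```python
import time


class InvalidDeviceSpecFault(Exception):
    """Hypothetical stand-in for the VimFaultException shown in the trace."""


def retry_on_fault(func, retries=3, delay=1.0, sleep=time.sleep):
    """Call func(); if it raises InvalidDeviceSpecFault, wait and try again.

    Up to `retries` extra attempts are made. Once they are used up, the
    last fault is re-raised so the caller still sees the failure.
    """
    attempt = 0
    while True:
        try:
            return func()
        except InvalidDeviceSpecFault:
            if attempt >= retries:
                raise
            attempt += 1
            # Give the host time to finish creating the network.
            sleep(delay)
```

Only the one fault type is retried here, so unrelated errors still fail immediately.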

Gary Kotton (garyk)
Changed in nova:
importance: Undecided → High
tags: added: liberty-backport-potential vmware
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/265764

Changed in nova:
assignee: nobody → Gary Kotton (garyk)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by garyk (<email address hidden>) on branch: master
Review: https://review.openstack.org/265764

Changed in nova:
assignee: Gary Kotton (garyk) → nobody
status: In Progress → Confirmed
Giridhar Jayavelu (gjayavelu) wrote :

Instead of retrying the entire VM creation method, it would be better to retry get_network_with_the_name(), which is specific to the issue described here.
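A minimal sketch of this narrower approach, assuming the lookup returns None while the network is not yet visible. The lookup argument stands in for get_network_with_the_name(), whose real signature is not shown in this report; wait_for_network, attempts and interval are hypothetical names.

```python
import time


def wait_for_network(lookup, name, attempts=5, interval=1.0, sleep=time.sleep):
    """Poll lookup(name) until it returns a non-None reference.

    Returns the reference, or None if the network never appears within
    `attempts` tries. `lookup` is a placeholder for the real network query.
    """
    for i in range(attempts):
        ref = lookup(name)
        if ref is not None:
            return ref
        if i < attempts - 1:
            # Sleep only between attempts, not after the final one.
            sleep(interval)
    return None
```

The caller decides what to do with a None result, for example raising the usual "network not found" error.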

Changed in nova:
assignee: nobody → Giridhar Jayavelu (gjayavelu)
Sarafraj Singh (sarafraj-singh) wrote :

Giridhar,
Are you working on the fix? Please change the status to In Progress if you are; otherwise, remove yourself as assignee so someone else can pick it up.

Giridhar Jayavelu (gjayavelu) wrote :

Sarafraj,
I didn't get a chance to test and complete the patch. I'll revisit in a few weeks if no one has picked up this bug. Thanks!

Changed in nova:
assignee: Giridhar Jayavelu (gjayavelu) → nobody
Hong Hui Xiao (xiaohhui) wrote :

But according to the code, if get_network_with_the_name returns None, we should get the exception "NetworkNotFoundForBridge" instead of the exception reported in the bug description. I think the bug happens when the network has been created in the compute cluster but has not yet been provisioned to the cluster hosts. I would propose another solution...
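To make the distinction concrete, here is a rough sketch that tells the two failure modes apart. Both exception classes are local stand-ins defined for this example, not the real nova or oslo.vmware classes, and failure_stage is a hypothetical helper.

```python
class NetworkNotFoundForBridge(Exception):
    """Stand-in for the error expected when the network lookup returns None."""


class VimFault(Exception):
    """Stand-in for a fault that carries a list of fault names."""

    def __init__(self, message, fault_list):
        super().__init__(message)
        self.fault_list = list(fault_list)


def failure_stage(exc):
    """Classify where VM creation failed, based on the exception raised."""
    if isinstance(exc, NetworkNotFoundForBridge):
        # The lookup itself found nothing.
        return "lookup"
    if isinstance(exc, VimFault) and "InvalidDeviceSpec" in exc.fault_list:
        # The lookup succeeded, but the host rejected the device spec.
        return "host-provisioning"
    return "other"
```

Under this reading, the trace in the bug description falls in the "host-provisioning" case, so a retry around the lookup alone would not catch it.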

Changed in nova:
assignee: nobody → Hong Hui Xiao (xiaohhui)
status: Confirmed → In Progress
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: Hong Hui Xiao (xiaohhui) → nobody
Balazs Gibizer (balazs-gibizer) wrote :

A fix was proposed at https://review.opendev.org/#/c/467589, but it was abandoned due to inactivity.

xjfl (xjfl)
Changed in nova:
assignee: nobody → xjfl (xjfl)
xjfl (xjfl)
Changed in nova:
assignee: xjfl (xjfl) → nobody
assignee: nobody → xjfl (xjfl)