Evacuation failing in networking-ovn with VirtualInterfaceCreateException

Bug #1840876 reported by Terry Wilson
This bug affects 1 person
Affects: networking-ovn
Status: Fix Released
Importance: High
Assigned to: Terry Wilson

Bug Description

Instance evacuation fails in an environment deployed with OVN. Nova times out waiting for the network-vif-plugged event and then logs the errors below.

~~~
2019-07-08 05:55:34.522 1 WARNING nova.virt.libvirt.driver [req-665011e5-be4c-43ac-a002-7ba661caeb46 7b5516ddd8a9477388a1f4e8e0764fa2 3d769c76682347f597643ad3509b5354 - default default] [instance: XXXXXX-269e-XXXXXX-221e3ae0739] Timeout waiting for [('network-vif-plugged', u'065645c2-c485-44a0-84ba-4336b9fcd41d')] for instance with vm_state active and task_state rebuild_spawning.: Timeout: 300 seconds
~~~

~~~
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [req-665011e5-be4c-43ac-a002-7ba661caeb46 7b5516ddd8a9477388a1f4e8e0764fa2 3d769c76682347f597643ad3509b5354 - default default] [instance: XXXXXX-269e-XXXXXX-221e3ae0739] Setting instance vm_state to ERROR: VirtualInterfaceCreateException: Virtual Interface creation failed
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] Traceback (most recent call last):
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7559, in _error_out_instance_on_exception
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] yield
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2904, in rebuild_instance
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] migration, request_spec)
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2966, in _do_rebuild_instance_with_claim
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] self._do_rebuild_instance(*args, **kwargs)
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3123, in _do_rebuild_instance
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] self._rebuild_default_impl(**kwargs)
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2810, in _rebuild_default_impl
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] block_device_info=new_block_device_info)
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3114, in spawn
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] destroy_disks_on_failure=True)
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5597, in _create_domain_and_network
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] raise exception.VirtualInterfaceCreateException()
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739] VirtualInterfaceCreateException: Virtual Interface creation failed
2019-07-08 05:55:36.768 1 ERROR nova.compute.manager [instance: XXXXXX-269e-XXXXXX-221e3ae0739]
~~~

Changed in networking-ovn:
assignee: nobody → Terry Wilson (otherwiseguy)
importance: Undecided → High
Changed in networking-ovn:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/678239

OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/678240

OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/678241

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.opendev.org/677603
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=4f73c0f016ddda50e25fe7592bb67435384e38f2
Submitter: Zuul
Branch: master

commit 4f73c0f016ddda50e25fe7592bb67435384e38f2
Author: Terry Wilson <email address hidden>
Date: Tue Aug 20 19:09:06 2019 -0500

    Fix evacuation when host dies uncleanly

    If a host crashes and ovn-controller doesn't clean up the
    Port_Binding chassis column, the Logical_Switch_Port is never set
    to DOWN, so we do not detect the up->down transition and update the
    port status with the driver. This patch watches for Port_Binding
    chassis column changes and if the port is up goes ahead and updates
    the driver.

    Change-Id: I8a5e6ad2e98b79a140977ce003609ed5b21e3499
    Closes-Bug: #1840876
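The matching logic described in the commit message can be sketched roughly as follows. This is an illustration only, not the networking-ovn code: `PortBinding`, `FakeDriver`, `chassis_update_matches` and `handle_port_binding_update` are hypothetical names, the rows are plain Python objects rather than OVSDB IDL rows, and the rule that both the old and the new chassis must be set is an assumption about which changes are worth reacting to.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PortBinding:
    """Simplified stand-in for a southbound Port_Binding row."""
    logical_port: str
    chassis: Optional[str]


class FakeDriver:
    """Hypothetical driver that records which ports were reported as up."""

    def __init__(self, lsp_up: Dict[str, bool]) -> None:
        # Maps a Logical_Switch_Port name to its current 'up' flag.
        self.lsp_up = lsp_up
        self.reported_up: List[str] = []

    def set_port_status_up(self, port_id: str) -> None:
        self.reported_up.append(port_id)


def chassis_update_matches(driver: FakeDriver, row: PortBinding,
                           old_chassis: Optional[str]) -> bool:
    """Return True when the chassis changed and the port is still up."""
    # Assumption: only a move from one chassis to another is of interest.
    if not row.chassis or not old_chassis:
        return False
    if row.chassis == old_chassis:
        return False
    # The port never went DOWN, so no up->down->up transition will be seen.
    return driver.lsp_up.get(row.logical_port, False)


def handle_port_binding_update(driver: FakeDriver, row: PortBinding,
                               old_chassis: Optional[str]) -> bool:
    """Notify the driver directly when the update matches."""
    if chassis_update_matches(driver, row, old_chassis):
        driver.set_port_status_up(row.logical_port)
        return True
    return False


# Usage: the port moved from a dead host to a new one while still marked up.
driver = FakeDriver({"port-1": True})
moved = PortBinding("port-1", chassis="compute-2")
handle_port_binding_update(driver, moved, old_chassis="compute-1")
print(driver.reported_up)
```

In this sketch the driver is told the port is up as soon as the binding moves, which stands in for the notification that Nova is otherwise left waiting for until it times out.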

Changed in networking-ovn:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/stein)

Change abandoned by Terry Wilson (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/678239

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/rocky)

Change abandoned by Terry Wilson (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/678240

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/queens)

Change abandoned by Terry Wilson (<email address hidden>) on branch: stable/queens
Review: https://review.opendev.org/678241

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 7.0.0.0b1

This issue was fixed in the openstack/networking-ovn 7.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/stein)

Reviewed: https://review.opendev.org/678239
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=36f06a32bbfc34d8e4e49d2a3f5fd385e4b4f863
Submitter: Zuul
Branch: stable/stein

commit 36f06a32bbfc34d8e4e49d2a3f5fd385e4b4f863
Author: Terry Wilson <email address hidden>
Date: Tue Aug 20 19:09:06 2019 -0500

    Fix evacuation when host dies uncleanly

    This is a squashed backport of fix and relevant test

    1. Fix evacuation when host dies uncleanly

    If a host crashes and ovn-controller doesn't clean up the
    Port_Binding chassis column, the Logical_Switch_Port is never set
    to DOWN, so we do not detect the up->down transition and update the
    port status with the driver. This patch watches for Port_Binding
    chassis column changes and if the port is up goes ahead and updates
    the driver.

    Conflicts:
            networking_ovn/ovsdb/ovsdb_monitor.py

    (cherry picked from commit 4f73c0f016ddda50e25fe7592bb67435384e38f2)

    2. Test PortBindingChassisUpdateEvent

    While other tests cover what happens when a Port_Binding is updated
    when this event doesn't match, there was no test that covered the
    case when it does. This test does test primarily implementation
    details, but will have to do until we can test the actual use case
    which involves uncleanly killing a compute server and doing a
    'host evacuate' from it.

    Conflicts:
            networking_ovn/tests/unit/ovsdb/test_ovsdb_monitor.py

    (cherry picked from commit f234deba187e8b56c574aa0d6a24cf69cbbd7cb1)

    Change-Id: I8a5e6ad2e98b79a140977ce003609ed5b21e3499
    Closes-Bug: #1840876

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/rocky)

Reviewed: https://review.opendev.org/678240
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=964205b2a927e460ea5fa76ed4c6a052c979e741
Submitter: Zuul
Branch: stable/rocky

commit 964205b2a927e460ea5fa76ed4c6a052c979e741
Author: Terry Wilson <email address hidden>
Date: Tue Aug 20 19:09:06 2019 -0500

    Fix evacuation when host dies uncleanly

    This is a squashed backport of fix and relevant test

    1. Fix evacuation when host dies uncleanly

    If a host crashes and ovn-controller doesn't clean up the
    Port_Binding chassis column, the Logical_Switch_Port is never set
    to DOWN, so we do not detect the up->down transition and update the
    port status with the driver. This patch watches for Port_Binding
    chassis column changes and if the port is up goes ahead and updates
    the driver.

    Conflicts:
            networking_ovn/ovsdb/ovsdb_monitor.py

    (cherry picked from commit 4f73c0f016ddda50e25fe7592bb67435384e38f2)

    2. Test PortBindingChassisUpdateEvent

    While other tests cover what happens when a Port_Binding is updated
    when this event doesn't match, there was no test that covered the
    case when it does. This test does test primarily implementation
    details, but will have to do until we can test the actual use case
    which involves uncleanly killing a compute server and doing a
    'host evacuate' from it.

    Conflicts:
            networking_ovn/tests/unit/ovsdb/test_ovsdb_monitor.py

    (cherry picked from commit f234deba187e8b56c574aa0d6a24cf69cbbd7cb1)

    Change-Id: I8a5e6ad2e98b79a140977ce003609ed5b21e3499
    Closes-Bug: #1840876
    (cherry picked from commit 36f06a32bbfc34d8e4e49d2a3f5fd385e4b4f863)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/queens)

Reviewed: https://review.opendev.org/678241
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=cf66ac597d0c2ae3239ce0650248038763157086
Submitter: Zuul
Branch: stable/queens

commit cf66ac597d0c2ae3239ce0650248038763157086
Author: Terry Wilson <email address hidden>
Date: Tue Aug 20 19:09:06 2019 -0500

    Fix evacuation when host dies uncleanly

    This is a squashed backport of fix and relevant test

    1. Fix evacuation when host dies uncleanly

    If a host crashes and ovn-controller doesn't clean up the
    Port_Binding chassis column, the Logical_Switch_Port is never set
    to DOWN, so we do not detect the up->down transition and update the
    port status with the driver. This patch watches for Port_Binding
    chassis column changes and if the port is up goes ahead and updates
    the driver.

    Conflicts:
            networking_ovn/ovsdb/ovsdb_monitor.py

    (cherry picked from commit 4f73c0f016ddda50e25fe7592bb67435384e38f2)

    2. Test PortBindingChassisUpdateEvent

    While other tests cover what happens when a Port_Binding is updated
    when this event doesn't match, there was no test that covered the
    case when it does. This test does test primarily implementation
    details, but will have to do until we can test the actual use case
    which involves uncleanly killing a compute server and doing a
    'host evacuate' from it.

    Conflicts:
            networking_ovn/tests/unit/ovsdb/test_ovsdb_monitor.py

    (cherry picked from commit f234deba187e8b56c574aa0d6a24cf69cbbd7cb1)

    Closes-Bug: #1840876
    (cherry picked from commit 36f06a32bbfc34d8e4e49d2a3f5fd385e4b4f863)
    (cherry picked from commit 562f50798bfeb325de2f2b1d443bba05d6b8cb6d)

    3. BaseEvent backport from dd53505c6418888ff74b383a7a824d04b20b6354
    This is needed because the version of ovsdbapp in queens does not have
    the match_fn() changes.
    Change-Id: I8a5e6ad2e98b79a140977ce003609ed5b21e3499

tags: added: in-stable-queens
tags: added: networking-ovn-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 4.0.4

This issue was fixed in the openstack/networking-ovn 4.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 6.0.1

This issue was fixed in the openstack/networking-ovn 6.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.1.0

This issue was fixed in the openstack/networking-ovn 5.1.0 release.
