evacuation tests in nova-live-migration post hook fails with VirtualInterfaceCreateException due to vif-plugged event received before nova starts waiting for it.

Bug #1928922 reported by Balazs Gibizer
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Triaged
Medium
Unassigned

Bug Description

Example POST_FAILURE run
https://zuul.opendev.org/t/openstack/build/4d0d5072c1e0479db616a211e0afda42/logs

https://zuul.opendev.org/t/openstack/build/4d0d5072c1e0479db616a211e0afda42/log/controller/logs/screen-n-cpu.txt#13322

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [None req-778f8336-8e29-4282-a112-91a876303fe3 demo admin] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Setting instance vm_state to ERROR: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Traceback (most recent call last):

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7188, in _create_guest_with_network

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] guest = self._create_guest(

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] next(self.gen)

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/compute/manager.py", line 479, in wait_for_instance_event

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] actual_event = event.wait()

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/usr/local/lib/python3.8/dist-packages/eventlet/event.py", line 125, in wait

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] result = hub.switch()

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/usr/local/lib/python3.8/dist-packages/eventlet/hubs/hub.py", line 313, in switch

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] return self.greenlet.switch()

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] eventlet.timeout.Timeout: 300 seconds

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9]

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] During handling of the above exception, another exception occurred:

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9]

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Traceback (most recent call last):

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/compute/manager.py", line 10090, in _error_out_instance_on_exception

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] yield

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/compute/manager.py", line 3391, in rebuild_instance

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] self._do_rebuild_instance_with_claim(

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/compute/manager.py", line 3473, in _do_rebuild_instance_with_claim

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] self._do_rebuild_instance(

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/compute/manager.py", line 3640, in _do_rebuild_instance

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] self._rebuild_default_impl(**kwargs)

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/compute/manager.py", line 3277, in _rebuild_default_impl

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] self.driver.spawn(context, instance, image_meta, injected_files,

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4150, in spawn

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] self._create_guest_with_network(

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 7214, in _create_guest_with_network

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] raise exception.VirtualInterfaceCreateException()

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed

May 14 16:30:35.248552 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: ERROR nova.compute.manager [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9]

Neutron successfully sent the vif-plugged event and nova routes it to both the source and the dest host:

https://zuul.opendev.org/t/openstack/build/4d0d5072c1e0479db616a211e0afda42/log/controller/logs/screen-n-api.txt#9208

May 14 16:25:26.649557 ubuntu-focal-ovh-bhs1-0024684290 <email address hidden>[85904]: DEBUG nova.compute.api [req-dd0a66c4-313b-498f-adb3-02d9d434099a req-236f9daf-5e2f-44ec-be84-ebb58b2ab9ef service nova] Instance 021396f6-40ff-434c-acbe-7092a6b1bcd9 is migrating within the same cell, copying events to all relevant hosts: {'ubuntu-focal-ovh-bhs1-0024684292', 'ubuntu-focal-ovh-bhs1-0024684290'} {{(pid=85904) _get_relevant_hosts /opt/stack/nova/nova/compute/api.py:5584}}

Then the dest node nova-compute ignores that event

https://zuul.opendev.org/t/openstack/build/4d0d5072c1e0479db616a211e0afda42/log/controller/logs/screen-n-cpu.txt#12632

May 14 16:25:26.671999 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: WARNING nova.compute.manager [req-dd0a66c4-313b-498f-adb3-02d9d434099a req-236f9daf-5e2f-44ec-be84-ebb58b2ab9ef service nova] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Received unexpected event network-vif-plugged-51f0b6e0-9401-4aaa-872d-e5ac5dda8f2b for instance with vm_state error and task_state rebuilding.

Then 300 sec later nova times out waiting for the vif-plugged event.

---

It seems that the vif-plugged event received right after nova bound the port to the host:

May 14 16:25:25.570570 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: INFO nova.network.neutron [None req-778f8336-8e29-4282-a112-91a876303fe3 demo admin] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Updating port 51f0b6e0-9401-4aaa-872d-e5ac5dda8f2b with attributes {'binding:host_id': 'ubuntu-focal-ovh-bhs1-0024684290', 'device_owner': 'compute:nova', 'binding:profile': {}}

May 14 16:25:26.386630 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG oslo_concurrency.lockutils [None req-778f8336-8e29-4282-a112-91a876303fe3 demo admin] Acquired lock "refresh_cache-021396f6-40ff-434c-acbe-7092a6b1bcd9" {{(pid=94366) lock /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:266}}

May 14 16:25:26.386929 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG nova.network.neutron [None req-778f8336-8e29-4282-a112-91a876303fe3 demo admin] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Building network info cache for instance {{(pid=94366) _get_instance_nw_info /opt/stack/nova/nova/network/neutron.py:1854}}

May 14 16:25:26.440967 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG nova.compute.manager [req-409a631d-3bde-445b-af9f-c933efbee1ae req-098121aa-ffef-49d4-bc3c-889966398693 service nova] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Received event network-changed-51f0b6e0-9401-4aaa-872d-e5ac5dda8f2b {{(pid=94366) external_instance_event /opt/stack/nova/nova/compute/manager.py:10350}}

May 14 16:25:26.441196 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG nova.compute.manager [req-409a631d-3bde-445b-af9f-c933efbee1ae req-098121aa-ffef-49d4-bc3c-889966398693 service nova] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Refreshing instance network info cache due to event network-changed-51f0b6e0-9401-4aaa-872d-e5ac5dda8f2b. {{(pid=94366) external_instance_event /opt/stack/nova/nova/compute/manager.py:10355}}

May 14 16:25:26.670969 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG nova.compute.manager [req-dd0a66c4-313b-498f-adb3-02d9d434099a req-236f9daf-5e2f-44ec-be84-ebb58b2ab9ef service nova] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Received event network-vif-plugged-51f0b6e0-9401-4aaa-872d-e5ac5dda8f2b {{(pid=94366) external_instance_event /opt/stack/nova/nova/compute/manager.py:10350}}

May 14 16:25:26.671346 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG oslo_concurrency.lockutils [req-dd0a66c4-313b-498f-adb3-02d9d434099a req-236f9daf-5e2f-44ec-be84-ebb58b2ab9ef service nova] Lock "021396f6-40ff-434c-acbe-7092a6b1bcd9-events" acquired by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: waited 0.000s {{(pid=94366) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:355}}

May 14 16:25:26.671346 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG oslo_concurrency.lockutils [req-dd0a66c4-313b-498f-adb3-02d9d434099a req-236f9daf-5e2f-44ec-be84-ebb58b2ab9ef service nova] Lock "021396f6-40ff-434c-acbe-7092a6b1bcd9-events" released by "nova.compute.manager.InstanceEvents.pop_instance_event.<locals>._pop_event" :: held 0.000s {{(pid=94366) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:367}}

May 14 16:25:26.671589 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG nova.compute.manager [req-dd0a66c4-313b-498f-adb3-02d9d434099a req-236f9daf-5e2f-44ec-be84-ebb58b2ab9ef service nova] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] No waiting events found dispatching network-vif-plugged-51f0b6e0-9401-4aaa-872d-e5ac5dda8f2b {{(pid=94366) pop_instance_event /opt/stack/nova/nova/compute/manager.py:317}}

May 14 16:25:26.671999 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: WARNING nova.compute.manager [req-dd0a66c4-313b-498f-adb3-02d9d434099a req-236f9daf-5e2f-44ec-be84-ebb58b2ab9ef service nova] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Received unexpected event network-vif-plugged-51f0b6e0-9401-4aaa-872d-e5ac5dda8f2b for instance with vm_state error and task_state rebuilding.

which was _before_ nova actually called vif.plug() and started waiting for the plugged event.
https://zuul.opendev.org/t/openstack/build/4d0d5072c1e0479db616a211e0afda42/log/controller/logs/screen-n-api.txt#9208

May 14 16:25:32.530500 ubuntu-focal-ovh-bhs1-0024684290 nova-compute[94366]: DEBUG nova.compute.manager [None req-778f8336-8e29-4282-a112-91a876303fe3 demo admin] [instance: 021396f6-40ff-434c-acbe-7092a6b1bcd9] Preparing to wait for external event network-vif-plugged-51f0b6e0-9401-4aaa-872d-e5ac5dda8f2b {{(pid=94366) prepare_for_instance_event /opt/stack/nova/nova/compute/manager.py:280}}

Changed in nova:
status: New → Triaged
assignee: nobody → Balazs Gibizer (balazs-gibizer)
importance: Undecided → Medium
tags: added: evacuate gate-failure
summary: - evacuation testing in nova-live-migration job is unstable due to libvirt
- restart
+ positive evacuation testing in nova-live-migration fails with
+ VirtualInterfaceCreateException
description: updated
description: updated
description: updated
Changed in nova:
status: Triaged → New
description: updated
summary: - positive evacuation testing in nova-live-migration fails with
+ evacuation testing in nova-live-migration post hook fails with
VirtualInterfaceCreateException
description: updated
description: updated
description: updated
summary: - evacuation testing in nova-live-migration post hook fails with
- VirtualInterfaceCreateException
+ evacuation tests in nova-live-migration post hook fails with
+ VirtualInterfaceCreateException due to vif-plugged event received before
+ nova starts waiting for it.
Changed in nova:
assignee: Balazs Gibizer (balazs-gibizer) → nobody
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Based on a discussion[1] with Sean on #openstack-nova there is a set of neutron fixes that can be enabled with [nova]/live_migration_events that might fix these as well. Sean will propose a fix that changes the default of that config to True

[1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2021-05-19.log.html#t2021-05-19T10:19:51

Changed in nova:
status: New → Triaged
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

This looks like a duplicate for bug 1901707

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.