Tempest test test_resize_server_revert: failed to build and is in ERROR status: Virtual Interface creation failed

Bug #1998110 reported by Anton Kurbatov
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

In my CI run I got an error in the test_resize_server_revert test case [1]:

{3} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert [401.454625s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):

      File "/opt/stack/tempest/tempest/api/compute/servers/test_server_actions.py", line 430, in test_resize_server_revert
    waiters.wait_for_server_status(self.client, self.server_id, 'ACTIVE')

      File "/opt/stack/tempest/tempest/common/waiters.py", line 101, in wait_for_server_status
    raise lib_exc.TimeoutException(message)

    tempest.lib.exceptions.TimeoutException: Request timed out
Details: (ServerActionsTestJSON:test_resize_server_revert) Server e69e6d33-c494-415a-9cb8-b597af2ea052 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: REVERT_RESIZE. Current task state: resize_reverting.

Captured traceback-1:
~~~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):

      File "/opt/stack/tempest/tempest/api/compute/base.py", line 228, in server_check_teardown
    waiters.wait_for_server_status(cls.servers_client,

      File "/opt/stack/tempest/tempest/common/waiters.py", line 81, in wait_for_server_status
    raise exceptions.BuildErrorException(details, server_id=server_id)

    tempest.exceptions.BuildErrorException: Server e69e6d33-c494-415a-9cb8-b597af2ea052 failed to build and is in ERROR status
Details: Fault: {'code': 500, 'created': '2022-11-23T21:46:15Z', 'message': 'Virtual Interface creation failed'}.

The test performs the following steps (a minimal sketch of the flow follows the list):
1) resize the VM to a new flavor;
2) wait for the VM to reach VERIFY_RESIZE status;
3) revert the resize;
4) wait for the VM to reach ACTIVE status <- this is where I got the failure.
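
A minimal sketch of that flow, assuming an openstacksdk connection (the cloud name and the flavor ID are placeholders, not values from the test):

import openstack

conn = openstack.connect(cloud='devstack')  # assumed clouds.yaml entry
server = conn.compute.get_server('e69e6d33-c494-415a-9cb8-b597af2ea052')

conn.compute.resize_server(server, 'ALT_FLAVOR_ID')                        # 1) resize
server = conn.compute.wait_for_server(server, status='VERIFY_RESIZE')      # 2)
conn.compute.revert_server_resize(server)                                  # 3) revert
server = conn.compute.wait_for_server(server, status='ACTIVE', wait=196)   # 4) fails here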

The resize involved a change of compute node:
the VM was on node 0032209120 -> resize the VM, the new node is 0032209122 -> revert the resize

The `resize revert` (step 3) started here:
Nov 23 21:41:05.514686 ubuntu-jammy-rax-dfw-0032209120 <email address hidden>[54681]: DEBUG nova.api.openstack.wsgi [None req-83266751-d6d9-4a35-89fc-b4c97c1b481d tempest-ServerActionsTestJSON-1939410532 tempest-ServerActionsTestJSON-1939410532-project] Action: 'action', calling method: <bound method ServersController._action_revert_resize of <nova.api.openstack.compute.servers.ServersController object at 0x7f4a793df3d0>>, body: {"revertResize": {}} {{(pid=54681) _process_stack /opt/stack/nova/nova/api/openstack/wsgi.py:511}}

Nova received an unexpected network-vif-plugged event:
Nov 23 21:41:12.404453 ubuntu-jammy-rax-dfw-0032209122 nova-compute[31414]: WARNING nova.compute.manager [req-b389f403-c195-4fa0-b578-7b687f85b79d req-c9eab04d-708d-4666-b4ce-f7bb760c7aa6 service nova] [instance: e69e6d33-c494-415a-9cb8-b597af2ea052] Received unexpected event network-vif-plugged-775d8945-1367-4e08-8306-9c683e1891cf for instance with vm_state resized and task_state resize_reverting.

Nova prepares to wait for the network-vif-plugged notification only afterwards, at 21:41:13:
Nov 23 21:41:13.497369 ubuntu-jammy-rax-dfw-0032209122 nova-compute[31414]: DEBUG nova.compute.manager [None req-83266751-d6d9-4a35-89fc-b4c97c1b481d tempest-ServerActionsTestJSON-1939410532 tempest-ServerActionsTestJSON-1939410532-project] [instance: e69e6d33-c494-415a-9cb8-b597af2ea052] Preparing to wait for external event network-vif-plugged-775d8945-1367-4e08-8306-9c683e1891cf {{(pid=31414) prepare_for_instance_event /opt/stack/nova/nova/compute/manager.py:281}}
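
A self-contained toy (not Nova's actual event machinery) showing why an event that arrives before prepare_for_instance_event is dropped, so that a subsequent wait for it times out:

import threading

class ExternalEvents:
    """Toy model of Nova's prepare-then-wait external event handling."""

    def __init__(self):
        self._events = {}  # event name -> threading.Event

    def prepare(self, name):
        self._events[name] = threading.Event()

    def deliver(self, name):
        ev = self._events.get(name)
        if ev is None:
            # The situation behind the WARNING in the log above.
            print('Received unexpected event %s' % name)
            return
        ev.set()

    def wait(self, name, timeout):
        return self._events[name].wait(timeout)

api = ExternalEvents()
api.deliver('network-vif-plugged-775d8945')           # arrives too early
api.prepare('network-vif-plugged-775d8945')           # Nova starts listening only now
print(api.wait('network-vif-plugged-775d8945', 1.0))  # False: the wait times out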

So, there is an unexpected network-vif-plugged event. I believe this event was triggered by the `resize` operation from step 1: Nova does not wait for network interfaces to be plugged when resizing a VM (vifs_already_plugged=True), so a VM can reach VERIFY_RESIZE status before Neutron has finished processing the port [2].
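
A paraphrased sketch of the behavior behind that flag (not the exact driver code at [2]): with vifs_already_plugged=True the driver simply has no Neutron events to wait for:

def get_neutron_events(network_info, vifs_already_plugged):
    # With vifs_already_plugged=True there is nothing to wait for, so the
    # resize can reach VERIFY_RESIZE before Neutron finishes the port.
    if vifs_already_plugged:
        return []
    return [('network-vif-plugged', vif['id'])
            for vif in network_info if not vif.get('active', True)]

ports = [{'id': '775d8945', 'active': False}]
print(get_neutron_events(ports, vifs_already_plugged=True))   # []
print(get_neutron_events(ports, vifs_already_plugged=False))  # [('network-vif-plugged', '775d8945')]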

Meanwhile, on the Neutron server side:

Binding the port to node 0032209120 in the `resize` operation (step 1):
Nov 23 21:40:57.981780 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55724]: DEBUG neutron.api.v2.base [req-b1a98064-7e8e-4ad3-84cf-09e3bf12727e req-036d315c-21e6-47d5-be1d-44a4efc8a3e9 service neutron] Request body: {'port': {'binding:host_id': 'ubuntu-jammy-rax-dfw-0032209120', 'device_owner': 'compute:nova'}} {{(pid=55724) prepare_request_body /opt/stack/neutron/neutron/api/v2/base.py:731}}

Binding the port to node 0032209122 in the `resize revert` operation (step 3):
Nov 23 21:41:10.832391 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55723]: DEBUG neutron.api.v2.base [req-83266751-d6d9-4a35-89fc-b4c97c1b481d req-268a8b14-b6b9-438d-bc3f-446f5eaad88d service neutron] Request body: {'port': {'binding:host_id': 'ubuntu-jammy-rax-dfw-0032209122', 'device_owner': 'compute:nova'}} {{(pid=55723) prepare_request_body /opt/stack/neutron/neutron/api/v2/base.py:731}}

Provisioning completed by the L2 entity for the `resize` operation (step 1):
Nov 23 21:41:10.950190 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55725]: DEBUG neutron.db.provisioning_blocks [None req-793235de-b92d-459f-b016-a3d9ba1a1ddd None None] Provisioning complete for port 775d8945-1367-4e08-8306-9c683e1891cf triggered by entity L2. {{(pid=55725) provisioning_complete
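
For context: Neutron's provisioning blocks keep a port DOWN until every registered entity (e.g. L2 and DHCP) reports completion. A toy model of the mechanism (not the real implementation):

# The transition callback fires only when the last pending entity reports in.
blocks = {'port-775d8945': {'L2', 'DHCP'}}

def provisioning_complete(object_id, entity, on_ready):
    pending = blocks.get(object_id)
    if pending is None:
        return
    pending.discard(entity)
    if not pending:
        del blocks[object_id]
        on_ready(object_id)  # all entities done: the port may go ACTIVE

provisioning_complete('port-775d8945', 'DHCP', print)  # still waiting for L2
provisioning_complete('port-775d8945', 'L2', print)    # prints: port-775d8945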

The port is bound to node 0032209122 by the `resize revert` operation (step 3):
Nov 23 21:41:11.796762 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55723]: DEBUG neutron.plugins.ml2.managers [req-83266751-d6d9-4a35-89fc-b4c97c1b481d req-268a8b14-b6b9-438d-bc3f-446f5eaad88d service neutron] Bound port: 775d8945-1367-4e08-8306-9c683e1891cf, host: ubuntu-jammy-rax-dfw-0032209122, vif_type: ovs, vif_details: {"connectivity": "l2", "port_filter": true, "ovs_hybrid_plug": false, "datapath_type": "system", "bridge_name": "br-int"}, binding_levels: [{'bound_driver': 'openvswitch', 'bound_segment': {'id': '40023962-2c7c-4acb-8356-9875b42b7488', 'network_type': 'vxlan', 'physical_network': None, 'segmentation_id': 402, 'network_id': '62c9aca8-6dce-42d3-a1f6-613e4da77967'}}] {{(pid=55723) _bind_port_level /opt/stack/neutron/neutron/plugins/ml2/managers.py:947}}

The port switches to ACTIVE and a notification is sent to Nova (still triggered by the `resize` operation from step 1):
Nov 23 21:41:12.158179 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55725]: DEBUG neutron.plugins.ml2.plugin [None req-793235de-b92d-459f-b016-a3d9ba1a1ddd None None] Current status of the port 775d8945-1367-4e08-8306-9c683e1891cf is: DOWN; New status is: ACTIVE {{(pid=55725) _update_individual_port_db_status /opt/stack/neutron/neutron/plugins/ml2/plugin.py:2330}}
Nov 23 21:41:12.293154 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55725]: DEBUG neutron.notifiers.nova [-] Sending events: [{'server_uuid': 'e69e6d33-c494-415a-9cb8-b597af2ea052', 'name': 'network-vif-plugged', 'status': 'completed', 'tag': '775d8945-1367-4e08-8306-9c683e1891cf'}] {{(pid=55725) send_events /opt/stack/neutron/neutron/notifiers/nova.py:279}}

Provisioning completed by the L2 entity for the `resize revert` operation (step 3):
Nov 23 21:41:15.004731 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55725]: DEBUG neutron.db.provisioning_blocks [None req-7b301e03-5547-4f5c-967f-3fa6a79a1c22 None None] Provisioning complete for port 775d8945-1367-4e08-8306-9c683e1891cf triggered by entity L2. {{(pid=55725) provisioning_complete

No new notification is sent because the port is already in ACTIVE status:
Nov 23 21:41:15.552453 ubuntu-jammy-rax-dfw-0032209120 neutron-server[55725]: DEBUG neutron.plugins.ml2.plugin [None req-7b301e03-5547-4f5c-967f-3fa6a79a1c22 None None] Current status of the port 775d8945-1367-4e08-8306-9c683e1891cf is: ACTIVE; New status is: ACTIVE {{(pid=55725) _update_individual_port_db_status
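
A toy paraphrase of the status-update logic visible above (not the actual ml2 code): the notifier fires only on a real status transition, so the completion triggered by the `resize revert` sends nothing to Nova:

def update_port_status(port, new_status, send_event):
    if port['status'] == new_status:
        return  # ACTIVE -> ACTIVE: no transition, nothing is sent
    port['status'] = new_status
    if new_status == 'ACTIVE':
        send_event('network-vif-plugged', port['id'])

port = {'id': '775d8945', 'status': 'DOWN'}
update_port_status(port, 'ACTIVE', print)  # first completion: event sent to Nova
update_port_status(port, 'ACTIVE', print)  # second completion: silence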

[1] https://zuul.opendev.org/t/openstack/build/45b7e1e6af8e4672ae80f1fcef1fb781
[2] https://opendev.org/openstack/nova/src/commit/2d774541c549bd43bff6cafba36d2351d4649a2f/nova/virt/libvirt/driver.py#L11627

Revision history for this message
Anton Kurbatov (akurbatov) wrote :

I managed to reproduce the issue on my devstack setup:
1) Create a VM and migrate it to another host, so that during the resize operation the Nova scheduler will most likely change the VM's host.
2) Emulate an L2 port processing delay. The sleep duration may need tuning depending on how fast the resize operation completes in your environment:

diff --git a/neutron/db/provisioning_blocks.py b/neutron/db/provisioning_blocks.py
index cb4392c93d..24f8318f79 100644
--- a/neutron/db/provisioning_blocks.py
+++ b/neutron/db/provisioning_blocks.py
@@ -131,6 +131,11 @@ def provisioning_complete(context, object_id, object_type, entity):
     standard_attr_id = _get_standard_attr_id(context, object_id,
                                              object_type)
     if not standard_attr_id:
         return
+    import time
+    import random
+
+    # Emulate a slow L2 agent: delay provisioning completion by 5-7 seconds.
+    time.sleep(random.randint(500, 700) / 100.)
     if remove_provisioning_component(context, object_id, object_type,
                                      entity, standard_attr_id):

3) Run the resize_revert.sh script (a rough reconstruction is sketched after the output):

[root@node0 ~]# COUNT=100 SERVER_ID=3db874ed-fb2f-40cc-810e-fcb14a18dee6 ./resize_revert.sh
Iter 1
VERIFY_RESIZE
ACTIVE
...
Iter 59
VERIFY_RESIZE
Wait timeout
[root@node0 ~]#
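
The resize_revert.sh script itself is not attached; a rough Python equivalent of what it appears to do, assuming openstacksdk (the alternate flavor ID is a placeholder):

import os
import openstack
from openstack import exceptions

conn = openstack.connect()
server_id = os.environ['SERVER_ID']
count = int(os.environ.get('COUNT', 100))

for i in range(1, count + 1):
    print('Iter %d' % i)
    server = conn.compute.get_server(server_id)
    conn.compute.resize_server(server, 'ALT_FLAVOR_ID')  # placeholder flavor ID
    server = conn.compute.wait_for_server(server, status='VERIFY_RESIZE')
    print(server.status)
    conn.compute.revert_server_resize(server)
    try:
        server = conn.compute.wait_for_server(server, status='ACTIVE')
    except exceptions.ResourceTimeout:
        print('Wait timeout')
        break
    print(server.status)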

Revision history for this message
Artom Lifshitz (notartom) wrote :

Which version of Nova is this? There's currently a fix for revert resize vif plugging being backported here:

https://review.opendev.org/q/I3cb39a9ec2c260f422b3c48122b9db512cdd799b

Changed in nova:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired