Race condition during VM creation - could not open network device tapXXX (No such device)

Bug #2051863 reported by Jan Wasilewski
This bug affects 4 people
Affects                    Status   Importance   Assigned to   Milestone
OpenStack Compute (nova)   New      Undecided    Unassigned
neutron                    New      Medium       Unassigned

Bug Description

[High-Level Description]
While creating an amphora through the Octavia service, we hit a race condition: nova-compute cannot add a QoS ingress rule because the tap interface does not exist yet. As a result, the service becomes temporarily unavailable, and creation fails in approximately 50% of cases.

[Pre-conditions]
Create a simple Octavia load balancer in an environment deployed by kolla-ansible. For the amphora server, use a flavor with QoS properties, as in this example: https://paste.openstack.org/show/bkx3RgEFqUwxCU0Qr8z2/

[Step-by-Step Reproduction]
Create a load balancer as part of a Heat stack, following the example provided at: https://github.com/syseleven/heat-examples/tree/master/lbaas-octavia-http
Use a QoS flavor as shown in this example: https://paste.openstack.org/show/bkx3RgEFqUwxCU0Qr8z2/

[Expected Output]
The amphora and the subsequent Nova instances serving as members of the load balancer are created successfully 100% of the time.

[Actual Output]
During creation, nova-compute fails with the following errors:
/var/log/kolla/libvirt/libvirtd.log:2024-01-31 09:24:55.852+0000: 4065628: error : virCommandWait:2748 : internal error: Child process (tc filter add dev tap5f365c23-89 parent ffff: protocol all u32 match u32 0 0 police rate 64000kbps burst 64000kb mtu 64kb drop flowid :1) unexpected exit status 2: Error: Parent Qdisc doesn't exists.
/var/log/kolla/nova/nova-compute.log:2024-01-31 10:24:56.086 7 ERROR nova.virt.libvirt.guest libvirt.libvirtError: internal error: Child process (tc filter add dev tap5f365c23-89 parent ffff: protocol all u32 match u32 0 0 police rate 64000kbps burst 64000kb mtu 64kb drop flowid :1) unexpected exit status 2: Error: Parent Qdisc doesn't exists.
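
To triage many such failures at once, the failing tc command can be pulled out of the error text. The following is a minimal sketch, assuming the error always embeds the command in the form quoted above; `parse_tc_failure` and the regular expression are hypothetical helpers written for this report, not part of nova or libvirt. Note that `parent ffff:` is the handle of the ingress qdisc, and tc reads `kbps` as kilobytes per second.

```python
import re

# Matches the tc command embedded in the libvirt error quoted in this report.
# The pattern is an assumption based on that single example.
TC_FILTER_RE = re.compile(
    r"tc filter add dev (?P<dev>\S+) parent (?P<parent>\S+) .*?"
    r"police rate (?P<rate>\d+)kbps burst (?P<burst>\d+)kb"
)


def parse_tc_failure(line):
    """Return the device and policing parameters, or None if no match."""
    m = TC_FILTER_RE.search(line)
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        "parent": m.group("parent"),       # "ffff:" is the ingress qdisc handle
        "rate_kbps": int(m.group("rate")),  # kilobytes per second in tc units
        "burst_kb": int(m.group("burst")),
    }


# Error text copied from the nova-compute log line above.
sample = (
    "libvirt.libvirtError: internal error: Child process (tc filter add dev "
    "tap5f365c23-89 parent ffff: protocol all u32 match u32 0 0 police rate "
    "64000kbps burst 64000kb mtu 64kb drop flowid :1) unexpected exit status 2: "
    "Error: Parent Qdisc doesn't exists."
)
info = parse_tc_failure(sample)
print(info)
```

The extracted device name can then be used to look up the matching entries in ovs-vswitchd.log.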

Unfortunately, the ovs-vswitchd.log file shows that the tap interface had not been created at this point:
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.469Z|00094|bridge|WARN|could not open network device tap5f365c23-89 (No such device)
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.809Z|00095|bridge|INFO|bridge br-int: added interface tap5f365c23-89 on port 27
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.885Z|00096|bridge|INFO|bridge br-int: deleted interface tap5f365c23-89 on port 27
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.888Z|00097|bridge|WARN|could not open network device tap5f365c23-89 (No such device)
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.927Z|00098|bridge|WARN|could not open network device tap5f365c23-89 (No such device)
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.934Z|00099|bridge|WARN|could not open network device tap5f365c23-89 (No such device)
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.943Z|00100|bridge|WARN|could not open network device tap5f365c23-89 (No such device)
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:56.024Z|00101|bridge|WARN|could not open network device tap5f365c23-89 (No such device)
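
Because the two log files use different timestamp formats, it can help to merge them into one timeline. The sketch below is illustrative only: the regular expressions and the `parse_events` helper are assumptions derived from the lines quoted in this report, and both logs are assumed to be in UTC (the libvirt line carries `+0000`, the OVS lines end in `Z`).

```python
import re
from datetime import datetime, timezone

# Patterns derived from the log lines quoted in this report (assumed formats).
OVS_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\|\d+\|bridge\|\w+\|(?P<msg>.*)"
)
LIBVIRT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\+0000: \d+: error : (?P<msg>.*)"
)


def parse_events(lines):
    """Return (timestamp, source, message) tuples sorted by time."""
    events = []
    for line in lines:
        m = OVS_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f")
            events.append((ts.replace(tzinfo=timezone.utc), "ovs", m.group("msg")))
            continue
        m = LIBVIRT_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts.replace(tzinfo=timezone.utc), "libvirt", m.group("msg")))
    return sorted(events)


# Lines copied from the logs above, deliberately out of order.
lines = [
    "/var/log/kolla/libvirt/libvirtd.log:2024-01-31 09:24:55.852+0000: 4065628: "
    "error : virCommandWait:2748 : internal error: Child process (tc filter add "
    "dev tap5f365c23-89 parent ffff: protocol all u32 match u32 0 0 police rate "
    "64000kbps burst 64000kb mtu 64kb drop flowid :1) unexpected exit status 2: "
    "Error: Parent Qdisc doesn't exists.",
    "/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.469Z|00094|bridge|WARN|could not open network device tap5f365c23-89 (No such device)",
    "/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.809Z|00095|bridge|INFO|bridge br-int: added interface tap5f365c23-89 on port 27",
    "/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.885Z|00096|bridge|INFO|bridge br-int: deleted interface tap5f365c23-89 on port 27",
    "/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.888Z|00097|bridge|WARN|could not open network device tap5f365c23-89 (No such device)",
]
events = parse_events(lines)
for ts, source, msg in events:
    print(ts.strftime("%H:%M:%S.%f")[:-3], source, msg[:70])
```

On the quoted lines, the merged timeline places the libvirt tc error (09:24:55.852) between the OVS "added interface" entry (09:24:55.809) and the "deleted interface" entry (09:24:55.885).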

[Version]
OpenStack Zed, deployed by kolla-ansible with all defaults
Ubuntu 22.04 LTS
libvirt: 8.0.0-1ubuntu7.7
nova: zed
python3-openvswitch: 3.0.3-0ubuntu0.22.10.3~cloud3
neutron: zed

description: updated
LIU Yulong (dragon889) wrote :

This looks more like a libvirt error or a Nova-side problem. Neutron is not responsible for creating the tap-XXX device; it is plugged by nova-compute. We need to find out why the tap device is not created before the TC rules are created.
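
One way to investigate the ordering on a compute node is to poll for the device while an instance is being spawned. The following is a diagnostic sketch only, not how nova-compute or libvirt plug the interface and not a proposed fix; `wait_for_netdev` and its `sysfs_root` parameter are hypothetical names. It relies on the fact that Linux exposes each network device as an entry under `/sys/class/net`.

```python
import os
import time


def wait_for_netdev(name, timeout=2.0, interval=0.05, sysfs_root="/sys/class/net"):
    """Poll until a network device entry appears; return False on timeout.

    sysfs_root is a parameter only so the helper can be exercised against
    an ordinary directory instead of the real sysfs tree.
    """
    path = os.path.join(sysfs_root, name)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: check whether the tap device from the logs exists right now.
print(wait_for_netdev("tap5f365c23-89", timeout=0.2))
```

Recording how long the device takes to appear, relative to the time of the tc error, would indicate whether a short wait before the tc call would avoid the failure.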

Jan Wasilewski (janwasilewski) wrote :

I have recently opened a ticket with Neutron because I am puzzled by certain operations observed:

/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.809Z|00095|bridge|INFO|bridge br-int: added interface tap5f365c23-89 on port 27
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.885Z|00096|bridge|INFO|bridge br-int: deleted interface tap5f365c23-89 on port 27
/var/log/kolla/openvswitch/ovs-vswitchd.log:2024-01-31T09:24:55.888Z|00097|bridge|WARN|could not open network device tap5f365c23-89 (No such device)

It appears that a port was created and subsequently deleted. At least from the logs, it seems that this was done by OVS. If I am mistaken and this is entirely the responsibility of the Nova service, then transferring the bug to Nova is acceptable.

Miguel Lavalle (minsel)
Changed in neutron:
importance: Undecided → Medium
Mina (miash) wrote :

We've encountered a similar issue. Following the upgrade of our OpenStack services from Wallaby to Xena, we're experiencing random occurrences of the error: 'libvirt.libvirtError: internal error: Child process (tc filter add dev tap81c8e58f-5a parent ffff: protocol all u32 match u32 0 0 police rate 128000kbps burst 128000kb mtu 64kb drop flowid :1) unexpected exit status 2: Error: Parent Qdisc doesn't exist.' This problem arises during instance creation, hard reboots, and resizing.

Sam Schmitt (samcat116) wrote :

I am also encountering this issue on a 2023.2 deployment with kolla-ansible.

Sam Schmitt (samcat116) wrote :

This is causing us a lot of issues, combined with the fact that you cannot modify flavors to remove these limits after they are added.
