Neutron deletes port in use and nova errors out when cleaning the VM XML

Bug #1977485 reported by Andrei Tira
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

We recently upgraded to OpenStack Xena on a Kolla deployment and hit this issue.

Neutron is able to delete a port that is in use, and Nova errors out when trying to clean up the VM XML (we are talking about Windows VMs on a KVM hypervisor).

We need to move the ports of some VMs to a different subnet. We tried the following procedure (sketched in commands below):
1. remove the old port of the VM
2. create a new port on the new subnet
3. attach the new port to the VM
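
For reference, the commands behind those steps look roughly like the following. This is only a sketch: the server, port and network names are placeholders, and (judging from the logs further down) the first step was a plain Neutron port delete.

 # 1. delete the VM's old port -- the deletion Neutron happily accepts
 $ openstack port delete <old-port>

 # 2. create a replacement port on the new subnet
 $ openstack port create --network <network> --fixed-ip subnet=<new-subnet> <new-port-name>

 # 3. attach the new port to the VM
 $ openstack server add port <server> <new-port>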

Result:
We have a VM with no network connectivity.
The port gets deleted from OpenStack, and from OVS as well, but in the VM XML we see this:
<interface type='bridge'>
      <mac address='00:16:3c:7b:2c:1c'/>
      <source bridge='br-int'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='fb15ad83-bf28-455d-a1b1-14158203b4bf'/>
      </virtualport>
      <target dev='tapfb15ad83-bf'/>
      <model type='virtio'/>
      <mtu size='1500'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='ethernet'>
      <mac address='00:16:3c:7b:2c:1c'/>
      <target dev='tapba91ea48-36'/>
      <model type='virtio'/>
      <mtu size='1500'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </interface>

The 'bridge' part represents the old interface (the interface of the already deleted port), and the 'ethernet' part is the new port.
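
For context, a stale interface like the one above can be spotted in the live libvirt definition of the instance on the compute node, for example (the domain name is a placeholder):

 # dump the running domain's XML and look at its <interface> elements
 $ virsh dumpxml <instance-domain> | grep -A 12 "<interface"
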
We also tried virsh detach-interface to remove the stale interface from the XML; the command reported success, but the interface is still there.
We noticed that rebooting the VM cleans up the XML and brings connectivity back (this is not the desired solution).

In the logs we see:
When we delete the old port:
2022-05-31 12:13:29.935 7 INFO nova.compute.manager [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Neutron deleted interface fb15ad83-bf28-455d-a1b1-14158203b4bf; detaching it from the instance and deleting it from the info cache
2022-05-31 12:13:30.076 7 WARNING nova.virt.libvirt.driver [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Detaching interface 00:16:3c:7b:2c:1c failed because the device is no longer found on the guest.: nova.exception.DeviceNotFound: Device 'tapfb15ad83-bf' not found.
2022-05-31 12:13:30.740 7 INFO os_vif [req-1bf9e960-fc99-453f-8bf9-cd6a76c12feb 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] Successfully unplugged vif VIFOpenVSwitch(active=True,address=00:16:3c:7b:2c:1c,bridge_name='br-int',has_traffic_filtering=True,id=fb15ad83-bf28-455d-a1b1-14158203b4bf,network=Network(b03631f6-6fa7-4ff3-97e6-0a3bd077fac3),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=True,vif_name='tapfb15ad83-bf')

When we attach the new port:
2022-05-31 12:20:16.427 7 WARNING nova.compute.manager [req-f42820d6-1c70-428a-9c2d-305737838bfc 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Received unexpected event network-vif-plugged-ba91ea48-3676-4934-87ba-1ad4cf80b1bc for instance with vm_state active and task_state None.
2022-05-31 12:20:19.188 7 WARNING nova.compute.manager [req-305324c7-c25b-44a7-96cd-a8cc84284727 9879764509c84ca58d054fc3b9575df6 24783cb241264363ad1b8808ba21c131 - default default] [instance: b6c60a66-e571-4d50-984a-101dcb29f6aa] Received unexpected event network-vif-plugged-ba91ea48-3676-4934-87ba-1ad4cf80b1bc for instance with vm_state active and task_state None.

We found that the following workaround works (sketched in commands below):
1. use virsh detach-interface while the old port of the VM still exists (before we delete it)
2. then delete the old port
3. after that, attach the new port
This works as expected and the VM has network connectivity.
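
In commands, the working order is roughly the following (again a sketch with placeholder names; the MAC address is the one of the stale interface shown above):

 # 1. detach the interface from the running guest while its port still exists
 $ virsh detach-interface <instance-domain> bridge --mac 00:16:3c:7b:2c:1c --live

 # 2. only then delete the old port
 $ openstack port delete <old-port>

 # 3. create the new port on the new subnet and attach it to the VM
 $ openstack port create --network <network> --fixed-ip subnet=<new-subnet> <new-port-name>
 $ openstack server add port <server> <new-port>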

Tags: nova
Andrei Tira (atira)
summary: - Neutron deletes port and nova errors out
+ Neutron deletes port and nova errors out when cleaning the VM XML
summary: - Neutron deletes port and nova errors out when cleaning the VM XML
+ Neutron deletes port in use and nova errors out when cleaning the VM XML
Revision history for this message
Brian Haley (brian-haley) wrote :

I'm not sure this is a neutron issue, as you are allowed to delete a port even if it is in use.

It seems like the issue is with Nova, but before re-assigning, have you tried removing the port via the CLI?

# openstack server remove port <server> <port>

That will probably do the right thing. There is also an 'add port'.

Changed in neutron:
status: New → Incomplete
Revision history for this message
Phil Evans (philthinkhuge) wrote :

I have been working with our consultants and can supply a bit more detail here. The important thing to note is that (I believe because of a switch from OVS networking to OVN) *older* servers that have been up for at least a couple of months, of which there are many, have an interface in their XML of type "bridge", whereas newer servers have an interface of type "ethernet". That difference is the root cause of the issue.
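
A quick way to tell which kind a given instance has is to look at its libvirt definition on the compute node, for example (the domain name is a placeholder):

 # older instances show type='bridge', newer ones type='ethernet'
 $ virsh dumpxml <instance-domain> | grep "interface type="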

So we are on OVN at the moment, and new VMs are created with "ethernet" interfaces. When I remove an older VM's port through the API, no errors come back, the port is removed in the database, and everything seems to be OK. However, if you look at the instance XML, the interface is still there:

   </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:16:3c:6d:ab:b5'/>
      <source bridge='br-int'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='85595837-a555-4404-a0fc-9e65bb4be84a'/>
      </virtualport>
      <target dev='tap85595837-a5'/>
      <model type='virtio'/>
      <mtu size='1500'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/59'/>
      <log file='/var/lib/nova/instances/d471c93e-d4ac-46a8-9381-a78f2cf5b3f5/console.log' append='off'/>
      <target type='isa-serial' port='0'>

As noted in the original report, the logs do show that Nova had trouble finding the interface when deleting it from the instance.

If you then add back a port with the same MAC address, you now end up with both a "bridge" and an "ethernet" interface:

   <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <interface type='bridge'>
      <mac address='00:16:3c:6d:ab:b5'/>
      <source bridge='br-int'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='85595837-a555-4404-a0fc-9e65bb4be84a'/>
      </virtualport>
      <target dev='tap85595837-a5'/>
      <model type='virtio'/>
      <mtu size='1500'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='ethernet'>
      <mac address='00:16:3c:6d:ab:b5'/>
      <target dev='tape9957b4a-9a'/>
      <model type='virtio'/>
      <mtu size='1500'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/59'/>
      <log file='/var/lib/nova/instances/...

At this point, if you try to remove that port through the API, there are again no errors and it seems as though it was removed successfully; however, the port is still attached both in the instance XML *and* according to the API, and the port is now impossible to remove, no matter how many times you try.

All these problems are solved by rebooting the server, which of course re-writes fresh XML that once again matches how it should be.
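
(Presumably that is a Nova-level hard reboot; a hard reboot destroys and re-defines the libvirt domain, so something along these lines, with a placeholder server name, is what clears the stale entry:)

 # a hard reboot re-creates the domain XML from Nova's current view of the ports
 $ openstack server reboot --hard <server>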

Our current workarou...


Revision history for this message
Brian Haley (brian-haley) wrote :

Thanks for the info, I'll re-assign to Nova component based on that.

affects: neutron → nova
Chris Valean (cvalean)
Changed in nova:
status: Incomplete → New
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

@Phil: From the discussion it is not clear to me whether you see the same issue if you use
 $ openstack server remove port <server> <port>
instead of just deleting the port in neutron. Could you clarify this please?

Changed in nova:
status: New → Incomplete
Revision history for this message
Phil Evans (philthinkhuge) wrote :

You'll have to forgive me, as I am not a particular expert with OpenStack, but as far as I know I never did anything directly with Neutron. The issue always occurred when using

 $ openstack server remove port <server> <port>

from the command line, as you said, and/or when using the Neutron client SDK in Python:

 neutc.delete_port(<port id>)

Both produce the same result: the interface is not actually removed from the KVM instance.

Revision history for this message
Phil Evans (philthinkhuge) wrote :

Oh, actually, sorry, I see what you mean. I cannot remember whether I tried using the remove-port command; I think I only deleted the port.

Unfortunately, because it only happens on older VMs, I am running out of VMs that I can experiment with. I'll see if I can find another older VM and check whether just detaching produces the same result.

Revision history for this message
Phil Evans (philthinkhuge) wrote :

I have just tested this, and I can confirm the issue also happens when doing "openstack server remove port":

    <interface type='bridge'>
      <mac address='00:16:3c:c8:fb:9c'/>
      <source bridge='br-int'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='b0dc50f4-7715-4857-9de1-074cf040098a'/>
      </virtualport>
      <target dev='tapb0dc50f4-77'/>
      <model type='virtio'/>
      <mtu size='1500'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='ethernet'>
      <mac address='00:16:3c:c8:fb:9c'/>
      <target dev='tapb0dc50f4-77'/>
      <model type='virtio'/>
      <mtu size='1500'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </interface>

That is after removing the port from the VM and then re-adding the exact same port (the sequence is sketched below). The results are the same as with deleting the port.
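
In other words, the sequence tested was roughly (a sketch; server and port are placeholders):

 # detach the port from the instance via Nova -- the port itself keeps existing
 $ openstack server remove port <server> <port>

 # re-attach the very same port
 $ openstack server add port <server> <port>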

Chris Valean (cvalean)
Changed in nova:
status: Incomplete → New
Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

Looks legit based on comment #2, but I think we would need a reproducer against master.
In order to verify, could you please tell us whether this is with OVN and/or OVS?

Ideally, a brief explanation of the tested environment would help us to double-check whether the issue is present with a Devstack setup.

Changed in nova:
status: New → Incomplete
Revision history for this message
Chris Valean (cvalean) wrote (last edit ):

As a note to clarify comment #2: the OVN with OVS setup has been in place since deployment; we're not talking about an in-place switch from ml2 to OVN.

This is a Kolla-based OpenStack deployment using OVN with OVS. Each compute node holds the OVN controller role, as the instances get a direct connection to the outside.
Please let me know what other information is required; unfortunately, we don't have a test lab matching production where we could reproduce this and test against master.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired