neutron agent doesn't remove trunk bridge after nova-compute restart

Bug #1756064 reported by Ivan Dyukov
This bug affects 1 person
Affects   Status    Importance  Assigned to  Milestone
neutron   Invalid   Undecided   Unassigned
os-vif    Invalid   Undecided   Unassigned

Bug Description

env:
backend is openvswitch with DPDK
Version is Ocata

Steps:
Create two networks.
Create two ports for each network.
Create a trunk port.
Boot a virtual machine with the trunk port.
Restart nova-compute on the compute node: # openstack-service restart openstack-nova-compute
Remove the virtual machine.
Check the OVS configuration on the compute node: ovs-vsctl show

Expected result: the trunk bridge (e.g. tbr-c4ce71ea-7) is gone.
Actual result: the trunk bridge and its service ports are still in the OVS configuration, e.g.

    Bridge "tbr-c4ce71ea-7"
        Port "spt-63eb23e7-af"
            tag: 102
            Interface "spt-63eb23e7-af"
                type: patch
                options: {peer="spi-63eb23e7-af"}
        Port "tbr-c4ce71ea-7"
            Interface "tbr-c4ce71ea-7"
                type: internal
        Port "tpt-d6c0e47e-ed"
            Interface "tpt-d6c0e47e-ed"
                type: patch
                options: {peer="tpi-d6c0e47e-ed"}
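
As an illustration of the final check, here is a minimal Python sketch that scans `ovs-vsctl show` output for leftover trunk bridges. It assumes trunk bridges can be recognised by the `tbr-` name prefix seen above; the function name and the trimmed sample text are made up for this example and are not part of neutron.

```python
import re

# Matches lines such as:    Bridge "tbr-c4ce71ea-7"   or    Bridge br-int
BRIDGE_LINE = re.compile(r'^\s*Bridge\s+"?([^"\s]+)"?\s*$')


def leftover_trunk_bridges(show_output, prefix="tbr-"):
    """Return bridge names in `ovs-vsctl show` output that start with prefix."""
    bridges = []
    for line in show_output.splitlines():
        match = BRIDGE_LINE.match(line)
        if match and match.group(1).startswith(prefix):
            bridges.append(match.group(1))
    return bridges


# Trimmed sample in the same layout as the output above (br-int added for contrast).
SAMPLE = '''\
    Bridge "tbr-c4ce71ea-7"
        Port "spt-63eb23e7-af"
            tag: 102
    Bridge br-int
        Port br-int
'''

print(leftover_trunk_bridges(SAMPLE))
```

After the VM is deleted, an empty list would correspond to the expected result; any `tbr-` entry corresponds to the behaviour reported here.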

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This is interesting. I would have expected no relationship between the restart of nova-compute and the neutron OVS agent handling of the trunk. I must admit I am a bit puzzled on this one. Let me do some digging.

Can you confirm the version you're running?

tags: added: trunk
Revision history for this message
Ivan Dyukov (i.dyukov) wrote :

I'm using Ocata.
The root cause of the issue is that nova recreates the vhu ports during startup, so all metadata stored on the parent trunk port is cleared after nova-compute restarts.

Changed in neutron:
assignee: nobody → Ivan Dyukov (i.dyukov)
Ivan Dyukov (i.dyukov)
description: updated
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Um, iirc the metadata are associated to the trunk bridge, and that doesn't go away. This sounds easy to reproduce. I'll get to it this weekend. Thanks for your patience!

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

@Ivan: you have assigned this to yourself. Are you about to propose a fix?

zhaobo (zhaobo6)
tags: added: ocata-backport-potential
Changed in neutron:
importance: Undecided → High
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I cannot reproduce this on Rocky. [1] shows the steps I followed to try to reproduce the issue. Now you mentioned vhu ports, so I suspect this is with DPDK? Is this with QEMU in client mode or server mode? I must admit I don't have a clear picture of the wiring in the presence of DPDK, and I don't have an environment where I can reproduce this with DPDK, but I am surprised that nova-compute's restart affects the datapath, which it shouldn't in server mode. That said, it is true that the trunk metadata are stored in the parent port [2], and if they were missing [3], there might be a chance of leaving things dangling. We could attempt to store the data in the trunk bridge as well, but that might turn out to be inefficient, and that's probably why we left it to posterity!

That said, I'd appreciate it if you could clarify the conditions under which you observed this issue. For now I am going to mark it incomplete.

[1] http://paste.openstack.org/show/703724/
[2] https://github.com/openstack/neutron/blob/master/neutron/services/trunk/drivers/openvswitch/agent/ovsdb_handler.py#L418
[3] https://github.com/openstack/neutron/blob/master/neutron/services/trunk/drivers/openvswitch/agent/ovsdb_handler.py#L209

Changed in neutron:
status: New → Incomplete
importance: High → Undecided
assignee: Ivan Dyukov (i.dyukov) → nobody
Ivan Dyukov (i.dyukov)
description: updated
Revision history for this message
Ivan Dyukov (i.dyukov) wrote :

@Armando: yep, I'm using OVS with DPDK, and QEMU is in server mode (i.e. -chardev socket,id=charnet1,path=/var/run/openvswitch/vhu69bc5ac0-a8,server).

I don't understand why OVS should keep the metadata (bridge_name, trunk_id, subports) in external_ids. This information is already stored in OVS. We can easily get the bridge name for the parent port using the "ovs-vsctl port-to-br" command, then get all ports on that bridge using "ovs-vsctl list-ports", and from this list we can easily restore the patch ports on the integration bridge. Could you please clarify why we need to duplicate the info in the OVS DB? For performance reasons?
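
The lookup proposed in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not how the neutron agent works: the helper names are hypothetical, trunk-side patch ports are assumed to be recognisable by the `spt-` and `tpt-` prefixes seen in the description, and each peer on the integration bridge is read from the patch interface's `options:peer` value.

```python
import subprocess


def run_vsctl(*args):
    """Run ovs-vsctl and return its stripped stdout (needs Open vSwitch installed)."""
    result = subprocess.run(
        ["ovs-vsctl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def trunk_wiring(parent_port, vsctl=run_vsctl):
    """Rebuild the trunk wiring for parent_port from OVS itself.

    Returns (trunk_bridge, {trunk_side_patch_port: integration_side_peer}).
    """
    # Step 1: which bridge holds the parent port?
    bridge = vsctl("port-to-br", parent_port)
    # Step 2: which ports live on that bridge?
    ports = vsctl("list-ports", bridge).splitlines()
    # Step 3: for each patch port, look up its peer on the integration bridge.
    peers = {}
    for port in ports:
        if port.startswith(("spt-", "tpt-")):  # assumed naming convention
            peer = vsctl("get", "Interface", port, "options:peer")
            peers[port] = peer.strip('"')  # the value may be printed quoted
    return bridge, peers
```

The `vsctl` parameter exists only so the function can be exercised with a stub on a machine without Open vSwitch. Note that the sketch issues one extra `ovs-vsctl` call per patch port, which is the kind of cost the cached metadata avoids.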

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Ok, then that makes even less sense. I am not sure I understand why a nova-compute restart blows away the metadata on the port. I would argue that this is not a neutron bug but a nova/os-vif bug, where not all metadata are restored upon interface recreation, assuming recreation is necessary at all.

As for your last question: yes, storing trunk details in the interface external_ids is done for lookup performance. That avoids many OVSDB queries or even going back to the server. It's built on the fundamental premise that OVSDB is reliable.
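
To make the failure mode concrete, here is a small sketch that decodes an interface's external_ids and reports which trunk keys are missing. It assumes the JSON layout printed by `ovs-vsctl --format=json --columns=external_ids list Interface <name>` (an OVSDB map encoded as `["map", [[key, value], ...]]`) and assumes the key names `bridge_name`, `trunk_id` and `subport_ids`; the function names and sample rows are invented for illustration.

```python
import json

# Assumed names of the trunk metadata keys kept in external_ids.
EXPECTED_KEYS = ("bridge_name", "trunk_id", "subport_ids")


def decode_external_ids(json_text):
    """Turn ovs-vsctl JSON table output into a plain dict of external_ids."""
    table = json.loads(json_text)
    column = table["headings"].index("external_ids")
    rows = table["data"]
    if not rows:
        return {}
    kind, pairs = rows[0][column]  # OVSDB map: ["map", [[key, value], ...]]
    return dict(pairs) if kind == "map" else {}


def missing_trunk_keys(external_ids):
    """List the expected trunk keys that are absent from external_ids."""
    return [key for key in EXPECTED_KEYS if key not in external_ids]


# Hypothetical rows: one with the bridge name still present, one fully cleared.
INTACT = ('{"data":[[["map",[["bridge_name","tbr-c4ce71ea-7"]]]]],'
          '"headings":["external_ids"]}')
CLEARED = '{"data":[[["map",[]]]],"headings":["external_ids"]}'

print(missing_trunk_keys(decode_external_ids(INTACT)))
print(missing_trunk_keys(decode_external_ids(CLEARED)))
```

If a recreated vhu port comes back with every key missing, an agent relying on this metadata has nothing to map the port to its trunk bridge, which would be consistent with the dangling bridge reported here.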

Changed in neutron:
status: Incomplete → Invalid
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Adding the os-vif project to see if we can gather more insight into what's going on.

Revision history for this message
Ivan Dyukov (i.dyukov) wrote :

It looks like this was introduced by the following nova change:

commit 33cc64fb81773f0c246073d23c525357c9aa3b08
Author: Aaron Rosen <email address hidden>
Date: Mon Jan 20 15:10:27 2014 -0800

Ivan Dyukov (i.dyukov)
no longer affects: nova
Revision history for this message
Ivan Dyukov (i.dyukov) wrote :
Changed in os-vif:
status: New → Invalid