[Neutron] Duplicate tags lead to openvswitch-agent failure

Bug #1756897 reported by Alexander Rubtsov
This bug affects 1 person
Affects              Status       Importance   Assigned to          Milestone
Mirantis OpenStack   New          Undecided    Unassigned
9.x                  Incomplete   High         Alexander Rubtsov

Bug Description

MOS: 9.2 (build 604)

The issue is observed when several OVS ports have the same tag.
For example:
_uuid : fd288042-c080-4204-b098-378fefed5e4d
bond_active_slave : []
bond_downdelay : 0
bond_fake_iface : false
bond_mode : []
bond_updelay : 0
external_ids : {}
fake_bridge : false
interfaces : [aff46edd-aff1-42d9-b3e8-d4821dd9231f]
lacp : []
mac : []
name : "vhuefaf8b3f-ef"
other_config : {net_uuid="22a06ed0-f7d8-48f2-b789-3d5f0d6c7e07", network_type=vlan, physical_network=default, segmentation_id="78", tag="2"}
qos : []
rstp_statistics : {}
rstp_status : {}
statistics : {}
status : {}
tag : 2
trunks : []
vlan_mode : []

_uuid : 2e3194b5-a207-409a-af5f-a0c7dd73480b
bond_active_slave : []
bond_downdelay : 0
bond_fake_iface : false
bond_mode : []
bond_updelay : 0
external_ids : {}
fake_bridge : false
interfaces : [002dfdbe-c785-4632-9352-9a62179b33cd]
lacp : []
mac : []
name : "vhu4159bd5d-38"
other_config : {net_uuid="81ae691a-ebf7-44eb-9151-43b01f385bdf", network_type=vlan, physical_network=default, segmentation_id="148", tag="2"}
qos : []
rstp_statistics : {}
rstp_status : {}
statistics : {}
status : {}
tag : 2
trunks : []
vlan_mode : []

In this case, openvswitch-agent is unable to start and throws the following traceback:
2018-03-15T04:17:03.106379+01:00 compute-0-2.domain.tld neutron-openvswitch-agent[25638]: 2018-03-15 04:17:03.105 25638 CRITICAL neutron [req-2dfb8bf8-57e7-46b4-8a2a-cced2462f63a - - - - -] KeyError: 2
2018-03-15 04:17:03.105 25638 ERROR neutron Traceback (most recent call last):
2018-03-15 04:17:03.105 25638 ERROR neutron File "/usr/bin/neutron-openvswitch-agent", line 10, in <module>
2018-03-15 04:17:03.105 25638 ERROR neutron sys.exit(main())
2018-03-15 04:17:03.105 25638 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
2018-03-15 04:17:03.105 25638 ERROR neutron agent_main.main()
2018-03-15 04:17:03.105 25638 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 49, in main
2018-03-15 04:17:03.105 25638 ERROR neutron mod.main()
2018-03-15 04:17:03.105 25638 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/ovs_ofctl/main.py", line 37, in main
2018-03-15 04:17:03.105 25638 ERROR neutron ovs_neutron_agent.main(bridge_classes)
2018-03-15 04:17:03.105 25638 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2421, in main
2018-03-15 04:17:03.105 25638 ERROR neutron agent = OVSNeutronAgent(bridge_classes, cfg.CONF)
2018-03-15 04:17:03.105 25638 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 384, in __init__
2018-03-15 04:17:03.105 25638 ERROR neutron self._restore_local_vlan_map()
2018-03-15 04:17:03.105 25638 ERROR neutron File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 499, in _restore_local_vlan_map
2018-03-15 04:17:03.105 25638 ERROR neutron self.available_local_vlans.remove(local_vlan)
2018-03-15 04:17:03.105 25638 ERROR neutron KeyError: 2
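
For context, the KeyError appears to come from removing the same local VLAN id from a Python set twice (available_local_vlans looks like a plain set, judging by the traceback). A minimal, simplified illustration, not the agent's actual code:

available_local_vlans = {1, 2, 3}   # set of free local VLAN ids (simplified)
available_local_vlans.remove(2)     # first port with tag 2: OK
available_local_vlans.remove(2)     # second port with the same tag: KeyError: 2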

It might be related to these upstream bugs:
https://bugs.launchpad.net/neutron/+bug/1625305
https://bugs.launchpad.net/neutron/+bug/1526974

I'm not allowed to attach the log files here.
Please contact me directly if you need to review the entire output of "ovs-vsctl list Port" and openvswitch-agent.log.
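
To check whether a node is in this state, a small helper along the following lines could be used. This is a hypothetical sketch, not an existing tool: it shells out to "ovs-vsctl list Port" and parses the key : value lines in the format shown above, so adjust the parsing if your output differs.

# Hypothetical helper: report OVS ports that share the same "tag" value.
import collections
import subprocess

def find_duplicate_tags():
    output = subprocess.check_output(["ovs-vsctl", "list", "Port"]).decode()
    ports_by_tag = collections.defaultdict(list)
    name, tag = None, None
    for line in output.splitlines():
        if ":" not in line:
            # Blank line separates Port records; flush the previous record.
            if name is not None and tag not in (None, "[]"):
                ports_by_tag[tag].append(name)
            name, tag = None, None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":
            name = value.strip('"')
        elif key == "tag":
            tag = value
    if name is not None and tag not in (None, "[]"):
        ports_by_tag[tag].append(name)
    return {t: p for t, p in ports_by_tag.items() if len(p) > 1}

if __name__ == "__main__":
    for tag, ports in find_duplicate_tags().items():
        print("tag %s is used by: %s" % (tag, ", ".join(ports)))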

Revision history for this message
Alexander Rubtsov (arubtsov) wrote :

sla1 for 9.0-updates

tags: added: customer-found sla1
Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Please provide steps to reproduce.

Revision history for this message
Sasikiran (sasikiran-vaddi) wrote :

The issue is not observed frequently, so I would like to add some more information.

For example, suppose vm1, which belongs to network net1, is hosted on compute-A, and vm2, which belongs to network net2, is hosted on the same compute-A. If the tap devices associated with these two VMs hold the same tag value, neutron-openvswitch-agent falls into repeated restarts due to a failure in the validation below [1].

Continuing the example: when the loop reaches the port of vm1 from net1, it passes this validation [2], removes the VLAN from available_local_vlans, and records the network UUID and tag in _local_vlan_hints. On the next iteration, the port of vm2 from the other network net2 also passes the validation [2] and executes self.available_local_vlans.remove(local_vlan), which fails because that VLAN/tag was already removed when vm1's port was processed. So an error is triggered whenever two VMs from different networks hold the same tag value (see the sketch after the references below).

[1]: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L354-L358
[2]: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L354-L355
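
As an illustration of the mechanism described above (only a sketch of one possible guard, not the actual upstream fix), the restore loop could check membership before removing the tag from the set. The port data below is copied from the report; everything else is simplified and the names are illustrative, not the agent's exact code:

# Simplified sketch of a _restore_local_vlan_map-style loop with a guard.
available_local_vlans = set(range(1, 4095))
_local_vlan_hints = {}

ports = [
    {"net_uuid": "22a06ed0-f7d8-48f2-b789-3d5f0d6c7e07", "tag": 2},
    {"net_uuid": "81ae691a-ebf7-44eb-9151-43b01f385bdf", "tag": 2},  # duplicate tag
]

for port in ports:
    local_vlan = port["tag"]
    net_uuid = port["net_uuid"]
    if local_vlan in available_local_vlans:
        # Normal case: claim the local VLAN for this network.
        available_local_vlans.remove(local_vlan)
        _local_vlan_hints[net_uuid] = local_vlan
    else:
        # Duplicate tag: an unguarded remove() here reproduces the KeyError
        # from the traceback; skipping keeps the agent alive.
        print("tag %s already claimed, skipping net %s" % (local_vlan, net_uuid))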
