Cisco Nexus VXLAN: incomplete switch configuration when launching multiple VMs simultaneously

Bug #1399998 reported by Danny Choi
This bug affects 3 people
Affects: networking-cisco
Status: Fix Committed
Importance: Medium
Assigned to: Carol Bouchard

Bug Description

Cisco Nexus VXLAN setup:
     Compute-1 and Compute-2 connect to N9K-1; Compute-3 and the Controller+Network node connect to N9K-2.

Issue: when multiple VMs are launched simultaneously, the Nexus switch is not configured properly. Sometimes the VLAN is missing, sometimes the VNI mapping. The failure is not consistent: sometimes it affects the first switch, sometimes the second.

Steps to reproduce:
1. Reboot both N9K switches.
2. At the Controller CLI, launch 10 VMs, each with 3 interfaces in different subnets.
3. Check the configuration of both switches.

Tags: cisco
Danny Choi (dannchoi) wrote :

It also happens with 10 VMs each with one interface.

Note: it happens consistently after a reload of the Nexus switches.

The following tracebacks are logged in screen-q-svc.log:

 from (pid=30887) _edit_config /opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py:86
2014-12-06 17:09:26.063 ERROR neutron.plugins.ml2.managers [req-d8c65cd0-138f-4397-8d78-86f5196260ca None None] Mechanism driver 'cisco_nexus' failed in update_port_postcommit
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers Traceback (most recent call last):
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 299, in _call_on_drivers
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context)
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 378, in update_port_postcommit
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers self._configure_nve_member) if vxlan_segment else 0
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 325, in _port_action_vxlan
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers func(vni, device_id, mcast_group, host_id)
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 135, in _configure_nve_member
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers vni, mcast_group)
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py", line 308, in create_nve_member
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers self._edit_config(nexus_host, config=confstr)
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py", line 94, in _edit_config
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers raise cexc.NexusConfigFailed(config=config, exc=e)
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers NexusConfigFailed: Failed to configure Nexus:
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers <configure>
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers <__XML__MODE__exec_configure>
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers <interface>
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers <nve>nve1</nve>
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers <__XML__MODE_if-nve>
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers <member>member vni 9000 mcast-group 225.1.1.1</member>
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers </__XML__MODE_if-nve>
2014-12-06 17:09:26.063 TRACE neutron.plugins.ml2.managers </interface>
2014-12-06 17:09:26.063...

Puneet Konghot Nair (pkonghot) wrote :

The nxos_connect method (in nexus_network_driver.py) sets up an SSH session to a switch once, when it is first needed, and saves the connection state in connections[nexus_host]. It “hopes” that the session will remain up indefinitely, so all subsequent calls to the method reuse this saved state. There seems to be no way to detect whether the SSH session is still alive, so the saved state and the real state of the SSH session can go out of sync. In summary, if the SSH session drops for any reason, all subsequent communication with the switch (in _get_config and _edit_config) will lead to this kind of error.
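
To make the failure mode above concrete, here is a minimal sketch of one possible mitigation: keep the per-host connection cache, but when a call on the cached session fails, discard the entry, reconnect, and retry once. This is not the committed fix and not the driver's actual code. FakeSession, ReconnectingDriver, and the connect factory are hypothetical names used only for illustration, and ConnectionError stands in for whatever exception a dead SSH session would really raise.

```python
class FakeSession:
    """Hypothetical stand-in for an SSH/NETCONF session to one switch."""

    def __init__(self, host):
        self.host = host
        self.alive = True      # flipped to False to simulate a dropped session
        self.applied = []      # config snippets this session accepted

    def edit_config(self, config):
        if not self.alive:
            raise ConnectionError("session to %s is down" % self.host)
        self.applied.append(config)
        return "ok"


class ReconnectingDriver:
    """Caches one session per host and replaces it when it turns out stale."""

    def __init__(self, connect):
        self._connect = connect    # hypothetical factory: host -> session
        self.connections = {}      # mirrors the connections[nexus_host] cache

    def _session(self, host):
        # Open a session on first use, then reuse the cached one.
        if host not in self.connections:
            self.connections[host] = self._connect(host)
        return self.connections[host]

    def edit_config(self, host, config):
        try:
            return self._session(host).edit_config(config)
        except ConnectionError:
            # The cached session may be dead (for example after a switch
            # reload). Drop it, open a fresh one, and retry exactly once.
            self.connections.pop(host, None)
            return self._session(host).edit_config(config)
```

Only the connection-level exception is caught, so a genuine configuration error from the switch would still propagate instead of being retried. A second failure on the fresh session also propagates, which keeps the retry bounded.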

tags: added: cisco
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
Changed in neutron:
assignee: nobody → Carol Bouchard (caboucha)
affects: neutron → networking-cisco
Changed in networking-cisco:
status: Confirmed → Fix Committed
Sam Betts (sambetts)
Changed in networking-cisco:
milestone: none → 1.0.0