Traceback during VM deletion after a Nexus reboot, with switch_heartbeat disabled

Bug #1430308 reported by Danny Choi
This bug affects 1 person
Affects            Status         Importance  Assigned to     Milestone
networking-cisco   Fix Released   Undecided   Carol Bouchard
Kilo               Fix Committed  Undecided   Carol Bouchard
Liberty            Fix Released   Undecided   Carol Bouchard

Bug Description

With switch_heartbeat disabled, a traceback is logged during VM deletion after a Nexus reboot.

Steps to reproduce:
1. Disable switch_heartbeat.
2. Launch a VM.
3. Verify the Nexus switch is configured correctly.
4. Reboot the Nexus switch.
5. After the switch boots, verify that the earlier configuration is no longer present on the Nexus switch.
6. Delete the VM.
7. Note the traceback in screen-q-svc.log.

2015-03-09 14:32:30.908 DEBUG neutron.plugins.ml2.drivers.cisco.nexus.nexus_network_driver [req-4eaef7c1-1d56-4399-8fa7-491cc5cb2b60 demo 6cda3290f39d4b4c9696792293c39a78] NexusDriver:
      <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
        <configure>
          <__XML__MODE__exec_configure>
<interface>
    <nve>nve1</nve>
    <__XML__MODE_if-nve>
        <member>no member vni 12345</member>
    </__XML__MODE_if-nve>
</interface>

          </__XML__MODE__exec_configure>
        </configure>
      </config>
 from (pid=3098) delete_nve_member /opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py:380
2015-03-09 14:32:30.908 DEBUG neutron.plugins.ml2.drivers.cisco.nexus.nexus_network_driver [req-4eaef7c1-1d56-4399-8fa7-491cc5cb2b60 demo 6cda3290f39d4b4c9696792293c39a78] NexusDriver config:
      <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
        <configure>
          <__XML__MODE__exec_configure>
<interface>
    <nve>nve1</nve>
    <__XML__MODE_if-nve>
        <member>no member vni 12345</member>
    </__XML__MODE_if-nve>
</interface>

          </__XML__MODE__exec_configure>
        </configure>
      </config>
 from (pid=3098) _edit_config /opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py:99
2015-03-09 14:32:30.973 DEBUG neutron.openstack.common.lockutils [req-4eaef7c1-1d56-4399-8fa7-491cc5cb2b60 demo 6cda3290f39d4b4c9696792293c39a78] Releasing semaphore "cisco-nexus-portlock" from (pid=3098) lock /opt/stack/neutron/neutron/openstack/common/lockutils.py:238
2015-03-09 14:32:30.974 DEBUG neutron.openstack.common.lockutils [req-4eaef7c1-1d56-4399-8fa7-491cc5cb2b60 demo 6cda3290f39d4b4c9696792293c39a78] Semaphore / lock released "delete_port_postcommit" from (pid=3098) inner /opt/stack/neutron/neutron/openstack/common/lockutils.py:275
2015-03-09 14:32:30.974 ERROR neutron.plugins.ml2.managers [req-4eaef7c1-1d56-4399-8fa7-491cc5cb2b60 demo 6cda3290f39d4b4c9696792293c39a78] Mechanism driver 'cisco_nexus' failed in delete_port_postcommit
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers Traceback (most recent call last):
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 299, in _call_on_drivers
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context)
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/openstack/common/lockutils.py", line 272, in inner
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers return f(*args, **kwargs)
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 764, in delete_port_postcommit
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers self._delete_nve_member) if vxlan_segment else 0
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 670, in _port_action_vxlan
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers func(vni, device_id, mcast_group, host_id)
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 374, in _delete_nve_member
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers '(delete_nve_member||disable_vxlan_feature)'))
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/openstack/common/excutils.py", line 82, in __exit__
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers six.reraise(self.type_, self.value, self.tb)
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/mech_cisco_nexus.py", line 366, in _delete_nve_member
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers const.NVE_INT_NUM, vni)
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py", line 381, in delete_nve_member
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers self._edit_config(nexus_host, config=confstr)
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/cisco/nexus/nexus_network_driver.py", line 116, in _edit_config
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers exc=e)
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers NexusConfigFailed: Failed to configure Nexus switch: 172.20.231.7 XML:
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers <config xmlns:xc="urn:ietf:params:xml:ns:netconf:base:1.0">
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers <configure>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers <__XML__MODE__exec_configure>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers <interface>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers <nve>nve1</nve>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers <__XML__MODE_if-nve>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers <member>no member vni 12345</member>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers </__XML__MODE_if-nve>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers </interface>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers </__XML__MODE__exec_configure>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers </configure>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers </config>
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers . Reason: Unexpected session close
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers IN_BUFFER: `Unexpected session close`.
2015-03-09 14:32:30.974 TRACE neutron.plugins.ml2.managers
2015-03-09 14:32:30.976 ERROR neutron.plugins.ml2.plugin [req-4eaef7c1-1d56-4399-8fa7-491cc5cb2b60 demo 6cda3290f39d4b4c9696792293c39a78] mechanism_manager.delete_port_postcommit failed for port cca963be-db51-4d6b-8a12-a79f1b4b312a

Changed in networking-cisco:
assignee: nobody → Carol Bouchard (caboucha)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-cisco (master)

Fix proposed to branch: master
Review: https://review.openstack.org/205149

Changed in networking-cisco:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-cisco (master)

Reviewed: https://review.openstack.org/205149
Committed: https://git.openstack.org/cgit/openstack/networking-cisco/commit/?id=6d2be75e00fca5b844562dba9c1f10a1c038a0a9
Submitter: Jenkins
Branch: master

commit 6d2be75e00fca5b844562dba9c1f10a1c038a0a9
Author: Carol Bouchard <email address hidden>
Date: Thu Jul 23 12:34:26 2015 -0400

    Delete fails after switch reset (replay off)

    Many changes have occurred since this bug was written. We now have
    a newer N9K image and dhcp port entries are now created when the CLI
    'neutron subnet-create' is executed. I went thru the steps as
    defined by the bug. The first failure after switch restart is due
    to the ncclient session being stale (fixed by 1399998). This is
    expected when replay is not enabled. When attempting delete on
    2nd retry when ncclient session new, the N9K switch returns the error
    message 'None of the VLANs exist'. This changeset accepts this
    message as an acceptable condition and prevents exceptions from
    bubbling up to higher layer.

    Change-Id: I072dc05590d8c7d376c491c931d18befa507af30
    Closes-bug: #1430308
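
The approach described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the merged change: `NexusConfigFailed` here is a simplified stand-in for the driver's exception, and `BENIGN_DELETE_ERRORS` and `delete_tolerating_missing_config` are hypothetical names. The only detail taken from the report is the switch error text 'None of the VLANs exist'.

```python
class NexusConfigFailed(Exception):
    """Simplified stand-in for the driver's exception (assumption)."""


# Error text quoted in the commit message; the tuple name is hypothetical.
BENIGN_DELETE_ERRORS = ("None of the VLANs exist",)


def delete_tolerating_missing_config(send_config, confstr):
    """Send a delete request to the switch.

    Returns True if the switch applied the delete, and False if the switch
    reported that there was nothing left to delete (for example because it
    rebooted and lost its configuration). Any other failure is re-raised.
    """
    try:
        send_config(confstr)
        return True
    except NexusConfigFailed as exc:
        if any(msg in str(exc) for msg in BENIGN_DELETE_ERRORS):
            # The configuration is already gone, so treat the delete as done.
            return False
        raise
```

With this shape, a stale-session failure such as 'Unexpected session close' would still propagate, while a delete against an already-empty switch would not raise to the ML2 layer.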

Changed in networking-cisco:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-cisco (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/206009

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/206169

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-cisco (stable/kilo)

Change abandoned by Carol Bouchard (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/206009
Reason: Use 'git cherry-pick -x' method instead. ref: https://review.openstack.org/#/c/206169/

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-cisco (stable/kilo)

Reviewed: https://review.openstack.org/206169
Committed: https://git.openstack.org/cgit/openstack/networking-cisco/commit/?id=ac84fcb861bd594a5a3773c32e06b3e58a729308
Submitter: Jenkins
Branch: stable/kilo

commit ac84fcb861bd594a5a3773c32e06b3e58a729308
Author: Carol Bouchard <email address hidden>
Date: Thu Jul 23 12:34:26 2015 -0400

    Delete fails after switch reset (replay off)

    Many changes have occurred since this bug was written. We now have
    a newer N9K image and dhcp port entries are now created when the CLI
    'neutron subnet-create' is executed. I went thru the steps as
    defined by the bug. The first failure after switch restart is due
    to the ncclient session being stale (fixed by 1399998). This is
    expected when replay is not enabled. When attempting delete on
    2nd retry when ncclient session new, the N9K switch returns the error
    message 'None of the VLANs exist'. This changeset accepts this
    message as an acceptable condition and prevents exceptions from
    bubbling up to higher layer.

    Change-Id: I072dc05590d8c7d376c491c931d18befa507af30
    Closes-bug: #1430308
    (cherry picked from commit 6d2be75e00fca5b844562dba9c1f10a1c038a0a9)

tags: added: in-stable-kilo
Sam Betts (sambetts)
Changed in networking-cisco:
milestone: none → 1.1.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-cisco (master)

Fix proposed to branch: master
Review: https://review.openstack.org/246547

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-cisco (master)

Reviewed: https://review.openstack.org/246547
Committed: https://git.openstack.org/cgit/openstack/networking-cisco/commit/?id=7b1eb2b6d5e55563c084f60e44adc1d32706eb17
Submitter: Jenkins
Branch: master

commit d9b9a6421d7ff92e920ed21b01ebc7bf49e38bd6
Author: Sam Betts <email address hidden>
Date: Tue Sep 29 09:18:10 2015 +0100

    Set default branch for stable/kilo

    Change-Id: I31f51ff60f95639f459839f4c7d929d5ec7c458d

commit f08fb31f20c2d8cc1e6b71784cdfd9604895e16d
Author: Rich Curran <email address hidden>
Date: Thu Sep 3 13:23:52 2015 -0400

    ML2 cisco_nexus MD: VLAN not created on switch

    As described in DE588,
    "With neutron multiworkers configured, there is a potential race condition
    issue where some of the VLANs will not be configured on one or more N9k
    switches.

    /etc/neutron/neutron.conf
    -------------------------
    api_workers=3
    rpc_workers=3"

    Fix is to allow the vlan create command to be sent down to a switch
    under most event conditions. Long term fix will be to introduce a new
    column in the port binding DB table that indicates the true state of the
    entry/row.

    Closes-Bug: #1491940
    Change-Id: If1da1fcf16a450c1a4107da9970b18fc64936896
    (cherry picked from commit 0e48a16e77fc5ec5fd485a85f97f3650126fb6fe)

commit d400749e43e9d5a1fc92683b40159afce81edc95
Author: Carol Bouchard <email address hidden>
Date: Thu Sep 3 15:19:48 2015 -0400

    Create knob to prevent caching ssh connection

    Create a new initialization knob named never_cache_ssh_connection.
    This boolean is False by default allowing multiple ssh connections
    to the Nexus switch to be cached as it behaves today. When there
    are multiple neutron processes/controllers and/or non-neutron ssh(xml)
    connections, this is an issue since processes hold onto a connection
    while the Nexus devices supports a maximum of 8 sessions. As a result,
    further ssh connections will fail. In this case, the boolean should be
    set to True causing each connection to be closed when a neutron event
    is complete.

    Change-Id: I61ec303856b757dd8d9d43110fec8e7844ab7c6d
    Closes-bug: #1491108
    (cherry picked from commit 23551a4198c61e2e25a6382f27d47b0665f054b8)
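
The behavior of the never_cache_ssh_connection knob described above might look something like the sketch below. It is an illustration under stated assumptions, not the driver's code: `ConnectionCache`, `get`, and `release` are hypothetical names, and `connect` is any callable that opens a session and returns an object with a `close()` method.

```python
class ConnectionCache:
    """Hypothetical per-switch connection cache with a 'never cache' switch."""

    def __init__(self, connect, never_cache_ssh_connection=False):
        self._connect = connect                       # callable: host -> connection
        self._never_cache = never_cache_ssh_connection
        self._cache = {}

    def get(self, host):
        """Return the cached connection for host, opening one if needed."""
        if host not in self._cache:
            self._cache[host] = self._connect(host)
        return self._cache[host]

    def release(self, host):
        """Call when a neutron event completes.

        With the knob set to True the session is closed and forgotten, so it
        does not count against the switch's session limit between events.
        With the default (False) the session stays cached for reuse.
        """
        if self._never_cache:
            conn = self._cache.pop(host, None)
            if conn is not None:
                conn.close()
```

The trade-off is the one the commit message names: caching avoids reconnecting on every event, while closing after each event keeps multiple processes from exhausting the 8 sessions the Nexus device supports.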

commit 0050ea7f1fb3c22214d7ca49cfe641da86123e2c
Author: Carol Bouchard <email address hidden>
Date: Wed Sep 2 11:10:42 2015 -0400

    Bubble up exceptions when Nexus replay enabled

    There are several changes made surrounding this bug.

    1) When replay is enabled, we should bubble exceptions
       for received port create/update/delete post_commit
       transactions. This was suppressed earlier by
       1422738.

    2) When an exception is encountered during a
       post_commit transaction, the driver will no longer
       mark the switch state to inactive to force a replay.
       This is no longer needed since 1481856 was introduced.
       So from this point on, only the replay thread will
       determine the state of the connection to the switch.

    3) In addition to accommodating 1 & 2 above, more detail
       data verification was added to the test code.

    Change-Id: I97...

Changed in networking-cisco:
status: Fix Committed → Fix Released