Nexus VLAN config hangs when replay enabled on RHEL set-up

Bug #1454738 reported by Carol Bouchard
This bug affects 1 person
Affects           Status         Importance  Assigned to  Milestone
networking-cisco  Fix Released   Undecided   Rich Curran
Kilo              Fix Committed  Undecided   Rich Curran
Liberty           Fix Released   Undecided   Rich Curran

Bug Description

When a Nexus VLAN is configured and replay is enabled on a RHEL installation, the Nexus XML request appears to hang and never completes.

Changed in networking-cisco:
assignee: nobody → Carol Bouchard (caboucha)
tags: added: cisco ml2
Rich Curran (rcurran)
Changed in networking-cisco:
assignee: Carol Bouchard (caboucha) → Rich Curran (rcurran)
Leon Zachery (lzachery)
tags: added: e-rel
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-cisco (master)

Fix proposed to branch: master
Review: https://review.openstack.org/188891

Changed in networking-cisco:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-cisco (master)

Reviewed: https://review.openstack.org/188891
Committed: https://git.openstack.org/cgit/stackforge/networking-cisco/commit/?id=347073a248ecd844f259c6ef821c5cc27d1b07d1
Submitter: Jenkins
Branch: master

commit 347073a248ecd844f259c6ef821c5cc27d1b07d1
Author: Rich Curran <email address hidden>
Date: Fri Jun 5 15:00:14 2015 -0400

    ML2 cisco_nexus MD: Config hangs when replay enabled

    When multiple neutron-server processes are created (rpc_workers > 0)
    the cisco_nexus monitor_thread was hanging.

    Similar issues seen with other MDs. More info:
    http://lists.openstack.org/pipermail/openstack-dev/2015-May/063515.html
    http://lists.openstack.org/pipermail/openstack-dev/2015-June/065558.html
    https://bugs.launchpad.net/vmware-nsx/+bug/1420278

    09-Jun - Upstream community should be fixing this problem properly
    in near future - https://review.openstack.org/#/c/189391/1
    Once this fix is pushed up then a new bug can be used to update this
    patch to use the new method for avoiding these multi-process/thread
    issues.

    Change-Id: I1cbf6a4f9bc2e795720c75aca10abd4d8d458434
    Closes-Bug: #1454738

Changed in networking-cisco:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-cisco (master)

Fix proposed to branch: master
Review: https://review.openstack.org/191823

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-cisco (master)

Change abandoned by rcurran (<email address hidden>) on branch: master
Review: https://review.openstack.org/191823
Reason: submitted change under different patch
(this patch was not complete)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-cisco (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/209241

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/209248

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-cisco (stable/kilo)

Reviewed: https://review.openstack.org/209248
Committed: https://git.openstack.org/cgit/openstack/networking-cisco/commit/?id=ffabc773febb9a8df7853588ae27a4fe3bc4069b
Submitter: Jenkins
Branch: stable/kilo

commit ffabc773febb9a8df7853588ae27a4fe3bc4069b
Author: Rich Curran <email address hidden>
Date: Sat Jun 20 13:38:27 2015 -0400

    ML2 cisco_nexus MD: Multiprocess replay issue

    When api or rpc_workers are set > 0 replay is not working in multi-node setups.
    Issue is that access to nexus switches is set to False (can't access) by
    default in the monitor code. In multiprocess configurations the monitor
    thread (thread code determines if switch is accessible) only runs in the
    main process (mech_cisco_nexus.py initialize() method is only called once,
    regardless of number of processes). In all other processes the "access
    switch" setting never gets set to True (can access).

    This patch allows all non-parent processes to continue event handler
    processing. Only the parent process (which starts the monitor thread) will
    verify if the switch is accessible during event processing.

    Change-Id: I637aba7615df2711fdf058e57800c28e3741dadb
    Partial-Bug: #1454738
    (cherry picked from commit 7d37c15e8fc0e20ad9e3b0606d7a2fc534bfdd4a)
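
Illustrative note (not taken from the patch): the behaviour described in this commit message can be sketched roughly as below. The class and method names (SwitchMonitorSketch, mark_active, may_configure) and the current_pid parameter are hypothetical and exist only for this example; the actual driver code in mech_cisco_nexus.py may be organised quite differently. The sketch assumes the parent process records its PID when it is initialized, and that only that process consults the monitor-maintained switch state.

```python
import os


class SwitchMonitorSketch:
    """Hypothetical stand-in for the driver's switch-state tracking."""

    def __init__(self):
        # initialize() is only called once, in the parent process, so
        # remember which PID owns the monitor thread.
        self.monitor_pid = os.getpid()
        # Switches count as inaccessible until the monitor marks them active.
        self.switch_active = {}

    def mark_active(self, switch_ip, active):
        # Would be called by the monitor thread, which only runs in the parent.
        self.switch_active[switch_ip] = active

    def may_configure(self, switch_ip, current_pid=None):
        # current_pid is only a test hook; real code would use os.getpid().
        pid = os.getpid() if current_pid is None else current_pid
        if pid != self.monitor_pid:
            # Worker process: no monitor thread updates the state here,
            # so do not block event handling on it.
            return True
        # Parent process: honour the state maintained by the monitor thread.
        return self.switch_active.get(switch_ip, False)
```

Before a change of this kind, a worker process would consult its own copy of the state, which never leaves the default of False, so its events would never reach the switch.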

tags: added: in-stable-kilo
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/209241
Committed: https://git.openstack.org/cgit/openstack/networking-cisco/commit/?id=b60296644660303fb2341ca6495611621fc486e7
Submitter: Jenkins
Branch: stable/kilo

commit b60296644660303fb2341ca6495611621fc486e7
Author: Rich Curran <email address hidden>
Date: Fri Jun 5 15:00:14 2015 -0400

    ML2 cisco_nexus MD: Config hangs when replay enabled

    When multiple neutron-server processes are created (rpc_workers > 0)
    the cisco_nexus monitor_thread was hanging.

    Similar issues seen with other MDs. More info:
    http://lists.openstack.org/pipermail/openstack-dev/2015-May/063515.html
    http://lists.openstack.org/pipermail/openstack-dev/2015-June/065558.html
    https://bugs.launchpad.net/vmware-nsx/+bug/1420278

    09-Jun - Upstream community should be fixing this problem properly
    in near future - https://review.openstack.org/#/c/189391/1
    Once this fix is pushed up then a new bug can be used to update this
    patch to use the new method for avoiding these multi-process/thread
    issues.

    Change-Id: I1cbf6a4f9bc2e795720c75aca10abd4d8d458434
    Closes-Bug: #1454738
    (cherry picked from commit 347073a248ecd844f259c6ef821c5cc27d1b07d1)

Sam Betts (sambetts)
Changed in networking-cisco:
milestone: none → 1.1.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-cisco (master)

Fix proposed to branch: master
Review: https://review.openstack.org/246547

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-cisco (master)

Reviewed: https://review.openstack.org/246547
Committed: https://git.openstack.org/cgit/openstack/networking-cisco/commit/?id=7b1eb2b6d5e55563c084f60e44adc1d32706eb17
Submitter: Jenkins
Branch: master

commit d9b9a6421d7ff92e920ed21b01ebc7bf49e38bd6
Author: Sam Betts <email address hidden>
Date: Tue Sep 29 09:18:10 2015 +0100

    Set default branch for stable/kilo

    Change-Id: I31f51ff60f95639f459839f4c7d929d5ec7c458d

commit f08fb31f20c2d8cc1e6b71784cdfd9604895e16d
Author: Rich Curran <email address hidden>
Date: Thu Sep 3 13:23:52 2015 -0400

    ML2 cisco_nexus MD: VLAN not created on switch

    As described in DE588,
    "With neutron multiworkers configured, there is a potential race condition
    issue where some of the VLANs will not be configured on one or more N9k
    switches.

    /etc/neutron/neutron.conf
    -------------------------
    api_workers=3
    rpc_workers=3"

    Fix is to allow the vlan create command to be sent down to a switch
    under most event conditions. Long term fix will be to introduce a new
    column in the port binding DB table that indicates the true state of the
    entry/row.

    Closes-Bug: #1491940
    Change-Id: If1da1fcf16a450c1a4107da9970b18fc64936896
    (cherry picked from commit 0e48a16e77fc5ec5fd485a85f97f3650126fb6fe)

commit d400749e43e9d5a1fc92683b40159afce81edc95
Author: Carol Bouchard <email address hidden>
Date: Thu Sep 3 15:19:48 2015 -0400

    Create knob to prevent caching ssh connection

    Create a new initialization knob named never_cache_ssh_connection.
    This boolean is False by default allowing multiple ssh connections
    to the Nexus switch to be cached as it behaves today. When there
    are multiple neutron processes/controllers and/or non-neutron ssh(xml)
    connections, this is an issue since processes hold onto a connection
    while the Nexus device supports a maximum of 8 sessions. As a result,
    further ssh connections will fail. In this case, the boolean should be
    set to True causing each connection to be closed when a neutron event
    is complete.

    Change-Id: I61ec303856b757dd8d9d43110fec8e7844ab7c6d
    Closes-bug: #1491108
    (cherry picked from commit 23551a4198c61e2e25a6382f27d47b0665f054b8)
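
Illustrative note (not taken from the patch): a toy model of why cached connections can exhaust the switch's session limit, and how closing the connection after each event avoids it. FakeSwitch, DriverProcess and handle_event are hypothetical names used only in this sketch. Only the never_cache_ssh_connection name and the limit of 8 sessions come from the commit message above.

```python
MAX_SESSIONS = 8  # session limit quoted in the commit message


class FakeSwitch:
    """Toy switch that refuses connections beyond MAX_SESSIONS."""

    def __init__(self):
        self.open_sessions = 0

    def connect(self):
        if self.open_sessions >= MAX_SESSIONS:
            raise ConnectionError("switch session limit reached")
        self.open_sessions += 1
        return self

    def close(self):
        self.open_sessions -= 1


class DriverProcess:
    """Toy neutron process that talks to the switch once per event."""

    def __init__(self, switch, never_cache_ssh_connection=False):
        self.switch = switch
        self.never_cache = never_cache_ssh_connection
        self.cached = None

    def handle_event(self):
        # Reuse the cached connection if there is one, else open a new one.
        conn = self.cached if self.cached is not None else self.switch.connect()
        try:
            pass  # the XML configuration request would be sent here
        finally:
            if self.never_cache:
                # Release the session as soon as the event is complete.
                conn.close()
                self.cached = None
            else:
                # Default behaviour: hold the session for later events.
                self.cached = conn
```

With the default (False), each process keeps one session open, so in this model a ninth process cannot connect. With the flag set to True, sessions are released after every event and any number of processes can take turns.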

commit 0050ea7f1fb3c22214d7ca49cfe641da86123e2c
Author: Carol Bouchard <email address hidden>
Date: Wed Sep 2 11:10:42 2015 -0400

    Bubble up exceptions when Nexus replay enabled

    There are several changes made surrounding this bug.

    1) When replay is enabled, we should bubble exceptions
       for received port create/update/delete post_commit
       transactions. This was suppressed earlier by
       1422738.

    2) When an exception is encountered during a
       post_commit transaction, the driver will no longer
       mark the switch state to inactive to force a replay.
       This is no longer needed since 1481856 was introduced.
       So from this point on, only the replay thread will
       determine the state of the connection to the switch.

    3) In addition to accommodating 1 & 2 above, more detailed
       data verification was added to the test code.

    Change-Id: I97...

Changed in networking-cisco:
status: Fix Committed → Fix Released