Eqlx driver can leave orphaned SSH connections

Bug #1374613 reported by Sean McGinnis
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Medium
Assigned to: Sean McGinnis

Bug Description

The EqualLogic driver uses an SSH connection to the PS array to perform CLI operations. The config option eqlx_cli_timeout sets the timeout enforced on each of these CLI commands.

If a heavily loaded system takes too long to respond, or if a user has set the CLI timeout too low, the driver aborts the thread performing the SSH operation. That thread currently has no cleanup handling, so when this happens the SSH connection is orphaned and the array holds on to the session until it times out.

By default the PS array allows only 7 concurrent CLI sessions. If the driver aborts several operations, it can no longer connect, and all subsequent operations fail until the orphaned sessions time out and become available again.
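
The failure mode described above can be illustrated with a small, self-contained Python sketch. It is not the driver's code and not the actual fix: FakeArray, FakeSession, CommandAborted, run_cli and simulate are hypothetical names invented for this illustration, and the abort is simulated by raising an exception instead of killing a thread. The only figure taken from the report is the default limit of 7 CLI sessions. The sketch compares an abort path that skips cleanup with one that closes the session.

```python
MAX_CLI_SESSIONS = 7  # default limit quoted in the bug description


class FakeSession:
    """Hypothetical stand-in for one SSH/CLI session on the array."""

    def __init__(self, array):
        self.array = array
        self.closed = False

    def close(self):
        # Closing is idempotent and frees one slot on the array.
        if not self.closed:
            self.closed = True
            self.array.open_sessions -= 1


class FakeArray:
    """Hypothetical stand-in for the PS array; it only counts sessions."""

    def __init__(self, limit=MAX_CLI_SESSIONS):
        self.limit = limit
        self.open_sessions = 0

    def connect(self):
        if self.open_sessions >= self.limit:
            raise ConnectionError("session limit reached")
        self.open_sessions += 1
        return FakeSession(self)


class CommandAborted(Exception):
    """Hypothetical signal standing in for the timeout abort."""


def run_cli(array, command, cleanup):
    """Open a session, then simulate the command being aborted."""
    session = array.connect()
    try:
        raise CommandAborted(command)  # every command times out here
    finally:
        if cleanup:
            session.close()  # the step missing from the buggy path


def simulate(aborts, cleanup):
    """Run `aborts` timed-out commands; return (open sessions, refusals)."""
    array = FakeArray()
    refused = 0
    for _ in range(aborts):
        try:
            run_cli(array, "show", cleanup)
        except CommandAborted:
            pass  # the caller sees the timeout either way
        except ConnectionError:
            refused += 1  # the array had no free session left
    return array.open_sessions, refused


print(simulate(10, cleanup=False))  # (7, 3)
print(simulate(10, cleanup=True))   # (0, 0)
```

Without cleanup, the first seven aborted commands each leave a session behind and the remaining three are refused. With cleanup on the abort path, no sessions accumulate.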

Changed in cinder:
assignee: nobody → Sean McGinnis (sean-mcginnis)
Changed in cinder:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/124509

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/124509
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5cb23b67c53437fc51a6b37acac477fba4d6a7ab
Submitter: Jenkins
Branch: master

commit 5cb23b67c53437fc51a6b37acac477fba4d6a7ab
Author: Sean McGinnis <email address hidden>
Date: Fri Sep 26 15:21:35 2014 -0500

    Handle eqlx SSH connection close on abort.

    EqualLogic array CLI operation timeout causes the
    SSH thread to be aborted. This would cause SSH
    sessions to be orphaned and hit a max connection
    limit on the array. This fix catches these aborts
    and makes sure the connection is closed.

    Change-Id: I9392fd5dd79eb44f252bf50217f17cc473e6f2f0
    Closes-Bug: 1374613

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → juno-rc2
Thierry Carrez (ttx)
Changed in cinder:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (proposed/juno)

Fix proposed to branch: proposed/juno
Review: https://review.openstack.org/126863

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (proposed/juno)

Reviewed: https://review.openstack.org/126863
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=04cd35fd88768ec0f5d23619cec2df4981ee7d8c
Submitter: Jenkins
Branch: proposed/juno

commit 04cd35fd88768ec0f5d23619cec2df4981ee7d8c
Author: Sean McGinnis <email address hidden>
Date: Fri Sep 26 15:21:35 2014 -0500

    Handle eqlx SSH connection close on abort.

    EqualLogic array CLI operation timeout causes the
    SSH thread to be aborted. This would cause SSH
    sessions to be orphaned and hit a max connection
    limit on the array. This fix catches these aborts
    and makes sure the connection is closed.

    Change-Id: I9392fd5dd79eb44f252bf50217f17cc473e6f2f0
    Closes-Bug: 1374613
    (cherry picked from commit 5cb23b67c53437fc51a6b37acac477fba4d6a7ab)

Changed in cinder:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-rc2 → 2014.2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/128920

OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/128920

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/128920
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=66494f54112fdfa135b3974c75aa388c8d1fb49e
Submitter: Jenkins
Branch: master

commit be3d4604dc0566e0838959d998ff1d37755de6d3
Author: Tomoki Sekiyama <email address hidden>
Date: Tue Oct 14 19:09:44 2014 -0400

    Fix LVM iSCSI driver tgtadm CHAP authentication

    Currently CHAP Authentication in LVM iSCSI driver with tgtadm does not work.
    This is because the tgtadm helper creates the target configuration file
    with an 'IncomingUser' entry, which is ignored by tgtd.
    This patch fixes it to 'incominguser'.

    Change-Id: I14871985a2a916834122f849238f05b75726bc1a
    Closes-Bug: #1329214
    (cherry picked from commit e3563891545c801726d227f752cf99488ed5c7dd)

commit f7ee62cc58d8b642af67510a310f6259492a4508
Author: Mitsuhiro Tanino <email address hidden>
Date: Tue Oct 14 12:41:41 2014 -0400

    Export cinder volumes only if the status is 'in-use'

    Currently, cinder volumes are exported both 'in-use' and 'available'
    after restarting cinder-volume service.
    This behavior was introduced following commit.

      commit ffefe18334a9456250e1b6ff88b7b47fb366f374
      Author: Zhiteng Huang <email address hidden>
      Date: Sat Aug 23 18:32:57 2014 +0000

    If the volumes are attached to nova instances, they should be exported
    via tgtd after restarting cinder-volume.
    But the volumes which are not attached to instances must not be exported
    because everyone can connect these volumes.

    This patch changes volume export behavior that exports a volume only if
    the volume status is 'in-use'.

    Change-Id: I4c598c240b9290c81bd8001e5a0720c8c329aeb9
    Signed-off-by: Mitsuhiro Tanino <email address hidden>
    Closes-bug: #1381106
    (cherry picked from commit e2f28b967910625432be0eab6a851adf53ac58ea)

commit 01e7c516852e53df661b2eedc970c327c1ff10ce
Author: Vipin Balachandran <email address hidden>
Date: Fri Oct 10 23:06:27 2014 +0530

    Revert "Relocate volume to compliant datastore"

    Commit 4be8913520f5e9fe4109ade101da9509e4a83360 introduced a regression
    which causes failures during cinder volume re-attach. This patch reverts
    commit 4be8913520f5e9fe4109ade101da9509e4a83360 as an immediate fix.

    Closes-Bug: #1379830
    Change-Id: I5dfbd45533489c3c81db8d256bbfd2f85614a357
    (cherry picked from commit 48cb82971e0418f9a629e2b39d0433dc2c0e6919)

commit 900d49723f65e87658381ff955559f54ac98c487
Author: Andreas Jaeger <email address hidden>
Date: Thu Oct 9 12:25:28 2014 +0200

    Updated translations

    Commands run:-
    $ python setup.py extract_messages
    $ python setup.py update_catalog --no-fuzzy-matching \
      --ignore-obsolete=true
    $ source \
      ../openstack-infra/project-config/jenkins/scripts/common_translation_update.sh
    $ setup_loglevel_vars
    $ cleanup_po_files cinder

    Change-Id: I73f3bdccb4be98df95fa853864e465f4d83a8884

commit 8e94aaa2b28b491314fe8642061ac73e3fe8e966
Author: Navneet Singh <email address hidden>
Date: Thu Aug 28 16:03:41 2014 +0530

    NetApp fix eseries unit test mock clean

 ...
