Dell eqlx/PS driver timeout error in periodic task

Bug #1661154 reported by John Haan
This bug affects 2 people
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Rajini Karthik
Milestone: none

Bug Description

The Cinder volume driver collects capacity information in a periodic task.
The eqlx driver, for Dell EqualLogic storage, gets this information by running the 'pool select default show' CLI command.
This command returns the default pool's information followed by the full list of volumes in the pool.

Below is the output of the command:

TEST-PS6210> pool select default show
______________________________ Pool Information _______________________________
Name: default
Description:
Default: true
Data-Reduction: not-started
TotalVolumes: 4
VolumesOnline: 4
.
.
TotalCapacity: 9.1TB
VolumeReserve: 102.47GB
.
.
FreeSpace: 8.89TB
.
.
_______________________________________________________________________________
Name       Status  Model   Version    Disks Capacity   FreeSpace  Connections
---------- ------- ------- ---------- ----- ---------- ---------- -----------
PS6210XS-0 online  70-0425 V9.0.3 (R4 24    9.1TB      8.89TB     2
                           1 27117)

___________________________________ Volumes ___________________________________

Name            Size       Snapshots Status   Permission Connections
--------------- ---------- --------- -------- ---------- -----------
volume-d961ebbb 1GB        2         online   read-write 0
-ccfe-4983-95
a1-cdb5faf7a4
72

This becomes a problem when Cinder has a large number of volumes on an eqlx backend, because the output grows with every volume.
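The periodic task only needs two figures from this output: `TotalCapacity` and `FreeSpace` in the pool information section. A minimal sketch of extracting them follows. `size_to_gb` and `parse_pool_capacity` are hypothetical helpers, not the driver's functions, and the conversion assumes binary multiples (1TB = 1024GB). The returned dictionary uses `total_capacity_gb` and `free_capacity_gb`, the keys Cinder drivers report in their volume stats. Note that this version scans every line, so its cost grows with the length of the volume list.

```python
import re

# Hypothetical conversion table; assumes binary multiples (1TB = 1024GB).
_UNIT_TO_GB = {"MB": 1.0 / 1024, "GB": 1.0, "TB": 1024.0}


def size_to_gb(text):
    """Convert a size string such as '9.1TB' or '102.47GB' to gigabytes."""
    match = re.fullmatch(r"([\d.]+)\s*(MB|GB|TB)", text.strip())
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    value, unit = match.groups()
    return float(value) * _UNIT_TO_GB[unit]


def parse_pool_capacity(output_lines):
    """Build capacity stats from 'pool select <pool> show' output lines.

    Every line is scanned, but only the 'TotalCapacity' and 'FreeSpace'
    key/value pairs are kept; all other lines are ignored.
    """
    fields = {}
    for line in output_lines:
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("TotalCapacity", "FreeSpace"):
            fields[key.strip()] = value.strip()
    return {
        "total_capacity_gb": size_to_gb(fields["TotalCapacity"]),
        "free_capacity_gb": size_to_gb(fields["FreeSpace"]),
    }
```

For the pool shown above (TotalCapacity 9.1TB, FreeSpace 8.89TB), this yields about 9318.4 GB total and 9103.36 GB free under the 1024 assumption.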

Below is the cinder-volume log:
2017-02-02 11:29:44.540 151491 ERROR cinder.volume.drivers.eqlx [-] The EQL array has closed the connection.
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx [req-2feb8b12-ea6f-45c2-81e6-09ff45078df4 - - - - -] Error running command.
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx Traceback (most recent call last):
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx File "/usr/local/lib/python2.7/dist-packages/cinder/volume/drivers/eqlx.py", line 263, in _run_ssh
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx timeout=self.configuration.ssh_conn_timeout)
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx File "/usr/local/lib/python2.7/dist-packages/cinder/volume/drivers/eqlx.py", line 94, in __inner
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx res = gt.wait()
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 175, in wait
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx return self._exit_event.wait()
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 121, in wait
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx return hubs.get_hub().switch()
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 294, in switch
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx return self.greenlet.switch()
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx result = function(*args, **kwargs)
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx File "/usr/local/lib/python2.7/dist-packages/cinder/volume/drivers/eqlx.py", line 211, in _ssh_execute
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx self._get_output(chan)
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx File "/usr/local/lib/python2.7/dist-packages/cinder/volume/drivers/eqlx.py", line 189, in _get_output
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx raise exception.VolumeBackendAPIException(data=msg)
2017-02-02 11:29:44.541 151491 ERROR cinder.volume.drivers.eqlx VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: The EQL array has closed the connection.

In our case, the error occurs once there are more than 220 volumes, on both the PS6110XV and PS5110XV models.
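The `__inner` frame at eqlx.py line 94 and its `gt.wait()` call suggest that `_run_ssh` runs `_ssh_execute` in an eventlet green thread and waits for it, bounded by `ssh_conn_timeout`. The sketch below is a rough standard-library analogue of that pattern, not the driver's code: it swaps eventlet for `concurrent.futures`, and `with_timeout`, `CommandTimeout` and `slow_command` are hypothetical names. As in the traceback, an exception raised inside the worker (such as the "closed the connection" error) is re-raised by the waiting caller. Unlike a green thread, a Python OS thread cannot be forcibly stopped, so an overrunning command keeps running in the background.

```python
import concurrent.futures
import functools
import time


class CommandTimeout(Exception):
    """Hypothetical stand-in for the driver's VolumeBackendAPIException."""


# Shared worker pool (assumption). Overrunning calls are abandoned, not killed.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def with_timeout(func):
    """Run func in a worker thread and give up after `timeout` seconds."""
    @functools.wraps(func)
    def inner(*args, timeout=None, **kwargs):
        future = _EXECUTOR.submit(func, *args, **kwargs)
        try:
            # Re-raises any exception the worker raised, like gt.wait().
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            future.cancel()  # only has an effect if the call has not started
            raise CommandTimeout("Command timed out after %ss" % timeout) from None
    return inner


@with_timeout
def slow_command(delay):
    # Simulates a CLI command whose output takes `delay` seconds to arrive.
    time.sleep(delay)
    return "done"
```

Calling `slow_command(0.01, timeout=1)` returns `"done"`, while `slow_command(0.5, timeout=0.05)` raises `CommandTimeout`.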

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Was this the one that was corrected by configuring the correct group name?

Revision history for this message
John Haan (yongiman) wrote :

Here is my eqlx driver configuration:

[eqlx4]
volume_driver = cinder.volume.drivers.eqlx.DellEQLSanISCSIDriver
volume_backend_name = EQLX
san_ip = *.*.*.*
san_login = grpadmin
san_password = grpadmin
eqlx_group_name = TEST-PS6210
eqlx_pool = default

The group name is correct, but the timeout error is still being raised.

Changed in cinder:
assignee: nobody → Rajini Ram (rajini-ram)
summary: - eqlx driver timeout error in periodic task
+ Dell eqlx/PS driver timeout error in periodic task
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/508938

Changed in cinder:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/508938
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a9a0c2ee2e973d0594f2707d64846e882e179c94
Submitter: Jenkins
Branch: master

commit a9a0c2ee2e973d0594f2707d64846e882e179c94
Author: rajinir <email address hidden>
Date: Mon Oct 2 10:51:48 2017 -0500

    Dell EMC PS: Optimize parsing of capacity info from backend

    The backend api returns large amounts of information and
    times out when there are lots of volumes. Accelerated the
    process by terminating the parsing after the capacity info
    is retrieved.
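The commit message describes stopping the parse once the capacity information has been retrieved. A minimal sketch of that early-exit idea over an arbitrary line iterator follows; `read_capacity` and `fake_output` are hypothetical names, and this is an illustration of the technique rather than the merged change itself.

```python
def read_capacity(lines):
    """Scan 'pool select ... show' output and stop once capacity is known.

    `lines` may be any iterable, e.g. a generator yielding lines as they
    arrive. Iteration stops as soon as both TotalCapacity and FreeSpace
    have been seen, so the volume table that follows is never parsed.
    """
    wanted = {"TotalCapacity", "FreeSpace"}
    found = {}
    for line in lines:
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in wanted:
            found[key] = value.strip()
            if len(found) == len(wanted):
                break  # early termination: skip the rest of the output
    return found


def fake_output(n_volumes, log):
    """Hypothetical generator mimicking the command output.

    Appends each line to `log` just before yielding it, so callers can see
    how much of the output was actually consumed.
    """
    header = ["Name: default", "TotalCapacity: 9.1TB", "FreeSpace: 8.89TB"]
    volumes = ["volume-%04d 1GB 0 online read-write 0" % i
               for i in range(n_volumes)]
    for line in header + volumes:
        log.append(line)
        yield line
```

With 1000 simulated volume lines, only the three header lines are pulled from the generator. Whether this saves wall-clock time depends on whether lines are produced lazily; if the whole output has already been read into memory, only the parsing step is shortened.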

    Closes Bug: #1661154

    Change-Id: I1f0adaa8e25cd3ec74084b22bbe1573b92713959

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/512684

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/ocata)

Change abandoned by Rajini Karthik (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/512684

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/512762

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (driverfixes/newton)

Fix proposed to branch: driverfixes/newton
Review: https://review.openstack.org/514346

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (driverfixes/ocata)

Fix proposed to branch: driverfixes/ocata
Review: https://review.openstack.org/514364

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.0.0b1

This issue was fixed in the openstack/cinder 12.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/pike)

Reviewed: https://review.openstack.org/512762
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=dbde6a3cad318ad8a9e23e23184bccb442b069aa
Submitter: Zuul
Branch: stable/pike

commit dbde6a3cad318ad8a9e23e23184bccb442b069aa
Author: rajinir <email address hidden>
Date: Mon Oct 2 10:51:48 2017 -0500

    Dell EMC PS: Optimize parsing of capacity info from backend

    The backend api returns large amounts of information and
    times out when there are lots of volumes. Accelerated the
    process by terminating the parsing after the capacity info
    is retrieved.

    Closes Bug: #1661154

    Change-Id: I1f0adaa8e25cd3ec74084b22bbe1573b92713959
    (cherry picked from commit a9a0c2ee2e973d0594f2707d64846e882e179c94)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (driverfixes/ocata)

Reviewed: https://review.openstack.org/514364
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=9a0a74c30f0954627f77223ec36a4927751aaf8b
Submitter: Zuul
Branch: driverfixes/ocata

commit 9a0a74c30f0954627f77223ec36a4927751aaf8b
Author: rajinir <email address hidden>
Date: Mon Oct 2 10:51:48 2017 -0500

    Dell EMC PS: Optimize parsing of capacity info from backend

    The backend api returns large amounts of information and
    times out when there are lots of volumes. Accelerated the
    process by terminating the parsing after the capacity info
    is retrieved.

    Closes Bug: #1661154

    Change-Id: I1f0adaa8e25cd3ec74084b22bbe1573b92713959
    (cherry picked from commit a9a0c2ee2e973d0594f2707d64846e882e179c94)

tags: added: in-driverfixes-ocata
tags: added: in-driverfixes-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (driverfixes/newton)

Reviewed: https://review.openstack.org/514346
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=02e858b5da4066988fec8ed2574522c9ad90c4fb
Submitter: Zuul
Branch: driverfixes/newton

commit 02e858b5da4066988fec8ed2574522c9ad90c4fb
Author: rajinir <email address hidden>
Date: Mon Oct 2 10:51:48 2017 -0500

    Dell EMC PS: Optimize parsing of capacity info from backend

    The backend api returns large amounts of information and
    times out when there are lots of volumes. Accelerated the
    process by terminating the parsing after the capacity info
    is retrieved.

    Closes Bug: #1661154

    Change-Id: I1f0adaa8e25cd3ec74084b22bbe1573b92713959
    (cherry picked from commit a9a0c2ee2e973d0594f2707d64846e882e179c94)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 11.0.1

This issue was fixed in the openstack/cinder 11.0.1 release.

Revision history for this message
Deepa (dpaclt) wrote :

Is there a way we can fix this in the Ocata version?

Revision history for this message
Rajini Karthik (rajini-karthik) wrote :

This fix has been backported to Ocata, Pike and Newton. See the messages above in this thread.

Revision history for this message
Deepa (dpaclt) wrote : Re: [Bug 1661154] Re: Dell eqlx/PS driver timeout error in periodic task

Hello Rajini

Yes, it is fixed, with a few more tweaks in cinder.conf. We had to increase the timeout value.

Thanks again for the response

Sent from my iPhone

> On 27-Nov-2017, at 8:29 PM, Rajini Karthik <email address hidden> wrote:
>
> This fix has been backported to ocata, pike and newton. See above
> messages in the thread

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/ocata)

Reviewed: https://review.openstack.org/512684
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=047c3f87b590ea2d627692d05347fcb49c060bab
Submitter: Zuul
Branch: stable/ocata

commit 047c3f87b590ea2d627692d05347fcb49c060bab
Author: rajinir <email address hidden>
Date: Mon Oct 2 10:51:48 2017 -0500

    Dell EMC PS: Optimize parsing of capacity info from backend

    The backend api returns large amounts of information and
    times out when there are lots of volumes. Accelerated the
    process by terminating the parsing after the capacity info
    is retrieved.

    Closes Bug: #1661154

    Change-Id: I1f0adaa8e25cd3ec74084b22bbe1573b92713959
    (cherry picked from commit a9a0c2ee2e973d0594f2707d64846e882e179c94)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 10.0.7

This issue was fixed in the openstack/cinder 10.0.7 release.

Revision history for this message
Krzysztof Pawlowski (krzysztof.pawlowski) wrote :

In my opinion, the issue still exists. The function _update_volume_stats calls self._eql_execute -> _run_ssh -> _ssh_execute, and _ssh_execute returns its output only after it has received all information from the array, including the volume list. The fix above therefore does not prevent a timeout while the output is being read from the array.
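The distinction in this comment is between stopping the parse early and stopping the read early. The sketch below contrasts the two with a simulated channel; `FakeChannel`, `buffered_read`, `streaming_read` and `_capacity_from_lines` are hypothetical names, and the channel is a stand-in rather than a real SSH object. A real driver that stops reading would also have to close or abandon the SSH channel, which this sketch does not model.

```python
class FakeChannel:
    """Hypothetical stand-in for an SSH channel that returns output in chunks."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.chunks_read = 0

    def recv(self):
        # Returns the next chunk, or "" once the output is exhausted.
        if self.chunks_read >= len(self._chunks):
            return ""
        chunk = self._chunks[self.chunks_read]
        self.chunks_read += 1
        return chunk


def _capacity_from_lines(lines):
    found = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("TotalCapacity", "FreeSpace"):
            found[key.strip()] = value.strip()
    return found


def buffered_read(chan):
    """Read all output first, then parse: the pattern the comment describes."""
    data = []
    while True:
        chunk = chan.recv()
        if not chunk:
            break
        data.append(chunk)
    return _capacity_from_lines("".join(data).splitlines())


def streaming_read(chan):
    """Parse complete lines while reading; stop receiving once both values are found."""
    buf = ""
    found = {}
    while len(found) < 2:
        chunk = chan.recv()
        if not chunk:
            break
        buf += chunk
        *complete, buf = buf.split("\n")  # keep any partial last line in buf
        found.update(_capacity_from_lines(complete))
    if len(found) < 2 and buf:
        found.update(_capacity_from_lines([buf]))  # output without trailing newline
    return found
```

Both functions return the same capacity values, but with a header split across two chunks followed by 500 volume chunks, `buffered_read` pulls all 502 chunks while `streaming_read` stops after 2.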
