Cinder IBM SVC Driver Timeout

Bug #1625499 reported by Christian Schlotter
This bug affects 1 person
Affects | Status    | Importance | Assigned to | Milestone
Cinder  | Won't Fix | Medium     | IBM Storage |

Bug Description

We are using IBM Storwize SVC with iSCSI on Mitaka.
We found that the driver spends a long time in the function get_host_from_connector (about 15 seconds, https://github.com/openstack/cinder/blob/stable/mitaka/cinder/volume/drivers/ibm/storwize_svc/storwize_svc_common.py#L703) because there are a lot of hosts registered on our V7000 and the function uses an "exhaustive search".
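To show why the time grows with the number of hosts, here is a minimal sketch of what an exhaustive iSCSI search amounts to: list every host, then query each host's details and compare its iSCSI names with the connector's IQN. `list_hosts` and `host_iscsi_names` are hypothetical stand-ins for the SSH/CLI round trips the driver makes; this is not the driver's actual code.

```python
from typing import Callable, Iterable, Optional


def find_host_exhaustive(
    iqn: str,
    list_hosts: Callable[[], Iterable[str]],
    host_iscsi_names: Callable[[str], Iterable[str]],
) -> Optional[str]:
    """Return the first storage host that has `iqn` registered, else None."""
    for host in list_hosts():
        # One detail query per host: this is what makes the search O(N).
        if iqn in host_iscsi_names(host):
            return host
    return None


# Hypothetical inventory: host name -> registered iSCSI names.
inventory = {
    "novanode": ["iqn.1994-05.com.example:node-a"],
    "gridnode02-00000002": ["iqn.1994-05.com.example:node-b"],
    "gridnode03-00000003": ["iqn.1994-05.com.example:node-c"],
}
queried = []


def details(host: str) -> Iterable[str]:
    queried.append(host)  # count the simulated per-host round trips
    return inventory[host]


found = find_host_exhaustive("iqn.1994-05.com.example:node-c", inventory.keys, details)
print(found, len(queried))  # gridnode03-00000003 3
```

Each host costs one extra round trip, so on an array with many hosts the lookup time is roughly the number of hosts times the latency of one CLI call.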

We can reproduce this problem as follows:
1. Use a Heat template with a single instance and an additional empty volume that gets attached.
2. Create multiple stacks with a script.

For example, if we create 8 stacks/instances, only about 4 are created successfully. The others hit a timeout in nova while attaching the volume (waiting for cinder).
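A rough queueing model suggests why about half of the stacks fail. It assumes that the host lookups run one at a time (the three calls in the log excerpt below do not overlap), that each takes about 15 seconds as reported here, and that the RPC timeout is 60 seconds (the oslo.messaging `rpc_response_timeout` default; the value used in this deployment is an assumption). It ignores every other step of the attach path.

```python
def attachments_within_timeout(n_requests: int, seconds_per_call: float,
                               timeout: float) -> int:
    """Count requests that finish within `timeout` when calls run one at a time.

    Assumption: all requests are queued at t=0 and the host lookups are
    serialized, so request k (1-based) finishes at k * seconds_per_call.
    """
    return sum(1 for k in range(1, n_requests + 1)
               if k * seconds_per_call <= timeout)


print(attachments_within_timeout(8, 15.0, 60.0))   # 4
print(attachments_within_timeout(8, 15.0, 180.0))  # 8
```

Under these assumptions, raising the timeout only moves the limit; the number of attachments that fit still scales with the timeout divided by the per-call lookup time.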

Here is part of the log file, showing how long each call takes:
2016-09-15 09:19:50.523 4801 cinder.volume.drivers.ibm.storwize_svc.storwize_svc_common ... Enter: get_host_from_connector: .... get_host_from_connector /usr/lib/python2.7/site-packages/cinder/volume/drivers/ibm/storwize_svc/storwize_svc_common.py:705
2016-09-15 09:20:05.433 4801 cinder.volume.drivers.ibm.storwize_svc.storwize_svc_common ... Leave: get_host_from_connector: host gridnode03. get_host_from_connector /usr/lib/python2.7/site-packages/cinder/volume/drivers/ibm/storwize_svc/storwize_svc_common.py:747
2016-09-15 09:20:13.403 4801 cinder.volume.drivers.ibm.storwize_svc.storwize_svc_common ... Enter: get_host_from_connector: .... get_host_from_connector /usr/lib/python2.7/site-packages/cinder/volume/drivers/ibm/storwize_svc/storwize_svc_common.py:705
2016-09-15 09:20:28.119 4801 cinder.volume.drivers.ibm.storwize_svc.storwize_svc_common ... Leave: get_host_from_connector: host gridnode03. get_host_from_connector /usr/lib/python2.7/site-packages/cinder/volume/drivers/ibm/storwize_svc/storwize_svc_common.py:747
2016-09-15 09:20:29.889 4801 cinder.volume.drivers.ibm.storwize_svc.storwize_svc_common ... Enter: get_host_from_connector: .... get_host_from_connector /usr/lib/python2.7/site-packages/cinder/volume/drivers/ibm/storwize_svc/storwize_svc_common.py:705
2016-09-15 09:20:44.889 4801 cinder.volume.drivers.ibm.storwize_svc.storwize_svc_common ... Leave: get_host_from_connector: host gridnode02. get_host_from_connector /usr/lib/python2.7/site-packages/cinder/volume/drivers/ibm/storwize_svc/storwize_svc_common.py:747
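The per-call durations can be read straight off the excerpt above. This is a minimal sketch that pairs each `Enter:` line with the following `Leave:` line; it assumes the lines come from a single worker (all are from PID 4801), so calls do not interleave. The lines are abbreviated copies of the excerpt.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

EXCERPT = [
    "2016-09-15 09:19:50.523 4801 ... Enter: get_host_from_connector: ....",
    "2016-09-15 09:20:05.433 4801 ... Leave: get_host_from_connector: host gridnode03.",
    "2016-09-15 09:20:13.403 4801 ... Enter: get_host_from_connector: ....",
    "2016-09-15 09:20:28.119 4801 ... Leave: get_host_from_connector: host gridnode03.",
    "2016-09-15 09:20:29.889 4801 ... Enter: get_host_from_connector: ....",
    "2016-09-15 09:20:44.889 4801 ... Leave: get_host_from_connector: host gridnode02.",
]


def call_durations(lines):
    """Pair each Enter line with the next Leave line; return durations in seconds."""
    result, start = [], None
    for line in lines:
        stamp = datetime.strptime(line[:23], FMT)  # leading timestamp
        if "Enter: get_host_from_connector" in line:
            start = stamp
        elif "Leave: get_host_from_connector" in line and start is not None:
            result.append((stamp - start).total_seconds())
            start = None
    return result


print(call_durations(EXCERPT))  # [14.91, 14.716, 15.0]
```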

Our workaround is a patch (see attachment), which I tested in our environment on Mitaka. Alternatively, rpc_timeout could be increased on nova and cinder.
Because the hostname of the nova-compute node is known in this function, and a host created on the storage is named after that hostname, we could first limit the search to this hostname.

I can attach more logfiles if needed :-)

Christian Schlotter (christi-schlotter) wrote :

My patch throws an error if the host does not exist on the storage system.
The line "resp = self.ssh.lshost(host=connector['host'].partition('.')[0])" should probably be wrapped in a try/except.
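A minimal sketch of the name-first idea with the suggested exception handling, assuming `lookup_by_name` returns the IQNs registered for a storage host name and raises `HostNotFound` for an unknown name. `lookup_by_name`, `exhaustive_search`, `HostNotFound` and the demo inventory are hypothetical; only the `partition('.')` step comes from the patch line quoted above, and `connector['initiator']` is the connector's iSCSI IQN. The sketch also checks that the IQN matches, so a host that merely shares the name is not returned, and it falls back to the full search when the short name does not resolve.

```python
from typing import Callable, Dict, Iterable, Optional


class HostNotFound(Exception):
    """Hypothetical error raised when no storage host has the given name."""


def find_host(connector: Dict[str, str],
              lookup_by_name: Callable[[str], Iterable[str]],
              exhaustive_search: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try the short hostname first, then fall back to the full IQN search."""
    iqn = connector['initiator']
    short_name = connector['host'].partition('.')[0]  # as in the patch line
    try:
        # Only accept the name match if the IQN is really registered there.
        if iqn in lookup_by_name(short_name):
            return short_name
    except HostNotFound:
        pass  # No host with that name; fall through to the full search.
    return exhaustive_search(iqn)


# Hypothetical storage inventory used for the demo.
_BY_NAME = {"gridnode03": ["iqn.2016-09.com.example:gridnode03"]}


def demo_lookup(name: str) -> Iterable[str]:
    if name not in _BY_NAME:
        raise HostNotFound(name)
    return _BY_NAME[name]


print(find_host({'host': 'gridnode03.example.com',
                 'initiator': 'iqn.2016-09.com.example:gridnode03'},
                demo_lookup, lambda iqn: "found-by-full-search"))  # gridnode03
print(find_host({'host': 'gridnode09.example.com',
                 'initiator': 'iqn.2016-09.com.example:gridnode09'},
                demo_lookup, lambda iqn: "found-by-full-search"))  # found-by-full-search
```

The fast path only helps when the storage host is named exactly after the short hostname; every other case still pays for the exhaustive search.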

Jay Bryant (jsbryant) wrote :

Xiao Qin,

Can you please take a look at this bug/patch and see if we can get something put together?

Christian is an IBMer so don't hesitate to enlist his help!

Thanks!

Changed in cinder:
importance: Undecided → Medium
Christian Schlotter (christi-schlotter) wrote :

This version of the patch fixes the exception raised if the host does not exist on the storage.

xiaoqin (xiaoqin-li) wrote :

Hi Christian,
I do not think the patch you provided is a good solution, or that it even works, because we should not search for the host by host name. We need to search for the host exactly by the WWPN (FC) or IQN (iSCSI) that the nova node uses. There are two reasons for this:
1. The host created by cinder is not named gridnode03; it is named gridnode03-xxxx, where xxxx is a random 8-character suffix. Please refer to the code at https://github.com/openstack/cinder/blob/stable/mitaka/cinder/volume/drivers/ibm/storwize_svc/storwize_svc_common.py#L795
2. If a user has already created the host in SVC manually, with a host name such as "novanode" and the nova node's WWPN or IQN, we do not need to create the host again. But if we filter by the nova node name (gridnode03), the result will be empty.
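Point 1 can be illustrated with a sketch of the naming scheme described there: a prefix, a hyphen, and a random 8-character suffix. Making the suffix 8 zero-padded decimal digits is an assumption for illustration. Whatever the suffix is, an exact-name filter on `gridnode03` cannot match the created host.

```python
import random


def storage_host_name(prefix: str) -> str:
    """Prefix + '-' + random 8-character suffix (digits assumed here)."""
    suffix = str(random.randint(0, 99999999)).zfill(8)
    return f"{prefix}-{suffix}"


name = storage_host_name("gridnode03")
print(name)                  # e.g. gridnode03-04817263 (random each run)
print(name == "gridnode03")  # False: an exact-name filter misses it
```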

So overall we still need to search for the host exactly by WWPN (FC) or IQN (iSCSI). The SVC CLI offers no way to filter hosts by WWPN/IQN directly, so we still need the exhaustive search.

xiaoqin (xiaoqin-li) wrote :

Hi Christian,
Currently, you can set rpc_timeout as a workaround, and we can also think about how to improve this.

For your solution, if we plan to use the host name as a filter, the code in create_host needs to change to use connector['host'] as the host name, but we also need to consider the following conditions:
1. connector['host'] may contain special characters that SVC does not accept.
2. connector['host'] may be too generic, for example "node1", and a host with the same name may already exist in SVC.
3. If a user has already created the host in SVC manually using the nova node's WWPN or IQN, we still need to filter by WWPN or IQN, so the exhaustive search is still needed in such a solution.
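Conditions 1 and 2 can be sketched as follows. The allowed character set (ASCII letters, digits, space, `.`, `-`, `_`), the rule that a name must start with a letter, and the 55-character prefix limit are assumptions modeled loosely on the driver's host-name cleanup; the Storwize CLI documentation is the authority on the actual rules.

```python
import re
import string

# Assumption: anything outside this set is invalid and replaced with '-'.
_ALLOWED = set(string.ascii_letters + string.digits + " .-_")


def sanitize_host_prefix(host: str, max_len: int = 55) -> str:
    """Hypothetical cleanup of connector['host'] into a storage host-name prefix."""
    cleaned = "".join(ch if ch in _ALLOWED else "-" for ch in host)
    # Assumption: names must start with a letter, so prepend '_' otherwise.
    if not re.match(r"[A-Za-z]", cleaned):
        cleaned = "_" + cleaned
    return cleaned[:max_len]


print(sanitize_host_prefix("node1"))         # node1
print(sanitize_host_prefix("3node@rack#7"))  # _3node-rack-7
```

A generic name such as `node1` passes through unchanged, so it can still collide with an existing host (condition 2), and a manually created host with a different name is only reachable by WWPN/IQN (condition 3).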

So overall we still need to search for the host exactly by WWPN (FC) or IQN (iSCSI).

tags: added: ibm
Isaac Beckman (isaacb)
Changed in cinder:
assignee: nobody → IBM Storage (ibm-storage)
Eric Harney (eharney)
Changed in cinder:
status: New → Won't Fix