SolidFire driver crashes during ensure_export if SolidFire is not available

Bug #1215064 reported by Vish Ishaya
This bug affects 1 person
Affects    Status         Importance   Assigned to      Milestone
Cinder     Fix Released   High         John Griffith
Grizzly    Fix Released   High         John Griffith

Bug Description

If the solidfire device isn't available for some reason and volumes have been created, the solidfire driver will crash during startup.
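Roughly, the failure mode is that init_host re-exports every existing volume at startup and nothing catches the driver error. A minimal, self-contained sketch of that path (hypothetical stand-in classes, not the real cinder code; it deliberately crashes when run):

# Sketch only: stand-ins for the real cinder classes, showing why one
# unreachable backend takes down the whole cinder-volume service.

class SolidFireAPIException(Exception):
    """Stand-in for cinder.exception.SolidFireAPIException."""


class UnreachableSolidFireDriver(object):
    def ensure_export(self, ctxt, volume):
        # Simulates the cluster being unreachable (bad san_ip, network down).
        raise SolidFireAPIException("Failed to make httplib connection")


def init_host(driver, volumes, ctxt=None):
    for volume in volumes:
        # No try/except here, so the first failure propagates up through
        # service start-up and the child process exits (status 2 in the log).
        driver.ensure_export(ctxt, volume)


if __name__ == "__main__":
    init_host(UnreachableSolidFireDriver(),
              [{"project_id": "690534e8073a40d1b027c35808feb6da"}])

The trace from an affected deployment follows: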

2013-08-20 03:08:06 DEBUG cinder.volume.drivers.solidfire Executing SolidFire ensure_export...
2013-08-20 03:08:06 DEBUG cinder.volume.drivers.solidfire Payload for SolidFire API call: {"params":
{"username": "690534e8073a40d1b027c35808feb6da"}
, "method": "GetAccountByName", "id": 190619619316511908876065095856189853893}
2013-08-20 03:08:06 ERROR cinder.volume.drivers.solidfire Failed to make httplib connection SolidFire Cluster: (verify san_ip settings)
2013-08-20 03:08:06 ERROR cinder.service Unhandled exception
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 227, in _start_child
self._child_process(wrap.server)
File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 204, in _child_process
launcher.run_server(server)
File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 95, in run_server
server.start()
File "/usr/lib/python2.7/dist-packages/cinder/service.py", line 355, in start
self.manager.init_host()
File "/usr/lib/python2.7/dist-packages/cinder/volume/manager.py", line 149, in init_host
self.driver.ensure_export(ctxt, volume)
File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/solidfire.py", line 545, in ensure_export
return self._do_export(volume)
File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/solidfire.py", line 267, in _do_export
sfaccount = self._get_sfaccount(volume['project_id'])
File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/solidfire.py", line 225, in _get_sfaccount
sfaccount = self._get_sfaccount_by_name(sf_account_name)
File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/solidfire.py", line 211, in _get_sfaccount_by_name
data = self._issue_api_request('GetAccountByName', params)
File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/solidfire.py", line 146, in _issue_api_request
raise exception.SolidFireAPIException(msg)
SolidFireAPIException: Failed to make httplib connection:
2013-08-20 03:08:06 INFO cinder.service Child 13752 exited with status 2
2013-08-20 03:08:06 INFO cinder.service _wait_child 1
2013-08-20 03:08:06 INFO cinder.service wait wrap.failed True

Something better than this should occur, for example retrying the export indefinitely or simply ignoring the error.
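Of the two options mentioned, the indefinite-retry idea could look roughly like the sketch below. The helper name, the 30-second interval, and catching Exception are illustrative assumptions; this is not what was ultimately merged (the merged change, quoted further down, simply catches the error and returns None).

import time

# Hypothetical sketch of the "retry indefinitely" option: keep re-attempting
# the export instead of letting the exception escape. In the driver the
# exception caught would be cinder.exception.SolidFireAPIException.
RETRY_INTERVAL = 30  # seconds between attempts


def ensure_export_with_retry(driver, ctxt, volume):
    while True:
        try:
            return driver._do_export(volume)
        except Exception as exc:
            print("SolidFire cluster unreachable (%s); retrying in %ss"
                  % (exc, RETRY_INTERVAL))
            time.sleep(RETRY_INTERVAL)

A blocking loop like this would stall init_host itself, which is one reason the merged fix just skips the export and defers any retry to a possible future periodic task (see the commit message below).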

Changed in cinder:
status: New → Triaged
importance: Undecided → High
milestone: none → havana-3
assignee: nobody → John Griffith (john-griffith)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/43181

Changed in cinder:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/43181
Committed: http://github.com/openstack/cinder/commit/e13b7d8a46cd54e61bdc20500ad597c2545ffaed
Submitter: Jenkins
Branch: master

commit e13b7d8a46cd54e61bdc20500ad597c2545ffaed
Author: John Griffith <email address hidden>
Date: Wed Aug 21 13:05:30 2013 -0600

    Dont crash service if sf cluster isnt available

    Currently if the SolidFire driver is configured but the cluster
    isn't available for some reason the ensure_export call will raise
    an unhandled exception and crash the volume-service.

    We should be able to handle things like losing connectivity to
    a single back-end without impacting the other volume-service
    backends.

    We'll wrap the ensure_export call in a try block here and
    return None in the case that the connection can't be made.
    This will keep the service from crashing and log an error
    message that the connection timed out.

    Additional work would include adding a periodic retry task
    to the manager to try and start the backend service for us
    on some regular interval in case the device comes back.

    Fixes: bug 1215064

    Change-Id: Ice3f517d220c40113074bb77adbb10d5e32abd0b

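The shape of the merged change is small; a hedged sketch of the approach described above (not the verbatim diff, and assuming the exception and logging imports the driver already carried in that era) is:

from cinder import exception
from cinder.openstack.common import log as logging

LOG = logging.getLogger(__name__)


def ensure_export(self, ctxt, volume):
    """Guarded export: skip rather than crash if the cluster is down."""
    try:
        return self._do_export(volume)
    except exception.SolidFireAPIException:
        # Cluster unreachable (e.g. bad san_ip); log and return None so the
        # volume manager's init_host loop moves on to the next volume
        # instead of killing the whole cinder-volume service.
        LOG.error("Unable to reach the SolidFire cluster during "
                  "ensure_export; skipping export for this volume.")
        return None
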
Changed in cinder:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/grizzly)

Fix proposed to branch: stable/grizzly
Review: https://review.openstack.org/43526

Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/grizzly)

Reviewed: https://review.openstack.org/43526
Committed: http://github.com/openstack/cinder/commit/3329158d784d5966fa2fdccd1c097295ebe37a67
Submitter: Jenkins
Branch: stable/grizzly

commit 3329158d784d5966fa2fdccd1c097295ebe37a67
Author: John Griffith <email address hidden>
Date: Wed Aug 21 13:05:30 2013 -0600

    Dont crash service if sf cluster isnt available

    Currently if the SolidFire driver is configured but the cluster
    isn't available for some reason the ensure_export call will raise
    an unhandled exception and crash the volume-service.

    We should be able to handle things like losing connectivity to
    a single back-end without impacting the other volume-service
    backends.

    We'll wrap the ensure_export call in a try block here and
    return None in the case that the connection can't be made.
    This will keep the service from crashing and log an error
    message that the connection timed out.

    Additional work would include adding a periodic retry task
    to the manager to try and start the backend service for us
    on some regular interval in case the device comes back.

    Fixes: bug 1215064

    Change-Id: Ice3f517d220c40113074bb77adbb10d5e32abd0b
    (cherry picked from commit e13b7d8a46cd54e61bdc20500ad597c2545ffaed)

Thierry Carrez (ttx)
Changed in cinder:
milestone: havana-3 → 2013.2