manager should loop on driver init failure

Bug #1517012 reported by Duncan Thomas
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
Wishlist
Unassigned

Bug Description

In the case where volume driver init fails in c-vol or c-bak (e.g. when the driver gets an error doing a test login to the array) the manager currently logs an error then continued. The init is never retried, leaving behind a process that can't do any useful work, but logs 'driver not initialized' for every request, obscuring the cause of the problem from the operator - we've had plenty of confused support requests on the channel where the operator has found the 'not initialized' message but did not take the (not entirely intuitive) step of reading back through the log to find the cause of the error.

I propose that if the driver fails in initialize, then the manager should not continue, but rather log the error, sleep for a while, then retry - there is no useful work it can be doing anyway, and this makes the logs much clearer. It also means that if the problem in initialize is transient, e.g. during power up when c-vol comes up before the array has finished booting, then the system will recover automatically.

Tags: bugsmash
Revision history for this message
Gorka Eguileor (gorka) wrote :

I agree with your proposal, this is something we should be doing.

Revision history for this message
John Griffith (john-griffith) wrote :

Probably the right answer, it is annoying that drivers aren't handling this properly themselves though:
https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/solidfire.py#L176

We also talked about this here:
https://github.com/openstack/cinder/commit/3329158d784d5966fa2fdccd1c097295ebe37a67#diff-edd3b1109653835d85834f80ae3c9b2c

A periodic here would probably be a good thing. None of this however will preclude the need for drivers to be better about how they behave IMO. There's more than just init that needs considered, because even if the device isn't reachable via API, it's volumes may or may not still be accessible. Just an additional item to consider when looking at this.

Revision history for this message
Eric Harney (eharney) wrote :

I think we need to be careful about which "initialize" we mean here. Looping if check_for_setup_error() fails, I agree with. To accomplish what you are proposing, I think this has to include do_setup() as well.

We shouldn't loop if driver __init__ fails, though.

Changed in cinder:
importance: Undecided → Wishlist
tags: added: bugsmash
Revision history for this message
Eric Harney (eharney) wrote :
Changed in cinder:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.