cinder does not handle missing volume group gracefully (stuck in "creating")
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Cinder | Fix Released | Low | Flavio Percoco | |
| Havana | Fix Released | High | Flavio Percoco | |
Bug Description
Tested with Havana rc2 from the UCA on Precise.
If the LVM volume group (cinder-volumes in the default configuration) does not exist, Cinder will still try to create a volume, which is bound to fail. The volume then gets stuck in the "creating" state and cannot be deleted. The log contains:
```
2013-10-17 09:29:58.188 16676 ERROR cinder.
2013-10-17 09:29:58.189 16676 ERROR cinder.
2013-10-17 09:29:58.189 16676 ERROR cinder.
2013-10-17 09:29:58.189 16676 TRACE cinder.
2013-10-17 09:29:58.189 16676 TRACE cinder.
2013-10-17 09:29:58.189 16676 TRACE cinder.
2013-10-17 09:29:58.189 16676 TRACE cinder.
2013-10-17 09:29:58.189 16676 TRACE cinder.
2013-10-17 09:29:58.189 16676 TRACE cinder.
2013-10-17 09:29:58.189 16676 TRACE cinder.
2013-10-17 09:30:47.198 16676 WARNING cinder.
```
Resetting the state with "cinder reset-state" will move the volume to the "available" state, which it isn't. Deleting or force-deleting will also fail, and the volume gets stuck in the "deleting" state forever. The only solution I found was to delete the relevant rows from the volumes table directly and run cinder-manage db sync.
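The manual cleanup described above amounts to soft-deleting the stuck row. A minimal sketch, using SQLite in place of Cinder's real database and a loose approximation of the volumes schema (the table name, the status and deleted columns, and the volume id are all illustrative assumptions):

```python
import sqlite3

def purge_stuck_volume(conn: sqlite3.Connection, volume_id: str) -> int:
    """Mark a stuck volume row as deleted directly in the database."""
    cur = conn.execute(
        "UPDATE volumes SET status = 'deleted', deleted = 1 WHERE id = ?",
        (volume_id,),
    )
    conn.commit()
    return cur.rowcount  # number of rows touched

# Demo against an in-memory database with a placeholder volume:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (id TEXT, status TEXT, deleted INTEGER)")
conn.execute("INSERT INTO volumes VALUES ('vol-0000', 'creating', 0)")
purge_stuck_volume(conn, "vol-0000")
```

This sidesteps the API entirely, which is why it works when the state machine is wedged, and also why it is a last resort.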
In Grizzly, cinder-volume would refuse to start if it could not find the volume group, and I thought that behavior was better. Failing when attempting to create the volume, instead of getting stuck in the "creating" state, would also be acceptable.
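The Grizzly-era behavior amounts to a startup check for the volume group. A minimal sketch, assuming the `vgs` tool is on PATH (the function names are illustrative, not Cinder's actual code):

```python
import subprocess

def volume_group_exists(vg_name: str) -> bool:
    """Check for an LVM volume group via `vgs`; a missing tool counts as a missing VG."""
    try:
        result = subprocess.run(
            ["vgs", "--noheadings", "-o", "vg_name", vg_name],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0

def check_for_setup_error(vg_name: str = "cinder-volumes") -> None:
    """Refuse to bring the service up when the volume group is absent."""
    if not volume_group_exists(vg_name):
        raise RuntimeError(f"volume group {vg_name} doesn't exist")
```

Failing loudly at startup surfaces the misconfiguration immediately, instead of leaving each create request to discover it and wedge a volume record.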
summary:
- cinder does not handle missing volume group gracefully
+ cinder does not handle missing volume group gracefully (stuck in "creating")
Changed in cinder:
milestone: none → icehouse-2
Changed in cinder:
status: Fix Committed → Fix Released
Changed in cinder:
milestone: icehouse-2 → 2014.1
We made some changes that now allow the volume service to start, mostly to accommodate the case of multiple back-ends: if one backend isn't ready or available, there is no reason the entire service should not run.
Delete should now work on a volume in the error state (i.e., you should not have needed the reset-state command). By changing the state to "available", you created a situation where the manager tries to connect to the backend driver, which isn't running, hence the hang.
To summarize, a regular delete should work in this scenario. That said, we should have a look and try to make things a bit more forgiving when the state has been changed, as in your case.
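The more forgiving delete path might be sketched roughly like this; the state names and the `driver_available` flag are illustrative assumptions, not Cinder's actual manager logic:

```python
# States in which no backing LV was ever created, so the record can be
# dropped without talking to the backend driver.
LOCAL_DELETE_STATES = {"creating", "error", "error_deleting"}

def delete_volume(status: str, driver_available: bool) -> str:
    """Decide how to delete a volume given its state and driver availability."""
    if status in LOCAL_DELETE_STATES:
        return "deleted"  # DB-only delete; nothing exists on the backend
    if not driver_available:
        # A reset to "available" lands here: the manager would call the
        # missing driver and hang, so fail fast instead.
        raise RuntimeError("backend driver unavailable; cannot delete volume")
    return "deleted"  # normal path through the driver
```

Failing fast when the driver is down, rather than blocking, would keep a mistaken reset-state from producing the permanent "deleting" hang described in the report.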