creating volume stuck with "creating" status

Bug #1263691 reported by ugvddm
This bug affects 4 people
Affects: Cinder
Status: Invalid
Importance: Medium
Assigned to: Ivan Kolodyazhny
Milestone: none

Bug Description

Creating a volume gets stuck in the "creating" status when cinder-scheduler or cinder-volume is not running.
It's essential that the volume API check the status of the scheduler and volume services before it creates a volume.

Reproduce:
1. stop the cinder-scheduler or cinder-volume
2. create a volume

Expected:
If the scheduler or volume service is not running when a volume is created, an error message should be reported
and the volume's status should be reset from "creating" to "error".
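
The check requested above could look roughly like the following. This is a minimal sketch under stated assumptions, not Cinder's actual code: `ServiceRecord`, `ServiceUnavailable`, `is_up`, and `ensure_services_running` are hypothetical names, and the 60-second heartbeat threshold is an assumed value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ServiceRecord:
    """Hypothetical heartbeat record for one service (not a Cinder model)."""
    binary: str           # e.g. "cinder-scheduler" or "cinder-volume"
    updated_at: datetime  # time of the last heartbeat


class ServiceUnavailable(Exception):
    """Raised when a required service has no recent heartbeat."""


def is_up(service, now, down_time=timedelta(seconds=60)):
    # A service counts as running if its last heartbeat is recent enough.
    return now - service.updated_at <= down_time


def ensure_services_running(services, now,
                            required=("cinder-scheduler", "cinder-volume")):
    # Fail fast, before the volume record is ever put into "creating".
    for binary in required:
        if not any(s.binary == binary and is_up(s, now) for s in services):
            raise ServiceUnavailable(f"{binary} is not running")
```

An API handler written this way would call `ensure_services_running` first and return the error to the user instead of leaving a volume in "creating".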

ugvddm (271025598-9)
Changed in cinder:
assignee: nobody → ugvddm (271025598-9)
Changed in cinder:
status: New → Triaged
importance: Undecided → Medium
milestone: none → icehouse-2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/64014

Changed in cinder:
status: Triaged → In Progress
Thierry Carrez (ttx)
Changed in cinder:
milestone: icehouse-2 → icehouse-3
Thierry Carrez (ttx)
Changed in cinder:
milestone: icehouse-3 → icehouse-rc1
Mike Perez (thingee) wrote :

I was not able to reproduce this problem.

If I stop the scheduler, create a volume, start the scheduler...it'll eventually pick up the message and hand it off to a cinder-volume host to be created and set to available.

If I stop the cinder-volume service and create a volume, the scheduler simply errors out because it can't find a suitable host, and sets the status to error.

Mike Perez (thingee)
Changed in cinder:
status: In Progress → Incomplete
ugvddm (271025598-9) wrote :

Hi Mike

Actually, as you said, if you stop the scheduler and start it again immediately, the volume is created and becomes available.
But if the user doesn't start the scheduler, the volume stays stuck in the "creating" status, and the user will not know what is happening.

So I think this is a bug, and we should roll back the volume status if the request times out.
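
The timeout-based rollback proposed here can be sketched as a periodic sweep. This is only an illustration: `expire_stuck_volumes` is a hypothetical helper, the volumes are plain dicts standing in for database rows, and the 10-minute timeout is an assumed value.

```python
from datetime import datetime, timedelta


def expire_stuck_volumes(volumes, now, timeout=timedelta(minutes=10)):
    """Mark volumes left in "creating" longer than `timeout` as "error".

    `volumes` is a list of dicts with "id", "status" and "created_at" keys.
    Returns the ids of the volumes whose status was changed.
    """
    expired = []
    for vol in volumes:
        if vol["status"] == "creating" and now - vol["created_at"] > timeout:
            vol["status"] = "error"
            expired.append(vol["id"])
    return expired
```

A sweep like this does not remove the original request from the message queue, so a scheduler that comes back later could still pick it up; any real implementation would have to handle that case as well.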

Mike Perez (thingee) wrote :

I would argue that's what would be expected. It'll sit in the message queue in a creating state until a cinder-volume host puts it in an available state. What do people think of a different starting state like 'pending create' and once the cinder-volume is assigned to create it, it'll put it in a creating state?
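
The proposed extra state can be pictured as a small transition table. This is a hypothetical sketch of the proposal only, not Cinder's real state machine; `ALLOWED` and `transition` are made-up names.

```python
# Hypothetical transition table for the 'pending create' proposal.
ALLOWED = {
    "pending create": {"creating", "error"},  # waiting in the message queue
    "creating": {"available", "error"},       # a cinder-volume host has the request
}


def transition(current, new):
    """Return `new` if the move from `current` is allowed, else raise."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current!r} -> {new!r}")
    return new
```

With such a table, a volume that is still "pending create" tells the user that no cinder-volume host has picked up the request yet.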

Huang Zhiteng (zhiteng-huang) wrote :

Zhengguang,

While RabbitMQ has a TTL extension for messages and queues (http://www.rabbitmq.com/ttl.html), Cinder (like most other OpenStack projects) doesn't make use of it, which is why Cinder has no notion of message expiration. The message will therefore stay in the queue for a very long time, and once the scheduler service comes back up, it will consume those messages and eventually make a placement decision for the 'create' request.

On the other hand, I agree we might take some time to think about how to handle message loss, but I think taskflow may have taken care of this already.
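
For reference, the RabbitMQ TTL extension mentioned above is driven by a queue argument and a message property. The helpers below are a minimal sketch of how those values are shaped; `queue_ttl_arguments` and `message_expiration` are hypothetical names, and nothing here reflects how Cinder configures its queues.

```python
def queue_ttl_arguments(ttl_seconds):
    """Arguments for declaring a queue whose messages expire after a TTL.

    RabbitMQ reads the "x-message-ttl" queue argument as a non-negative
    integer number of milliseconds.
    """
    if ttl_seconds < 0:
        raise ValueError("TTL must be non-negative")
    return {"x-message-ttl": int(ttl_seconds * 1000)}


def message_expiration(ttl_seconds):
    """Per-message TTL: the "expiration" property, a string of milliseconds."""
    if ttl_seconds < 0:
        raise ValueError("TTL must be non-negative")
    return str(int(ttl_seconds * 1000))
```

With a client such as pika, the first value would typically be passed as `arguments=` to `queue_declare` and the second as `expiration=` in `BasicProperties`. Expired messages are discarded or dead-lettered by the broker, so something would still have to move the affected volume from "creating" to "error".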

Jay Bryant (jsbryant) wrote :

Mike, the idea of having a pending-create state doesn't seem all bad. Though, given Winston's info in comment #6, it seems that the better terminology might be 'queued'. Using that term would be consistent with what Glance does when it is preparing to process an image.

What would you guys think of the queued state?

Huang Zhiteng (zhiteng-huang) wrote :

Jay, I think the situation for Glance is somewhat different from Cinder's. For Glance, the glance-api service acts as a relay point for the backing store, since uploaded images have to be transferred to glance-api and then stored on the back end. For large images, this process can take quite a while (a couple of seconds to even minutes), so the 'queued' state for images indicates that the Glance service has received the uploaded image and is working on putting it into the backing store, which is a normal procedure visible to the end user.
In contrast, what happens after a user submits a 'create' request and before the volume is available usually takes only a few moments, because it's just a few message deliveries between Cinder services and a quick decision-making process. End users usually barely notice the 'creating' state of a volume until things go wrong. But a request left in the message queue without being picked up is only one of the failure cases; other bad things can happen and cause a similar 'symptom' of being stuck in the 'creating' state, for example message loss or slow delivery due to network congestion. So I think a 'queued' state is not suitable for Cinder in general.

Changed in cinder:
milestone: icehouse-rc1 → none
ugvddm (271025598-9)
Changed in cinder:
assignee: ugvddm (271025598-9) → nobody
Ivan Kolodyazhny (e0ne)
Changed in cinder:
assignee: nobody → Ivan Kolodyazhny (e0ne)
Sean McGinnis (sean-mcginnis) wrote : Cleanup

Closing stale bug. If this is still an issue please reopen.

Changed in cinder:
status: Incomplete → Invalid