Cinder create error reason not visible

Bug #1450861 reported by Joe D'Andrea
This bug affects 1 person
Affects         Status    Importance  Assigned to   Milestone
Cinder          Invalid   Wishlist    Unassigned
OpenStack Heat  Triaged   Medium      Joe D'Andrea

Bug Description

1. Create a stack with Cinder volumes on a cluster that does *not* have enough disk space for them.
2. Stack reaches CREATE_FAILED state (as expected).
3. Use 'heat stack-show' and look for stack_status_reason:

Resource CREATE failed: ResourceInError: Went to status error due to "Unknown"

4. Expected a reason other than "Unknown" (e.g., out of disk space). However, status_reason is never set in the args for ResourceInError.

5. Look at heat engine log and find:

Traceback (most recent call last):
  File "/opt/stack/heat/heat/engine/resource.py", line 466, in _action_recorder
    yield
  File "/opt/stack/heat/heat/engine/resource.py", line 536, in _do_action
    yield self.action_handler_task(action, args=handler_args)
  File "/opt/stack/heat/heat/engine/scheduler.py", line 312, in wrapper
    step = next(subtask)
  File "/opt/stack/heat/heat/engine/resource.py", line 510, in action_handler_task
    while not check(handler_data):
  File "/opt/stack/heat/heat/engine/resources/aws/volume.py", line 139, in check_create_complete
    resource_status=vol.status)
ResourceInError: Went to status error due to "Unknown"

Note: The above traceback is not from the most recent Kilo code (volume.py has since been moved, see below). However, the source for check_create_complete() doesn't appear to have changed since Juno.

https://github.com/openstack/heat/blob/master/heat/engine/resources/aws/ec2/volume.py

By comparison, Cinder backup objects set fail_reason in the event of an error. Cinder volume objects have no fail_reason, however, leaving folks to go on a proverbial wild goose chase to discover what went wrong.
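
To illustrate where "Unknown" comes from, here is a minimal, self-contained sketch. It is not Heat's actual code: ResourceInError below is a simplified stand-in for the Heat exception of the same name, FakeVolume is a hypothetical object, and the fail_reason attribute on a volume is an assumption (modeled on what backups provide), not something Cinder volumes expose.

```python
class ResourceInError(Exception):
    """Simplified stand-in for Heat's ResourceInError (not the real class)."""

    msg_fmt = 'Went to status %(resource_status)s due to "%(status_reason)s"'

    def __init__(self, resource_status, status_reason=None):
        self.resource_status = resource_status
        # With no reason supplied by the caller, the message falls back to "Unknown".
        self.status_reason = status_reason or "Unknown"
        super().__init__(self.msg_fmt % {
            "resource_status": self.resource_status,
            "status_reason": self.status_reason,
        })


class FakeVolume:
    """Hypothetical volume; fail_reason is an assumed attribute for illustration."""

    def __init__(self, status, fail_reason=None):
        self.status = status
        self.fail_reason = fail_reason


def check_create_complete(vol):
    """Return True when the volume is ready, False while it is still creating."""
    if vol.status == "available":
        return True
    if vol.status == "creating":
        return False
    # Pass a reason along if the volume carries one; otherwise "Unknown" is reported.
    raise ResourceInError(
        resource_status=vol.status,
        status_reason=getattr(vol, "fail_reason", None),
    )
```

In this sketch, FakeVolume("error") reproduces the message seen in step 3, while FakeVolume("error", "out of disk space") would surface the reason instead.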

Joe D'Andrea (jdandrea)
description: updated
Joe D'Andrea (jdandrea)
Changed in heat:
assignee: nobody → Joe D'Andrea (joedandrea)
Joe D'Andrea (jdandrea)
Changed in heat:
assignee: Joe D'Andrea (joedandrea) → nobody
description: updated
Joe D'Andrea (jdandrea)
Changed in heat:
assignee: nobody → Joe D'Andrea (joedandrea)
Joe D'Andrea (jdandrea)
description: updated
Changed in heat:
status: New → Triaged
importance: Undecided → Medium
Joe D'Andrea (jdandrea) wrote :

Question asked of Cinder, with x-ref back here:

https://answers.launchpad.net/ubuntu/+source/cinder/+question/266260

Joe D'Andrea (jdandrea) wrote :

Update: Per DuncanT the error reason is not visible at the moment. However, there will be a cross-project session at the Liberty Summit to discuss the best way to go about fixing this. I will look to attend that session.

Joe D'Andrea (jdandrea)
description: updated
Joe D'Andrea (jdandrea) wrote :

More info: "Part of the problem is that different providers have different requirements as to what level of detail they want to pass back to tenants. Private cloud might be fine with lots of details, whereas a public cloud might not want to pass on more than 'something went wrong, try again later'."

Squaring this particular circle has proven elusive thus far.

Joe D'Andrea (jdandrea)
description: updated
John Griffith (john-griffith) wrote :

@Joe D'Andrea
Comment #3 is a pretty good summary of things here IMO.

BUT, I also want to point out... rather than focus so much on reporting failure issues, what about just improving things so shit doesn't fail? Or when it does it is smart enough to dynamically go somewhere else and try again?

If certain service providers have a heavy work load due to support calls for failed items, maybe they need to look at how they've deployed things, or what they used to build their cloud, OR even better making the OpenStack code (particularly Cinder) better.

Changed in cinder:
importance: Undecided → Wishlist
status: New → Confirmed
John Griffith (john-griffith) wrote :

I'll mark as confirmed for now and see if anybody ever does anything with it, but it's been one of those things that gets complained about a fair amount but nobody has any good ideas or submissions to try and fix.

Joe D'Andrea (jdandrea) wrote :

@john-griffith:

Thanks! I agree, it would be great to improve things so there aren't failures. Alas, stuff will still "fall down go boom" from time to time.

To your comment about heavy work load (if that's even the issue here, I don't know if it is), indeed, it could be the user/admin's fault, but the only way for them to remedy their gaffe is to know what went wrong. "Unknown" doesn't help.

I don't dispute that Cinder and OpenStack could be better. I imagine that will always be the case, but it doesn't obviate the need for correct and informative error reporting. No matter how good we make it, we would do well to help the user/admin know what went wrong and give them the info they need to fix it. User experience FTW!

Sean McGinnis (sean-mcginnis) wrote :

This is somewhat addressed now with the implementation of the user message API for getting the response messages from async operations.
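
As a rough sketch of how a client might use such messages once they have been fetched: the helper below picks out the user-facing text recorded for one volume. The function name is hypothetical, and the dict keys (resource_type, resource_uuid, created_at, user_message) and the "VOLUME" type value are assumptions about the message format, not something confirmed in this report.

```python
def messages_for_volume(messages, volume_id):
    """Return user-facing message strings for one volume, oldest first.

    `messages` is assumed to be a list of dicts already retrieved from the
    user message API; the key names used here are assumptions.
    """
    matching = [
        m for m in messages
        if m.get("resource_type") == "VOLUME"
        and m.get("resource_uuid") == volume_id
    ]
    # ISO 8601 timestamps sort correctly as plain strings.
    matching.sort(key=lambda m: m.get("created_at", ""))
    return [m.get("user_message", "") for m in matching]
```

A caller such as Heat could then use the most recent entry as the status reason instead of "Unknown", when one exists.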

Changed in cinder:
status: Confirmed → Invalid
Rico Lin (rico-lin)
Changed in heat:
milestone: none → no-priority-tag-bugs