Snapshots can fail silently

Bug #1003199 reported by Sam Morrison
This bug affects 4 people
Affects                          Status    Importance  Assigned to     Milestone
Glance                           Triaged   Wishlist    Unassigned
OpenStack Dashboard (Horizon)    Invalid   Wishlist    Gabriel Hurley

Bug Description

If a snapshot fails while it is being created, it does so silently and disappears from the snapshot list.

Example:

When creating a snapshot, the status goes to queued and then to saving. If saving fails (e.g. because of a Swift backend issue in Glance), the snapshot disappears from the dashboard completely, with no error message.

I'm not sure exactly how to fix it, as it involves nova and glance too.
On the nova-compute node running the instance that is being snapshotted, an error is caught, e.g.:

2012-05-22 05:14:31 ERROR nova.rpc.amqp [req-aa82909d-8fa2-4fae-9443-3db1411f9897 a4a0066852fe4073b65818c883b8625a 2e91ae6fe1334a8480cbb391f376db15] Exception during message handling
2012-05-22 05:14:31 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 252, in _process_data
2012-05-22 05:14:31 TRACE nova.rpc.amqp rval = node_func(context=ctxt, **node_args)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
2012-05-22 05:14:31 TRACE nova.rpc.amqp return f(*args, **kw)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in decorated_function
2012-05-22 05:14:31 TRACE nova.rpc.amqp sys.exc_info())
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-05-22 05:14:31 TRACE nova.rpc.amqp self.gen.next()
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
2012-05-22 05:14:31 TRACE nova.rpc.amqp return function(self, context, instance_uuid, *args, **kwargs)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 946, in snapshot_instance
2012-05-22 05:14:31 TRACE nova.rpc.amqp self.driver.snapshot(context, instance_ref, image_id)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
2012-05-22 05:14:31 TRACE nova.rpc.amqp return f(*args, **kw)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/connection.py", line 711, in snapshot
2012-05-22 05:14:31 TRACE nova.rpc.amqp image_file)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/image/glance.py", line 306, in update
2012-05-22 05:14:31 TRACE nova.rpc.amqp _reraise_translated_image_exception(image_id)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/image/glance.py", line 304, in update
2012-05-22 05:14:31 TRACE nova.rpc.amqp image_meta = client.update_image(image_id, image_meta, data)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/glance/client.py", line 195, in update_image
2012-05-22 05:14:31 TRACE nova.rpc.amqp res = self.do_request("PUT", "/images/%s" % image_id, body, headers)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/glance/common/client.py", line 58, in wrapped
2012-05-22 05:14:31 TRACE nova.rpc.amqp return func(self, *args, **kwargs)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/glance/common/client.py", line 420, in do_request
2012-05-22 05:14:31 TRACE nova.rpc.amqp headers=headers)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/glance/common/client.py", line 75, in wrapped
2012-05-22 05:14:31 TRACE nova.rpc.amqp return func(self, method, url, body, headers)
2012-05-22 05:14:31 TRACE nova.rpc.amqp File "/usr/lib/python2.7/dist-packages/glance/common/client.py", line 542, in _do_request
2012-05-22 05:14:31 TRACE nova.rpc.amqp raise exception.Invalid(res.read())
2012-05-22 05:14:31 TRACE nova.rpc.amqp Invalid: Data supplied was not valid.
2012-05-22 05:14:31 TRACE nova.rpc.amqp Details: 400 Bad Request
2012-05-22 05:14:31 TRACE nova.rpc.amqp
2012-05-22 05:14:31 TRACE nova.rpc.amqp The server could not comply with the request since it is either malformed or otherwise incorrect.
2012-05-22 05:14:31 TRACE nova.rpc.amqp
2012-05-22 05:14:31 TRACE nova.rpc.amqp Error uploading image: (BackendException): Failed to add object to Swift. Got error from Swift: put_object('images', '0c6c990f-8f58-4554-b2e6-c045b4c7fb46', ...) failure and no ability to reset contents for reupload.
2012-05-22 05:14:31 TRACE nova.rpc.amqp

In glance the entry for the snapshot exists, but its status is 'killed'. Maybe I need to lodge a bug against glance too, to mark it as "failed" or something similar and then give a client (the dashboard) the ability to see this.

Gabriel Hurley (gabriel-hurley) wrote :

I just made a similar comment on another ticket involving volumes, but this is a case of an asynchronous call that we have very little ability to track in the dashboard. We're working on longer-term solutions to consume notifications from Nova, et al. and push them back to the client in real time, but that's still a ways off.

If Glance did continue to return the image with a status of "failed" (and allow a delete action on it so that it didn't hang around forever) that would at least let dashboard users know what happened.

Changed in horizon:
assignee: nobody → Gabriel Hurley (gabriel-hurley)
importance: Undecided → Wishlist
status: New → Confirmed
Jay Pipes (jaypipes) wrote :

Glance *does* return the image. The status is killed, not failed.
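
As a rough illustration of that distinction, a client could map Glance's status values onto user-facing labels so that 'killed' is shown as a failure. The sketch below is hypothetical: `display_label`, `is_failed`, and the label strings are not part of Horizon or Glance. Only the `queued`, `saving`, and `killed` statuses come from this report; `active` is assumed here to be the success state.

```python
# Hypothetical mapping from a Glance image status to a dashboard label.
# "queued", "saving" and "killed" are the statuses mentioned in this report;
# "active" is assumed to be the status of a successfully saved image.
STATUS_LABELS = {
    "queued": "Queued",
    "saving": "Saving",
    "active": "Active",
    "killed": "Failed",
}


def display_label(status):
    """Return a user-facing label, falling back to 'Unknown' for other values."""
    return STATUS_LABELS.get(status.lower(), "Unknown")


def is_failed(status):
    """A snapshot whose image is 'killed' should be reported as failed."""
    return status.lower() == "killed"
```

With a mapping like this, a killed snapshot would at least be rendered as "Failed" rather than vanishing, provided the client can still retrieve it.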

Brian Waldon (bcwaldon) wrote :

This situation will be made much better once we develop the v3 Compute API and drop the /images endpoint. The problem now is that Nova doesn't present 'killed' images, as they are useless to its clients. I can't think of a good way to fix this bug other than waiting...

Changed in glance:
status: New → Incomplete
Eoghan Glynn (eglynn) wrote :

Could the dashboard just call directly into glance to query the snapshot image by name (as opposed to going through the nova /images endpoint) and use the filters to ensure a killed image is not excluded from the result set?

That way at least the dashboard can give some visual indication of the failed snapshotting operation, as opposed to it just disappearing from the list.

Brian Waldon (bcwaldon) wrote :

Gabriel, any comment on Eoghan's suggestion?

Gabriel Hurley (gabriel-hurley) wrote :

I'm confused. The "image_list_detailed" and "snapshot_list_detailed" functions in horizon are wrappers around calls to glanceclient... that should be using the Glance API already. Or am I confusing this issue with something else?

Brian Waldon (bcwaldon) wrote :

Ok, I did some testing here and came up with this: 'killed' images are absolutely not returned in either of the image list commands. A 'killed' image will still be accessible directly (i.e. HEAD /images/<ID>) until it is deleted. So that definitely matches up with what you guys are reporting. Back to square one...
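
The behaviour described above suggests a possible client-side workaround: remember the IDs of snapshots that were in progress, and when one vanishes from the list, look it up directly. The following is only a sketch under assumptions. `find_failed_snapshots`, `lookup_status`, and `head_image_status` are hypothetical names, and `head_image_status` assumes a Glance v1-style endpoint at `/v1/images/<ID>` that reports the status in an `x-image-meta-status` response header; both should be checked against the deployed API.

```python
import http.client


def head_image_status(host, port, image_id, token=None):
    """Issue HEAD /v1/images/<ID> and return the reported status, or None.

    Assumption: the status is returned in an 'x-image-meta-status' header.
    """
    headers = {"X-Auth-Token": token} if token else {}
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", "/v1/images/%s" % image_id, headers=headers)
        response = conn.getresponse()
        response.read()  # a HEAD response has no body; this just finishes it
        if response.status != 200:
            return None
        return response.getheader("x-image-meta-status")
    finally:
        conn.close()


def find_failed_snapshots(tracked_ids, listed_ids, lookup_status):
    """Return tracked snapshot IDs that vanished from the list and are 'killed'.

    lookup_status is any callable mapping an image ID to a status string
    (or None if the image no longer exists), for example a wrapper around
    head_image_status.
    """
    listed = set(listed_ids)
    missing = [image_id for image_id in tracked_ids if image_id not in listed]
    return [image_id for image_id in missing if lookup_status(image_id) == "killed"]
```

Passing the lookup as a callable keeps the comparison logic separate from the HTTP call, so it can be exercised with a plain dictionary's `get` method in place of a live Glance server.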

Brian Waldon (bcwaldon)
Changed in glance:
status: Incomplete → Triaged
importance: Undecided → Wishlist
Gabriel Hurley (gabriel-hurley) wrote :

It was a good idea! Good enough that I had to go check the code before I could say it wasn't gonna work... :-(

Perhaps adding an optional GET parameter to include killed images?
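
To make the proposal concrete, here is a minimal sketch of how such an opt-in flag might behave when building the list response. It does not reflect Glance's actual code; `list_images` and the `include_killed` parameter are hypothetical names chosen for illustration.

```python
def list_images(images, include_killed=False):
    """Return images for a list call, hiding 'killed' ones unless requested.

    images is a list of dicts with at least a 'status' key. The default
    keeps the current behaviour, in which killed images are omitted.
    """
    if include_killed:
        return list(images)
    return [img for img in images if img.get("status") != "killed"]
```

Defaulting the flag to off would leave existing clients unaffected, while a dashboard could opt in to see failed snapshots.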

Anne Gentle (annegentle) wrote :

Bumping this again as it was reported by Chris Hodge from Univ of Oregon at OScon.

My suggestion would be to add a Compute extension that lets failed images be retrieved through the GET command.

Brian Waldon (bcwaldon) wrote :

We could make a v1.2 API release if we feel we really need to. However, I would like to stay away from expanding the v1 API if we could.

Gary W. Smith (gary-w-smith) wrote :

This bug was last updated over 5 years ago, and as there have
been many changes to both nova and horizon since then, this is
getting marked as Invalid. If the issue still exists, please
feel free to reopen it.

Changed in horizon:
status: Confirmed → Invalid
Tony Karera (tonykarera) wrote :

Hello Team,
It looks like this issue is still present.
