Image stays in Saving State if any error occurs while snapshotting an instance

Bug #1064386 reported by Arathi
This bug affects 6 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
In Progress
Medium
Prateek Arora

Bug Description

The image stays in the 'saving' state if any error occurs while snapshotting an instance. The user will be unaware that an exception occurred.

1. A probable fix would be to delete the created image if an exception occurs, since the image will not be usable.
2. Alternatively, set the image state to 'error' so that the user is aware that an error occurred (a rough sketch of this handling follows below).
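
A minimal sketch of that handling around the driver call (the snapshot_instance signature, image_service, and the 'error' status below are illustrative assumptions, not necessarily Nova's real internals):

# Hedged sketch: wrap the driver snapshot so a failure cleans up the
# half-created image instead of leaving it stuck in 'saving'.
def snapshot_instance(self, context, instance, image_id):
    try:
        self.driver.snapshot(context, instance, image_id)
    except Exception:
        try:
            # Option 1: the image can never become usable, so delete it.
            self.image_service.delete(context, image_id)
        except Exception:
            # Option 2 (fallback): flag the image so the user can see
            # that something went wrong.
            self.image_service.update(context, image_id,
                                      {'status': 'error'})
        raise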

Tags: compute
Revision history for this message
Arathi (arathi-darshanam) wrote :

This can be reproduced as follows (a scripted version follows the steps):
1. Boot a VM
2. Wait until it is active
3. Snapshot the VM
4. Before the snapshot becomes active, delete the VM
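
The same steps as a rough python-novaclient script (the client version, credentials, endpoint, and server name below are placeholders):

# Rough repro sketch with python-novaclient; adjust auth for your setup.
from novaclient import client

nova = client.Client('2', 'admin', 'secret', 'demo',
                     'http://controller:5000/v2.0')

server = nova.servers.find(name='test-vm')      # steps 1-2: an active VM
nova.servers.create_image(server, 'test-img')   # step 3: start a snapshot
nova.servers.delete(server)                     # step 4: delete mid-snapshot

When the delete wins the race, the compute log shows a trace like the one below.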

2012-10-09 16:41:50 ERROR nova.openstack.common.rpc.amqp [-] Exception during message handling
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 275, in _process_data
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 145, in dispatch
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 117, in wrapped
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 92, in wrapped
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp return f(*args, **kw)
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 210, in decorated_function
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp instance_uuid, e, sys.exc_info())
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 199, in decorated_function
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1140, in snapshot_instance
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp self.driver.snapshot(context, instance, image_id)
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 117, in wrapped
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp self.gen.next()
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 92, in wrapped
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp return f(*args, **kw)
2012-10-09 16:41:50 TRACE nova.openstack.common.rpc.amqp Fi...


Revision history for this message
Dan Prince (dan-prince) wrote :

Hi Arathi,

A couple of things here:

In the case you describe above... did the instance being snapshotted go into an ERROR state? If so, I think Nova is doing the correct thing here.

I do agree that we could do better about reporting this type of error back to the user. Ideally we'd keep track of more fine-grained error messages so that the user might better understand what the initial cause of the error was. There are a couple of Nova features dealing with this that might get implemented in the future. Within the scope of this ticket, however, I think Nova is mostly doing things correctly.

---

From a Glance perspective, there is actually an image scrubber feature that already exists and is meant to help clean up images in an intermediate state. If we need to do anything to clean up images stuck in the 'saving' state, I think the right place for that code is there. Have a look at the Glance API's scrub_time and delayed_delete options and the associated glance-scrubber binary.
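
For reference, an illustrative glance-api.conf excerpt enabling that cleanup path (the values are examples only):

[DEFAULT]
# Deleted images go to 'pending_delete' instead of being removed
# immediately; glance-scrubber reclaims them later.
delayed_delete = True
# Seconds an image stays in 'pending_delete' before the scrubber
# actually removes its data.
scrub_time = 43200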

Changed in nova:
status: New → Incomplete
Revision history for this message
Arathi (arathi-darshanam) wrote :

Hi Dan,

Currently, Nova does not move the snapshot into the 'error' state if an error occurs; the snapshot stays in the 'saving' state.

In your comment above, you mentioned that there are a couple of Nova features dealing with this that may be implemented in the future. Could you please brief me on them or point me to the documentation for those features, so that I can decide whether a fix should be proposed for this ticket?

Thanks

Thierry Carrez (ttx)
Changed in nova:
status: Incomplete → New
Revision history for this message
Vish Ishaya (vishvananda) wrote :

Yes, failed snapshots should be moved into the error state and the intermediate snapshot deleted, if this is not already occurring.

Changed in nova:
status: New → Triaged
importance: Undecided → Medium
milestone: none → grizzly-rc1
Revision history for this message
Russell Bryant (russellb) wrote :

Since this is Medium priority and unassigned, I'm moving it off of the grizzly-rc1 blocker list and into the "nice to have for grizzly" list (tagged with grizzly-rc-potential).

tags: added: grizzly-rc-potential
Changed in nova:
milestone: grizzly-rc1 → none
Revision history for this message
Tiantian Gao (gtt116) wrote :

I found that the status of an image in Glance should be one of 'active', 'saving', 'queued', 'killed', 'pending_delete', or 'deleted'. And if an exception occurs while transferring data during an image upload, the image ends up in the 'killed' status.

So IMHO, if snapshotting fails, putting the image into 'killed' would be OK.

BTW, in general Glance does not show killed images. If we decide to move broken images to 'killed', we should first make Glance show killed images (see the client-side sketch below).
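
As a client-side illustration, listing killed images with python-glanceclient might look like this (the endpoint, token, and the 'status' filter being honored are assumptions):

# Hedged sketch with python-glanceclient.
from glanceclient import Client

glance = Client('1', endpoint='http://controller:9292', token='AUTH_TOKEN')
for image in glance.images.list(filters={'status': 'killed'}):
    print(image.id, image.name, image.status)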

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/25236

Changed in nova:
assignee: nobody → TianTian Gao (gtt116)
status: Triaged → In Progress
Revision history for this message
Thierry Carrez (ttx) wrote :

This is very close to bug 991941 and bug 1143659... could you make sure your fix addresses those cases as well? Also check that the proposed fix at https://review.openstack.org/24820 is compatible with yours...

Revision history for this message
Tiantian Gao (gtt116) wrote :

Bug 991941 is caused by stopping nova-compute while a snapshot is in progress.

This bug (1064386) and bug 1143659 are similar: in both, errors occur during libvirt's snapshot(). But in the former the instance disappears while snapshotting, while the latter covers more general exceptions during snapshotting. IMHO, the former is a special case of the latter.

The two fixes (for 1064386 and 1143659) will conflict, so we should merge one and then manually rebase the other on top of it.

Thierry Carrez (ttx)
tags: added: grizzly-backport-potential
removed: grizzly-rc-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/26945

Changed in nova:
assignee: TianTian Gao (gtt116) → Yang Yu (yuyangbj)
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential
Joe Gordon (jogo)
Changed in nova:
assignee: Yang Yu (yuyangbj) → nobody
status: In Progress → New
Tracy Jones (tjones-i)
tags: added: compute
Revision history for this message
melanie witt (melwitt) wrote :

Moving back to Triaged since it was Triaged before.

Changed in nova:
status: New → Triaged
Revision history for this message
Jaroslav Henner (jhenner) wrote :

It seems this happens on RHOS as well. It breaks the Tempest test_list_image_filters test from time to time. I think I have reproduced the problem by having a VM foo and then taking two snapshots simultaneously:

nova image-create foo foo-img --show & nova image-create foo foo-img2 --show

Check the reproducer attachment for more info; a scripted version of the concurrent snapshots follows.
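
A scripted version of the concurrent snapshots (credentials and the VM name are placeholders; hitting the race depends on timing):

# Rough sketch of the simultaneous-snapshot reproducer with a thread.
import threading

from novaclient import client

nova = client.Client('2', 'admin', 'secret', 'demo',
                     'http://controller:5000/v2.0')
server = nova.servers.find(name='foo')

t = threading.Thread(target=nova.servers.create_image,
                     args=(server, 'foo-img'))
t.start()
nova.servers.create_image(server, 'foo-img2')  # races with the thread
t.join()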

Prateek Arora (parora)
Changed in nova:
assignee: nobody → Prateek Arora (parora)
Prateek Arora (parora)
Changed in nova:
assignee: Prateek Arora (parora) → nobody
assignee: nobody → Prateek Arora (parora)
Revision history for this message
Prateek Arora (parora) wrote :

The issue seems to be fixed:

nova image-create test test-img & nova delete test
[1] 16828
ERROR (Conflict): Cannot 'createImage' instance 772e51a5-be40-4cf1-ad74-06850b2154a6 while it is in task_state deleting (HTTP 409) (Request-ID: req-7d95863d-cb52-4f45-9e2d-889d94e570f4)
Request to delete server test has been accepted.
[1]+ Exit 1 nova image-create test test-img

This bug should be closed as such.

Revision history for this message
Prateek Arora (parora) wrote :

While on the same setup, I see this:

[stack@localhost devstack]$ nova image-create test test-img & nova delete test

Request to delete server test has been accepted.
ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'nova.exception.UnexpectedDeletingTaskStateError'> (HTTP 500) (Request-ID: req-6d9fb33b-26af-429f-91d6-7e222ff88cf8)

Can you please confirm whether you still see the issue?

Revision history for this message
Prateek Arora (parora) wrote :

While the bug is reproducible, the stack trace is different from what the original bug mentions:

nova image-create test test-img & nova delete test
[1] 2506
ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<class 'nova.exception.UnexpectedDeletingTaskStateError'> (HTTP 500) (Request-ID: req-f699c746-0b29-4d84-b258-c8233d5b7a42)
Request to delete server test has been accepted.
[1]+ Exit 1 nova image-create test test-img

In the Nova API logs I can see the following stack trace:

'c59f52ce-b51f-433e-81b0-099e3b65b0a3', '89ba4861-99e2-4376-8015-3f5ac02538d0'] from (pid=12399) reserve /opt/stack/nova/nova/quota.py:1345
2016-03-09 02:04:25.304 ERROR nova.api.openstack.extensions [req-f699c746-0b29-4d84-b258-c8233d5b7a42 admin admin] Unexpected exception in API method
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions Traceback (most recent call last):
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/extensions.py", line 478, in wrapped
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions return f(*args, **kwargs)
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/common.py", line 391, in inner
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions return f(*args, **kwargs)
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions File "/opt/stack/nova/nova/api/validation/__init__.py", line 73, in wrapper
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions return func(*args, **kwargs)
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions File "/opt/stack/nova/nova/api/validation/__init__.py", line 73, in wrapper
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions return func(*args, **kwargs)
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions File "/opt/stack/nova/nova/api/openstack/compute/servers.py", line 1113, in _action_create_image
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions extra_properties=metadata)
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions File "/opt/stack/nova/nova/compute/api.py", line 169, in wrapped
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions return func(self, context, target, *args, **kwargs)
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions File "/opt/stack/nova/nova/compute/api.py", line 186, in _wrapped
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions return fn(self, context, instance, *args, **kwargs)
2016-03-09 02:04:25.304 TRACE nova.api.openstack.extensions File "/opt/stack/nova/nova/compute/api.py", line 139, in inner
2016-03-09 02:04:25...


Prateek Arora (parora)
Changed in nova:
status: Triaged → In Progress
Revision history for this message
Prateek Arora (parora) wrote :

The following review tries to fix the above problem in the Nova API, while the case is already handled in the compute manager. A sketch of the API-side handling follows below.

https://review.openstack.org/#/c/294513/
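
A hedged sketch of what API-side handling of this race could look like (whether the review above does exactly this is not confirmed here; the helper name and snapshot parameters are illustrative):

# Translate the create-image/delete race into HTTP 409 instead of a 500.
import webob.exc

from nova import exception

def snapshot_or_conflict(compute_api, context, instance, image_name):
    try:
        return compute_api.snapshot(context, instance, image_name)
    except exception.UnexpectedDeletingTaskStateError:
        # The instance moved to task_state 'deleting' mid-request.
        raise webob.exc.HTTPConflict(
            explanation="Instance is being deleted; cannot snapshot it.")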

Changed in nova:
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Revision history for this message
Augustina Ragwitz (auggy) wrote :

Hi Prateek, I saw that no activity has taken place on your submitted change. It looks like you submitted it pretty close to the Summit, so the team may have been busy and it got overlooked. If you would like reviews on your fix, please bring it to the attention of the team in the #openstack-nova IRC channel. I will bring it to the attention of the bugs team during our meeting next week as well.

Revision history for this message
Prateek Arora (parora) wrote :

Thanks Augustina, I will do that in the channel, and maybe you can bring in some reviewers as well, and perhaps review it yourself too.
