Cannot recover from Error_Deleting state

Bug #1039706 reported by Yosef Berman
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
Cinder    Fix Released   Undecided    clayg
Ubuntu    Fix Released   Undecided    Unassigned

Bug Description

When you create a volume and then delete it, dd is used to zero out the volume. If the dd process is killed during that wipe, the volume status becomes error_deleting, and the volume never recovers from this state.

2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp Traceback (most recent call last):
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/openstack/common/rpc/amqp.py", line 275, in _process_data
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/openstack/common/rpc/dispatcher.py", line 145, in dispatch
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/volume/manager.py", line 196, in delete_volume
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp {'status': 'error_deleting'})
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp self.gen.next()
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/volume/manager.py", line 185, in delete_volume
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp self.driver.delete_volume(volume_ref)
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/volume/driver.py", line 186, in delete_volume
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp self._delete_volume(volume, volume['size'])
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/volume/driver.py", line 138, in _delete_volume
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp self._copy_volume('/dev/zero', self.local_path(volume), size_in_g)
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/volume/driver.py", line 123, in _copy_volume
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp run_as_root=True)
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp File "/opt/stack/cinder/cinder/utils.py", line 228, in execute
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp cmd=' '.join(cmd))
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp ProcessExecutionError: Unexpected error while running command.
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp Command: sudo /usr/local/bin/cinder-rootwrap /etc/cinder/rootwrap.conf dd if=/dev/zero of=/dev/mapper/stack--volumes-volume--684727ea--6157--439a--b762--849da8db32dd count=10240 bs=1M
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp Exit code: 241
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp Stdout: ''
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp Stderr: ''
2012-08-21 19:26:24 TRACE cinder.openstack.common.rpc.amqp
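
The failure path in the traceback above can be reproduced in miniature. The following is only a simplified stand-in, not Cinder's code: the `delete_volume` function, the dict used as a volume table, and the fake wipe command are all hypothetical names chosen for illustration. It shows how a wipe command that exits non-zero leaves the record in error_deleting, with nothing in the normal flow to move it out again.

```python
import subprocess
import sys


def delete_volume(db, volume_id, wipe_cmd):
    """Hypothetical, simplified delete flow (not Cinder's implementation)."""
    db[volume_id] = "deleting"
    try:
        # Stand-in for the dd wipe; check=True raises on a non-zero exit code.
        subprocess.run(wipe_cmd, check=True)
    except subprocess.CalledProcessError:
        # Record the failure and re-raise, mirroring the traceback's behavior.
        db[volume_id] = "error_deleting"
        raise
    # Only reached when the wipe succeeded.
    del db[volume_id]


# Assumption: a child process exiting with 241 stands in for the killed dd.
db = {"vol-1": "available"}
failing_wipe = [sys.executable, "-c", "import sys; sys.exit(241)"]

try:
    delete_volume(db, "vol-1", failing_wipe)
except subprocess.CalledProcessError as exc:
    print(exc.returncode, db["vol-1"])  # prints: 241 error_deleting
```

After the exception the record still exists and still reads error_deleting, which is the stuck state this bug describes.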

Rongze Zhu (zrzhit)
Changed in cinder:
assignee: nobody → Rongze Zhu (zrzhit)
Mike Perez (thingee) wrote :

Talked with John about this. It would be great if there was a way to request a recover, but I'm not sure how I feel about auto recovering.

Rongze Zhu (zrzhit) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12004

Changed in cinder:
assignee: Rongze Zhu (zrzhit) → clayg (clay-gerrard)
status: New → In Progress
clayg (clay-gerrard) wrote :

Rongze, having looked at some of your updates here and at the blueprint, I'm not sure my change will fully address the issue as you described your desired behavior, but we seem to be thinking about the same problem (stuck in error_deleting). Feel free to comment on my change, or assign the bug back to yourself. Sorry about jumping into the middle of your work; I think it was just weird timing. I'm available to discuss on IRC, or at the openstack-meeting for cinder this week.

Best Regards,

-clayg

Rongze Zhu (zrzhit) wrote :

Hi clayg, It doesn't matter :) and you do good work.

Best Regards,

-Rongze

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/12004
Committed: http://github.com/openstack/cinder/commit/c191d0d10c6cb5730a4fb6540198ca3be6595b02
Submitter: Jenkins
Branch: master

commit c191d0d10c6cb5730a4fb6540198ca3be6595b02
Author: Clay Gerrard <email address hidden>
Date: Mon Aug 27 07:48:32 2012 +0000

    Add admin actions extension

    The optional os-admin-actions extension adds new wsgi_actions to the
    volumes/action resource and a new snapshots/action endpoint.

    With this extension both controllers will support an os-reset_status
    action to force a database update of a volume or snapshot that is stuck
    in a failed/incorrect status. The os-reset_status action works
    similarly to the compute api's os-reset_state action for instances.

    The os-force_delete action behaves similarly to the "cinder-manage
    volume delete" command and allows operators/admins to retry the delete
    operation after it has gone into an error_deleting status with an admin
    api call.

    The os-admin-actions extension is enabled by default, but limited to the
    admin api by the default policy.json rules. Individual admin actions
    can be disabled with policy rules as well.

    Example of os-reset_status action on a volume:

    curl http://localhost:8776/v1/${PROJECT_ID}/volumes/${VOLUME_ID}/action \
        -H "x-auth-token: ${ADMIN_AUTH_TOKEN}" \
        -H 'content-type: application/json' \
        -d '{"os-reset_status": {"status": "error"}}'

    The new admin only api can assist deployers who encounter bugs or
    operational issues that result in failed actions.

    It can also be used by future storage backends to support async callback
    style status updates from long running actions or operations which have
    encountered an error will be retried.

    Also updates the api.openstack.wsgi.ControllerMetaclass to support
    sub-classing wsgi.Controllers that define wsgi_actions.

    Partial fix for bug #1039706

    Change-Id: I29f4b892a99108b6c24eebc3eb58033a9e01e679
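
For reference, the same admin action requests can be built from Python. This is a minimal sketch using only the standard library. The helper name `build_admin_action` and the placeholder IDs and token are hypothetical, and the empty `{}` body for os-force_delete is an assumption, since the commit message only shows the body for os-reset_status. The requests are constructed but not sent.

```python
import json
import urllib.request


def build_admin_action(base_url, project_id, volume_id, token, action, body=None):
    """Build (but do not send) a POST to a volume's action resource."""
    url = "%s/v1/%s/volumes/%s/action" % (
        base_url.rstrip("/"), project_id, volume_id)
    # The action name is the single top-level key of the JSON body.
    payload = json.dumps({action: body or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; substitute a real endpoint, IDs and admin token.
reset = build_admin_action(
    "http://localhost:8776", "PROJECT_ID", "VOLUME_ID", "ADMIN_TOKEN",
    "os-reset_status", {"status": "error"})

# Assumption: os-force_delete takes an empty object as its body.
force = build_admin_action(
    "http://localhost:8776", "PROJECT_ID", "VOLUME_ID", "ADMIN_TOKEN",
    "os-force_delete")

# urllib.request.urlopen(reset) would submit the request to a live API.
```

Both actions are limited to admins by the default policy rules, so a non-admin token should be expected to be rejected.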

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → folsom-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: folsom-rc1 → 2012.2
Changed in ubuntu:
status: New → Fix Released