Volume deletion is never completed if volumes service is restarted

Bug #1011150 reported by David Naori
This bug affects 4 people
Affects                   Status        Importance  Assigned to    Milestone
Cinder                    Fix Released  High        John Griffith
OpenStack Compute (nova)  Fix Released  High        John Griffith

Bug Description

A volume stays stuck in deleting status forever if the nova-volume service is restarted while the dd started by the volume delete is still running.

[root@camel-nova nova]# nova volume-create --display_name stuck_in_deleting 10
[root@camel-nova nova]# nova volume-list
+----+-----------+-------------------+------+-------------+--------------------------------------+
| ID | Status | Display Name | Size | Volume Type | Attached to |
+----+-----------+-------------------+------+-------------+--------------------------------------+
| 13 | available | stuck_in_deleting | 10 | None | |
+----+-----------+-------------------+------+-------------+--------------------------------------+
[root@camel-nova nova]# nova volume-delete 13

2012-06-10 15:36:49 DEBUG nova.utils [req-e0a54dc4-7217-4994-aef6-d7b95cff6c76 2f4edfa99cab42de92eacda360043116 7eed65dd55474b9e94cd412d9f66b406] Running cmd (subprocess): sudo nova-rootwrap dd if=/dev/zero of=/dev/mapper/nova--volumes-volume--0000000d count=10240 bs=1M from (pid=21997) execute /usr/lib/python2.6/site-packages/nova/utils.py:220

[root@camel-nova nova]# /etc/init.d/openstack-nova-volume restart

- Verify that dd is no longer running:
[root@camel-nova nova]# ps -ww `pgrep dd`
  PID TTY STAT TIME COMMAND
    2 ? S 0:00 [kthreadd]
 4167 ? S 0:00 hald-addon-input: Listening on /dev/input/event2 /dev/input/event0
 4178 ? S 0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
11135 ? Ssl 9:48 /usr/sbin/qpidd --data-dir /var/lib/qpidd --daemon

[root@camel-nova nova]# nova volume-list
+----+-----------+-------------------+------+-------------+--------------------------------------+
| ID | Status | Display Name | Size | Volume Type | Attached to |
+----+-----------+-------------------+------+-------------+--------------------------------------+
| 13 | deleting | stuck_in_deleting | 10 | None | |
+----+-----------+-------------------+------+-------------+--------------------------------------+

[root@camel-nova nova]# nova volume-delete 13
ERROR: Invalid volume: Volume status must be available or error (HTTP 400)

Actual result:
The volume is stuck forever in deleting status and cannot be deleted.
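
The rejection of the second delete request can be illustrated with a minimal sketch. It assumes the API only accepts deletes for volumes whose status is available or error, as the error message above suggests. The names request_delete, InvalidVolume and DELETABLE_STATUSES are hypothetical and are not taken from nova.

```python
# Hypothetical, simplified model of an API-side status check; not nova's code.
DELETABLE_STATUSES = ("available", "error")


class InvalidVolume(Exception):
    """Raised when a volume is not in a state that permits deletion."""


def request_delete(volume):
    """Flag a volume as deleting, refusing volumes in any other state."""
    if volume["status"] not in DELETABLE_STATUSES:
        raise InvalidVolume("Volume status must be available or error")
    # The slow work (zeroing, lvremove) would be handed to the volume service.
    volume["status"] = "deleting"
    return volume


volume = {"id": 13, "status": "available"}
request_delete(volume)  # first request: status becomes "deleting"

try:
    request_delete(volume)  # second request: rejected by the status check
    second_attempt = "accepted"
except InvalidVolume as exc:
    second_attempt = str(exc)
```

If the service that was doing the work is restarted at this point, nothing moves the volume out of deleting, so under this model every later delete request fails the same check.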

Tags: volume
Revision history for this message
LinuxMalaysia (linuxmalaysia) wrote :

Hi,

Deleting a nova volume only leaves the volume in deleting status. It stays that way seemingly forever.

Ubuntu version : Ubuntu Server 12.04 LTS

admin@controller:~$ uname -a
Linux controller 3.2.0-24-generic #39-Ubuntu SMP Mon May 21 16:52:17 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

mysql> select id,created_at, size, instance_id, status, attach_status, display_name from volumes;
+----+---------------------+------+-------------+-----------+---------------+------------------------+
| id | created_at | size | instance_id | status | attach_status | display_name |
+----+---------------------+------+-------------+-----------+---------------+------------------------+
| 1 | 2012-06-10 03:45:02 | 100 | NULL | deleting | detached | logs-data |
| 2 | 2012-06-10 06:43:31 | 5 | NULL | deleting | detached | Ujian1 |
| 3 | 2012-06-11 03:36:28 | 100 | NULL | deleting | detached | logs-server |
| 4 | 2012-06-11 03:48:01 | 100 | NULL | available | detached | logs-management-server |
+----+---------------------+------+-------------+-----------+---------------+------------------------+
4 rows in set (0.00 sec)

mysql> quit
Bye

root@controller:~# euca-describe-volumes
VOLUME vol-00000003 100 nova deleting 2012-06-11T03:36:28.000Z
VOLUME vol-00000004 100 nova available 2012-06-11T03:48:01.000Z

root@controller:~# euca-delete-volume vol-00000003
EC2APIError: Delete Failed

root@controller:~# euca-describe-volumes
VOLUME vol-00000003 100 nova deleting 2012-06-11T03:36:28.000Z
VOLUME vol-00000004 100 nova available 2012-06-11T03:48:01.000Z

root@controller:~# euca-delete-volume vol-00000004
VOLUME vol-00000004

root@controller:~# euca-describe-volumes
VOLUME vol-00000003 100 nova deleting 2012-06-11T03:36:28.000Z
VOLUME vol-00000004 100 nova deleting 2012-06-11T03:48:01.000Z

no longer affects: ubuntu
Revision history for this message
LinuxMalaysia (linuxmalaysia) wrote :

This bug may be related to this bug (packaging issue):

https://bugs.launchpad.net/nova/+bug/795428

Referring to

http://docs.openstack.org/diablo/openstack-compute/admin/content/managing-volumes.html

For Ubuntu distros, the nova-volumes component will not work properly (regarding the part which deals with volume deletion) without a small fix. In order to fix that, do the following:

sudo visudo

Then add an entry for the nova user (here is the default sudoers file with our added nova user) :

nova ALL = (root) NOPASSWD: /bin/dd

That will allow the nova user to run the "dd" command (which empties a volume before its deletion).
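
As a rough illustration of the dd invocation that this sudoers entry permits, here is a minimal sketch that rebuilds the argument list seen in the log line of the bug description (count=10240 bs=1M for a 10 GB volume). The function name build_zero_command is hypothetical, and the block count of size * 1024 is inferred from that log line, not from the nova source.

```python
def build_zero_command(device_path, size_in_gb):
    """Return a dd argument list that overwrites the whole device with zeros.

    Assumes 1 MiB blocks, so a volume of N GB needs N * 1024 blocks.
    """
    block_count = size_in_gb * 1024
    return [
        "dd",
        "if=/dev/zero",
        "of=%s" % device_path,
        "count=%d" % block_count,
        "bs=1M",
    ]


command = build_zero_command("/dev/mapper/nova--volumes-volume--0000000d", 10)
```

The list is only built here, not executed. Running it would need root privileges and would destroy the data on the target device.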

Revision history for this message
Mark McLoughlin (markmc) wrote :

In delete_volume() we do:

        self._copy_volume('/dev/zero', self.local_path(volume), size_in_g)
        self._try_execute('lvremove', '-f', "%s/%s" %
                          (FLAGS.volume_group,
                           self._escape_snapshot(volume['name'])),
                          run_as_root=True)

However, if the service shuts down before this completes, we have no logic to restart the operation when the service comes back up.

We probably need to hook a finish_deleting_volumes() thing into VolumeManager's init_host()

Applies to Cinder too
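
The idea of resuming interrupted deletes from init_host() could look roughly like the sketch below. It is a simplified model under stated assumptions and not the fix that was eventually merged: FakeVolumeDB and its methods are hypothetical in-memory stand-ins for the database layer, and delete_volume() here only removes the record, where the real driver would zero the device and run lvremove.

```python
class FakeVolumeDB:
    """Hypothetical in-memory stand-in for the volumes table."""

    def __init__(self, volumes):
        self.volumes = {v["id"]: v for v in volumes}

    def get_all_by_host(self, host):
        # Returns a new list, so callers may delete records while iterating.
        return [v for v in self.volumes.values() if v["host"] == host]

    def destroy(self, volume_id):
        del self.volumes[volume_id]


class VolumeManager:
    """Simplified manager that resumes pending deletes on startup."""

    def __init__(self, db, host):
        self.db = db
        self.host = host
        self.resumed = []

    def init_host(self):
        # Called once at service start: any volume of this host still in
        # "deleting" was interrupted by a shutdown, so delete it again.
        for volume in self.db.get_all_by_host(self.host):
            if volume["status"] == "deleting":
                self.delete_volume(volume["id"])
                self.resumed.append(volume["id"])

    def delete_volume(self, volume_id):
        # A real driver would zero the device and run lvremove here.
        self.db.destroy(volume_id)


db = FakeVolumeDB([
    {"id": 1, "host": "camel-nova", "status": "deleting"},
    {"id": 2, "host": "camel-nova", "status": "available"},
    {"id": 3, "host": "other-host", "status": "deleting"},
])
manager = VolumeManager(db, "camel-nova")
manager.init_host()
```

Restricting the scan to the manager's own host is a design choice in this sketch: a volume in deleting status on another host may still be worked on by that host's running service.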

Changed in nova:
status: New → Confirmed
importance: Undecided → High
tags: added: volume
summary: - [nova-volume] Volume stuck in Deleting status forever when deleting a
- volume and restarting nova-volume service
+ Volume deletion is never completed if volumes service is restarted
Changed in cinder:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
John Griffith (john-griffith) wrote :

This will not be a problem once https://review.openstack.org/#/c/12192/ lands. However, we'll have the same sort of potential problem if the service is restarted during the dd on create (we just move the problem).

I'll look into getting this fixed for RC1 in Cinder and Nova.

Revision history for this message
Mark McLoughlin (markmc) wrote :

You sure https://review.openstack.org/12192 is the right one? Doesn't seem related to me

Revision history for this message
John Griffith (john-griffith) wrote :

Sorry... you're correct Mark, that's the wrong review. I was referring to: https://review.openstack.org/#/c/12659/

Revision history for this message
John Griffith (john-griffith) wrote :

We've reverted the zero on create change, so my statements regarding where this needs to occur can be ignored.

Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
Changed in nova:
assignee: nobody → John Griffith (john-griffith)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12955

Changed in cinder:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12965

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/12955
Committed: http://github.com/openstack/cinder/commit/2aa501560384205b0fd3891ed5dd2a3864afcdc2
Submitter: Jenkins
Branch: master

commit 2aa501560384205b0fd3891ed5dd2a3864afcdc2
Author: John Griffith <email address hidden>
Date: Thu Sep 13 10:53:52 2012 -0600

    Add a resume delete on volume manager startup

      Currently if for some reason the volume service was stopped
      during the zero out operation of a volume delete there was
      no way to get the volume removed from the system (it would
      be present in deleting status forever).

      This change adds a simple check of volumes in the DB with status
      of deleting, and if any are found it restarts the delete process
      on them.

      addresses bug #1011150

    Change-Id: Id4c4a3bc61f95245ebc6658234b4b88029956562

Changed in cinder:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12965
Committed: http://github.com/openstack/nova/commit/b7f140351acf63c0054fc772edff2328fac3fcfb
Submitter: Jenkins
Branch: master

commit b7f140351acf63c0054fc772edff2328fac3fcfb
Author: John Griffith <email address hidden>
Date: Thu Sep 13 11:44:51 2012 -0600

    Add a resume delete on volume manager startup

      Currently if for some reason the volume service was stopped
      during the zero out operation of a volume delete there was
      no way to get the volume removed from the system (it would
      be present in deleting status forever).

      This change adds a simple check of volumes in the DB with status
      of deleting, and if any are found it restarts the delete process
      on them.

      addresses bug #1011150

    Change-Id: I6aa26e9eaa94da4b620f01160931cbfcad9dadf7

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
milestone: none → folsom-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: none → folsom-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: folsom-rc1 → 2012.2
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2