force-delete of cinder volume errors with Can't remove open logical volume

Bug #1191960 reported by Hrushikesh
This bug affects 11 people
Affects                    Status         Importance   Assigned to        Milestone
Cinder                     Fix Released   High         John Griffith
OpenStack Compute (nova)   Fix Released   Medium       Vishakha Agarwal

Bug Description

As a consequence of Bug #1191431, a few volumes were left in the error_deleting state. Some of them were cleared by issuing cinder delete <uuid>; however, a few of them errored out.

1. When you try to delete such a volume from Horizon > volume > check box > Delete Volumes, you get:
Error: You do not have permission to delete volume: <Volume: 078cd44b-7b39-4867-a1e9-78bb758ae0a7>

2. When you use the 'Force Delete Volume' option against the suspect volume, the request is submitted successfully, but the following error appears in /var/log/cinder/cinder-volume on the controller node:
ProcessExecutionError: Unexpected error while running command.
Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvremove -f cinder-volumes/volume-078cd44b-7b39-4867-a1e9-78bb758ae0a7
Exit code: 5
Stdout: ''
Stderr: ' Can\'t remove open logical volume "volume-078cd44b-7b39-4867-a1e9-78bb758ae0a7"\n'

3. When you try to delete it manually from the command line, you get the following error:
lvremove -f /dev/cinder-volumes/volume-078cd44b-7b39-4867-a1e9-78bb758ae0a7
Can't remove open logical volume "volume-078cd44b-7b39-4867-a1e9-78bb758ae0a7"

Workaround
1. The tgtd service holds the volume open, which leaves it in the in-use state and prevents cinder delete and force-delete from working. Stop the service that is using it, then remove the logical volume:
service tgt stop
lvremove /dev/cinder-volumes/volume-078cd44b-7b39-4867-a1e9-78bb758ae0a7

2. Now start the service again and remove the volume through cinder-api or the CLI:
service tgt start
cinder force-delete 078cd44b-7b39-4867-a1e9-78bb758ae0a7

Note: lsof /dev/cinder-volumes/volume-078cd44b-7b39-4867-a1e9-78bb758ae0a7 reported tgtd using it.
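
The lsof check in the note can be complemented by LVM's own view of the volume. Below is a minimal Python sketch, assuming the standard lv_attr report field of lvs, whose sixth character is 'o' when the device is open. The helper names lv_is_open and query_lv_attr are hypothetical and are not part of Cinder.

```python
import subprocess


def lv_is_open(lv_attr: str) -> bool:
    """Return True if an lv_attr string marks the device as open.

    In the lvs report, the sixth attribute character is 'o' for an
    open device (for example '-wi-ao----').
    """
    attr = lv_attr.strip()
    return len(attr) >= 6 and attr[5] == "o"


def query_lv_attr(vg: str, lv: str) -> str:
    """Ask lvs for the attribute string of one LV (needs root and LVM)."""
    result = subprocess.run(
        ["lvs", "--noheadings", "-o", "lv_attr", f"{vg}/{lv}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

On an affected host, lv_is_open(query_lv_attr("cinder-volumes", "volume-<uuid>")) would be expected to return True while tgtd still holds the device.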

Expected behavior: the force-delete option should handle such anomalies.

Avishay Traeger (avishay-il) wrote :

Can you please provide the full log and version? Force delete should take care of tgtd - maybe there is a clue as to why it didn't. Thanks!

Hrushikesh (hrushikesh-gangur) wrote :

Please see the uploaded cinder-volume.log and let me know if you need any other logs.

root@controlnode:/var/log# dpkg -s cinder-volume
Package: cinder-volume
Status: install ok installed
Priority: extra
Section: net
Installed-Size: 101
Maintainer: Chuck Short <email address hidden>
Architecture: all
Source: cinder
Version: 1:2013.1-0ubuntu2~cloud0

John Griffith (john-griffith) wrote :

Negative, we can NOT restart the tgtd service in the code on force delete as you suggest. That introduces an entire new set of issues for other volumes that may currently be in use.

Hrushikesh (hrushikesh-gangur) wrote :

I am not suggesting restarting tgt. Instead, I am asking for an analysis of why force-delete did not work and a code fix in that logic.

John Griffith (john-griffith) wrote :

We've seen similar things in the past and had a 'dmsetup remove' call, but it was suspect and was removed along the way. I think it's worth putting back in as an attempt to alleviate the sort of issue you're seeing, ignoring the error if it throws.
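
For illustration only, a best-effort 'dmsetup remove' along the lines described in this comment might look like the Python sketch below. It assumes the usual device-mapper naming for LVM volumes (hyphens inside the VG and LV names are doubled, and the two names are joined by a single hyphen). The function names dm_name and best_effort_dmsetup_remove are hypothetical, and this is not the code that was in Cinder.

```python
import subprocess


def dm_name(vg: str, lv: str) -> str:
    """Build the device-mapper name LVM uses for vg/lv.

    Hyphens inside the VG and LV names are doubled, then the two
    names are joined with a single hyphen.
    """
    return f"{vg.replace('-', '--')}-{lv.replace('-', '--')}"


def best_effort_dmsetup_remove(vg: str, lv: str, run=subprocess.run) -> bool:
    """Try 'dmsetup remove' and ignore failures instead of raising."""
    try:
        run(["dmsetup", "remove", dm_name(vg, lv)], check=True)
        return True
    except (subprocess.CalledProcessError, OSError):
        # The call is only an attempt to help; a failure is not fatal.
        return False
```

The run parameter exists only so the command runner can be swapped out when trying the sketch without root access.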

Changed in cinder:
importance: Undecided → Low
assignee: nobody → John Griffith (john-griffith)
status: New → Triaged
tags: added: grizzly-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/36594

Changed in cinder:
status: Triaged → In Progress
Joe Gordon (jogo) wrote :

We are seeing this a lot. logstash.openstack.org query: "Unable to deactivate logical volume" AND @fields.filename:"logs/screen-c-vol.txt"

Joe Gordon (jogo) wrote :

better query:

@message:"Exit code: 5" AND @message:" sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvremove -f" AND @fields.filename:"logs/screen-c-vol.txt" AND @fields.build_status:"FAILURE"

Joe Gordon (jogo) wrote :

Logstash shows that this was hit 63 times this week.

John Griffith (john-griffith) wrote :

Thanks for updating this, Joe. By the way, any chance of some context: 63 times this week out of how many lvremove calls?

Changed in cinder:
importance: Low → High
milestone: none → havana-rc1
John Griffith (john-griffith) wrote :

I've finally been able to reproduce this on a machine. What I've found is that a udevadm settle and then a retry seems to allow removal of the LV in all cases so far.

I also think that part of what leads to this is in fact the create-target failures that we've been seeing. Once I rolled that change in, I no longer seem to be able to reproduce this. I'm going to submit the udevadm settle and retry as a fall-back, as it definitely addresses issues that I've seen in my setup. We'll have to keep an eye on logstash to make sure this in fact fixes it.

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/47632

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/47632
Committed: http://github.com/openstack/cinder/commit/de257d1a2b91e8060ff3532ced25cb2a67b14267
Submitter: Jenkins
Branch: master

commit de257d1a2b91e8060ff3532ced25cb2a67b14267
Author: John Griffith <email address hidden>
Date: Fri Sep 20 20:55:46 2013 +0000

    Fix issues with failed lvremove

    There are some race conditions that
    can cause problems with lvremove commands. In
    most cases these seem to recover nicely just
    with a simple retry of the lvremove. Adding
    a udev settle seems to eliminate the rest of them.

    This is a difficult issue to reproduce, and there's
    a suspicion that it relates to failed target
    creates.

    The patch adds a catch on the lvremove failure,
    followed by a udevadm settle and a retry of the
    lvremove. With the setup that I've been able
    to reproduce this issue these changes have eliminated
    any force delete failures.

    The other option that had been proposed was using dmsetup remove
    but there are concerns that this may cause problems.

    Change-Id: I2a2b0d0f4fefd0daf9424ab96aaf87ba53ebc171
    Closes-Bug: #1191960
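
The approach described in this commit message (catch the lvremove failure, run udevadm settle, retry) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the merged Cinder code: the function name remove_lv_with_retry and its run, retries and delay parameters are hypothetical, and commands are invoked directly instead of through cinder-rootwrap.

```python
import subprocess
import time


def remove_lv_with_retry(vg, lv, run=subprocess.run, retries=1, delay=0.0):
    """Run lvremove; on failure, wait for udev to settle and retry.

    Returns the number of retries that were needed (0 if the first
    attempt succeeded). Re-raises the last failure when out of retries.
    """
    cmd = ["lvremove", "-f", f"{vg}/{lv}"]
    for attempt in range(retries + 1):
        try:
            run(cmd, check=True)
            return attempt
        except subprocess.CalledProcessError:
            if attempt == retries:
                raise  # out of retries: surface the lvremove failure
            # Let pending udev events finish before trying again.
            run(["udevadm", "settle"], check=True)
            time.sleep(delay)
```

The run parameter is injectable so the retry logic can be exercised with a fake command runner on a machine without LVM.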

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: havana-rc1 → 2013.2
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential
Matt Riedemann (mriedem) wrote :
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/240611

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/240611
Reason: Long-term direction should be to move the common lvm code into os-brick and re-use from there, like here:

https://review.openstack.org/#/c/260739/4

Changed in nova:
assignee: Matt Riedemann (mriedem) → Thomas Maddox (thomas-maddox)
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/240611
Reason: thomasem said this didn't seem to help, so dropping it.

Changed in nova:
status: In Progress → Confirmed
assignee: Thomas Maddox (thomas-maddox) → nobody
Changed in nova:
assignee: nobody → Vishakha Agarwal (vishakha.agarwal)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/565703

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/565703
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8b8c5da59efb087295b676d4261f84dfadf62503
Submitter: Zuul
Branch: master

commit 8b8c5da59efb087295b676d4261f84dfadf62503
Author: Vishakha Agarwal <email address hidden>
Date: Wed May 2 16:42:58 2018 +0530

    Re-using the code of os brick cinder

    To avoid the errors during force delete of logical volume,
    cinder library os brick is already using udevadm settle for it.
    Calling the same library of cinder in nova too.

    Change-Id: I092afdd0409ab27187cf74cd1514e9e0c550d52c
    Closes-Bug: #1191960

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b3

This issue was fixed in the openstack/nova 18.0.0.0b3 development milestone.
