Nova doesn't allow cleanup of volumes stuck in 'attaching' or 'detaching' status

Bug #1449221 reported by Scott DAngelo
This bug affects 15 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Cinder volumes can get stuck in an 'attaching' or 'detaching' state and need to be cleaned up, otherwise they cannot be used. This is not possible at the moment, as Nova doesn't allow any actions on volumes in an '-ing' status.
To detach a volume, Nova needs to do three things:
1. Detach the volume from the instance.
2. Inform Cinder about the detach.
3. Delete the record in the Nova BDM (block device mapping) table.

At the moment, if step 1 fails we roll back; if step 2 fails we are left with a volume stuck in 'detaching' status. Nova shouldn't refuse to complete the detach on its side just because it gets errors from Cinder.
What we can do is modify the Nova code to handle a potential error coming from Cinder: log it and go ahead with the deletion of the BDM record; an operator can then try to fix the Cinder side by making the appropriate Cinder call, like force-delete.
Basically, as long as there is a BDM record in Nova, we allow the user to call volume-detach as many times as they like.
Nova will delete the BDM record only if the call to Cinder's terminate_connection succeeds.
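
A minimal sketch of that behaviour, using hypothetical helper and parameter names (volume_api, virt_driver, bdm, connector) rather than Nova's actual code, could look like this:

import logging

LOG = logging.getLogger(__name__)

def detach_volume(context, instance, bdm, connector, volume_api, virt_driver):
    # Step 1: detach the volume from the instance on the hypervisor side.
    virt_driver.detach_volume(instance, bdm.device_name)

    # Step 2: tell Cinder about the detach. A failure here is logged and the
    # BDM record is kept, so the user can simply retry volume-detach later.
    try:
        volume_api.terminate_connection(context, bdm.volume_id, connector)
        volume_api.detach(context, bdm.volume_id)
    except Exception:
        LOG.exception("Cinder failed to terminate the connection for volume "
                      "%s; keeping the BDM record so the detach can be "
                      "retried", bdm.volume_id)
        return

    # Step 3: only after terminate_connection succeeded do we drop the record
    # from the BDM table.
    bdm.destroy()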

This bug has been discussed in a spec: https://review.openstack.org/84048
where we agreed that a spec is not required and that this change should be treated as a bug fix.

Tags: volumes
Changed in nova:
assignee: nobody → Scott DAngelo (scott-dangelo)
Changed in nova:
assignee: Scott DAngelo (scott-dangelo) → nobody
Changed in nova:
assignee: nobody → Andrea Rosa (andrea-rosa-m)
Revision history for this message
Andrea Rosa (andrea-rosa-m) wrote :

I am wondering if this should be marked as "Wishlist". What do you think?

Revision history for this message
Andrea Rosa (andrea-rosa-m) wrote :

To reproduce the issue:
- nova boot --image <image_id> --flavor <flavor_id> test
- cinder create 1
- nova volume-attach <server_id> <volume_id> /dev/vdb
- kill/stop the cinder-volume service
- nova volume-detach <server_id> <volume_id>
- restart the cinder-volume service

At this point the volume is reported in "detaching" status and it is not possible to recover from this situation.
If you try to delete the volume you get:

Delete for volume <volume_id> failed: Volume <volume_id> is still attached, detach volume first. (HTTP 400)

and the detach fails as well:

ERROR (BadRequest): Invalid input received: Invalid volume: Unable to detach volume. Volume status must be 'in-use' and attach_status must be 'attached' to detach. Currently: status: 'detaching', attach_status: 'attached.' (HTTP 400)

Changed in nova:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/184537

summary: - Nova volume-detach lacks '--force' command for cleanup
+ Nova doesn't allow to cleanup volumes stuck in 'attaching' or
+ 'detaching' status
description: updated
Revision history for this message
wanghao (wanghao749) wrote : Re: Nova doesn't allow to cleanup volumes stuck in 'attaching' or 'detaching' status

About this statement: "Nova will delete the BDM record only if the call to Cinder's terminate_connection succeeds".

There is another option IMO: Nova cleans up the BDM regardless of any terminate_connection exception, and then the admin/user calls the force-detach API on the Cinder side to make sure the volume is no longer exported and to detach it.

What are your suggestions about this option?

Revision history for this message
Scott DAngelo (scott-dangelo) wrote :

wanghao, I think the problem with ignoring the success of cinder's terminate_connection was pointed out by Walt_Boring:

" If Nova only calls libvirt volume's disconnect_volume, without Cinder's terminate_connection being called, then volumes may show back up on the nova host. Specifically for iSCSI volumes.

If an iSCSI session from the compute host to the storage backend still exists (because other volumes are connected), then the volume you just removed will show back up on the next scsi bus rescan."

So, the user should not think that the detach succeeded until the terminate_connection succeeds. Since terminate_connection is asynchronous, the Nova volume-detach will have to verify this somehow.

Revision history for this message
Andrea Rosa (andrea-rosa-m) wrote :

wanghao, the problem is what Scott said in comment #5.

@scott you raised an interesting point about the fact that terminate_connection is async.
At the moment Nova considers the call to have succeeded if it can send the request without any errors, but it doesn't check whether the connection has actually been terminated on the Cinder side.
Is there a cinder call we can make to get the status of the connection from cinder?
If so, we could check the status in a small fixed-interval loop before deleting the BDM record, even though I do not like this solution as it seems a bit hacky.
Any other ideas?
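
For illustration, the fixed-interval loop idea could look roughly like the sketch below. Note that get_connection_status() is a hypothetical call; Cinder does not expose such an API today.

import time

def wait_for_connection_terminated(volume_api, context, volume_id,
                                   interval=2, max_attempts=15):
    """Poll a (hypothetical) Cinder status call until the connection is gone."""
    for _ in range(max_attempts):
        # Hypothetical call: Cinder would need to expose the connection status.
        if volume_api.get_connection_status(context, volume_id) == 'terminated':
            return True
        time.sleep(interval)
    return False

# The BDM record would only be deleted once this returns True:
# if wait_for_connection_terminated(volume_api, context, bdm.volume_id):
#     bdm.destroy()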

Changed in nova:
importance: Undecided → High
tags: added: volumes
Changed in nova:
assignee: Andrea Rosa (andrea-rosa-m) → John Garbutt (johngarbutt)
Changed in nova:
assignee: John Garbutt (johngarbutt) → Andrea Rosa (andrea-rosa-m)
Changed in nova:
assignee: Andrea Rosa (andrea-rosa-m) → wanghao (wanghao749)
Changed in nova:
assignee: wanghao (wanghao749) → Andrea Rosa (andrea-rosa-m)
Revision history for this message
Scott DAngelo (scott-dangelo) wrote :

Proposed fix:
https://review.openstack.org/#/c/184537/9

I think the proposed fix should have been automatically linked to this bug, but it was not for some reason.

summary: - Nova doesn't allow to cleanup volumes stuck in 'attaching' or
+ Nova doesn't allow cleanup of volumes stuck in 'attaching' or
'detaching' status
Changed in nova:
status: In Progress → Confirmed
assignee: Andrea Rosa (andrea-rosa-m) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/184537
Reason: This code hasn't been updated in a long time, and is in merge conflict. I am going to abandon this review, but feel free to restore it if you're still working on this.

Revision history for this message
Tang Chen (tangchen) wrote :

Hi,

Is anyone still working on this bug? And do we still need this patch? If we do, I'd like to go on with it if you don't mind.

Thanks.

Tang Chen (tangchen)
Changed in nova:
assignee: nobody → Tang Chen (tangchen)
Revision history for this message
srividyaketharaju (srividya) wrote :

Hi,

Is anyone still working on this bug? And do we still need this patch? If we do, I'd like to go on with it if you don't mind.

Thanks.

Revision history for this message
Nazeema Begum (nazeema123) wrote :

Hi,

Is anyone still working on this bug? And do we still need this patch? If we do, I'd like to go on with it if you don't mind.

Thanks.

Changed in nova:
assignee: Tang Chen (tangchen) → Nazeema Begum (nazeema123)
Revision history for this message
Nazeema Begum (nazeema123) wrote :

I request the bug reporter to close this bug, as it is already fixed in the Mitaka version. Here is my analysis of the bug and the delta between Liberty and Mitaka.

Analysis:

In Liberty:
There is no proper volume attach/detach handling in compute/api.py in Liberty, and there is no local cleanup of the BDM table.

Fix in Mitaka:
Here, three new methods were added to handle volume attach/detach in compute/api.py:
1) _attach_volume_shelved_offloaded - handles attaching volumes to an instance in the shelved_offloaded state.
2) _detach_volume_shelved_offloaded - handles detaching volumes from an instance in the shelved_offloaded state on the terminate_connection call.
3) _local_cleanup_bdm_volumes - deletes the BDM record and takes care of the cleanup of the volumes (see the sketch after the file list below).

The same is also mentioned in the Mitaka release notes under new features:
'''It is possible to call attach and detach volume API operations for instances which are in shelved and shelved_offloaded state. For an instance in shelved_offloaded state Nova will set to None the value for the device_name field, the right value for that field will be set once the instance will be unshelved as it will be managed by a specific compute manager.'''

REFERENCED FILES:

 /opt/stack/nova/nova/compute/api.py
 /opt/stack/nova/nova/compute/manager.py
 /opt/stack/nova/nova/tests/unit/compute/test_compute.py
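
For reference, the idea behind _local_cleanup_bdm_volumes can be sketched roughly as follows. This is a simplified illustration with placeholder names (volume_api, connector, bdms), not the actual Mitaka code.

import logging

LOG = logging.getLogger(__name__)

def local_cleanup_bdm_volumes(bdms, context, volume_api, connector=None):
    for bdm in bdms:
        if bdm.is_volume:
            try:
                # Best-effort clean-up on the Cinder side.
                volume_api.terminate_connection(context, bdm.volume_id,
                                                connector)
                volume_api.detach(context, bdm.volume_id)
            except Exception:
                LOG.warning("Ignoring Cinder error while cleaning up volume "
                            "%s", bdm.volume_id)
        # The BDM record is removed in any case, so Nova's view of the
        # instance stays consistent.
        bdm.destroy()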

Sean Dague (sdague)
Changed in nova:
assignee: Nazeema Begum (nazeema123) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/571472

Changed in nova:
assignee: nobody → Chen (chenn2)
status: Confirmed → In Progress
Revision history for this message
Mike Chen (chenn2) wrote :

I can still reproduce this bug following the steps in comment #2 (environment: queens, nova version 17.0.1).

The status of the volume gets stuck in "detaching".

When trying to detach again (nova volume-detach vm_id volume_id):

ERROR (BadRequest): Invalid volume: Invalid input received: Invalid volume: Unable to detach volume. Volume status must be 'in-use' and attach_status must be 'attached' to detach. (HTTP 400)

When trying to delete the volume (cinder delete volume_id):

Delete for volume volume_id failed: Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer. (HTTP 400)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Stephen Finucane (<email address hidden>) on branch: master
Review: https://review.opendev.org/571472
Reason: WIP for some time. It seems this has stalled so abandoning.

Revision history for this message
wang (yunhua) wrote :

stay tuned

Lee Yarwood (lyarwood)
Changed in nova:
status: In Progress → Confirmed
assignee: Mike Chen (chenn2) → nobody
Maurice Wei (mauricewei)
Changed in nova:
assignee: nobody → Maurice Wei (mauricewei)
Changed in nova:
assignee: Maurice Wei (mauricewei) → nobody
assignee: nobody → HanGuangyu (hanguangyu)
Changed in nova:
assignee: HanGuangyu (hanguangyu) → nobody