After unrescuing an instance, deleting a detached rbd volume goes wrong

Bug #1631692 reported by tanyy
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

It is not the rescue image volume; it is the volume (a Ceph rbd device, e.g. vdb) that was attached to the instance before the rescue.
After unrescuing the instance, if you detach that volume and then try to delete it, the deletion fails.
Specifically:
1. The Ceph rbd image still has a watcher.
   Use the command line: rbd status volumes/volume-uuid and you will see it (see the check sketch after this list).
   While in this state, the rbd image cannot be removed.
2. The database is updated with the volume info: status -> available, attach_status -> detached.
3. The instance can still see and use this rbd device: you can format, mount and read/write the disk. But if you reboot the instance, the disk disappears.
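
A minimal check sketch for item 1, assuming the Ceph pool is named volumes and volume-uuid is a placeholder for the real Cinder volume UUID:

# list watchers on the rbd image backing the volume
rbd status volumes/volume-uuid
# while the bug is present, the output still lists a watcher from the compute host,
# and removing the image fails with an error like:
rbd rm volumes/volume-uuid
# rbd: error: image still has watchers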

the tempest case and error report:
tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume
ERROR cinder.volume.manager [req-xxx] Unable to delete busy volume
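
One way to run just this tempest case by itself (assuming a working tempest/devstack environment; the exact invocation depends on the tempest version in use):

tempest run --regex tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume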

Steps to reproduce (a command sketch of these steps follows below):
1. nova volume-attach instance-uuid volume-uuid
2. nova rescue --password admin --image image-uuid instance-uuid
3. nova unrescue instance-uuid
4. nova volume-detach instance-uuid volume-uuid
   Then check the Ceph rbd status to see whether the rbd image still has a watcher.
5. If the rbd image still has a watcher, any operation to delete the volume will fail.
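
A minimal command sketch of the steps above, assuming instance-uuid, volume-uuid and image-uuid are placeholders and the Ceph pool is named volumes:

# 1. attach the Ceph-backed volume
nova volume-attach instance-uuid volume-uuid
# 2. rescue and then unrescue the instance
nova rescue --password admin --image image-uuid instance-uuid
nova unrescue instance-uuid
# 3. detach the volume
nova volume-detach instance-uuid volume-uuid
# 4. check whether the rbd image still has a watcher
rbd status volumes/volume-uuid
# 5. while a watcher remains, deleting the volume fails
cinder delete volume-uuid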

Tags: ceph liberty rdb
tanyy (tanyy1990)
tags: added: ceph liberty rdb
Revision history for this message
tanyy (tanyy1990) wrote :

I found the same bug report in here:
https://bugzilla.redhat.com/show_bug.cgi?id=1303549
Same phenomenon, and almost the same software versions, but following the steps described there I failed to reproduce it.

Revision history for this message
Prateek Arora (parora) wrote :

tanyy, I am not able to reproduce this; it works perfectly well for me. Can you give some more details on how you got here?

Changed in nova:
status: New → Incomplete
Revision history for this message
Prateek Arora (parora) wrote :

[stack@controller devstack]$ nova volume-detach a3e5e950-a51f-483a-ac48-85203bdb0bc9 29dd9434-7559-404b-9c61-9d943936c2bf
[stack@controller devstack]$ cinder list
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
| 29dd9434-7559-404b-9c61-9d943936c2bf | available | test_vol | 1 | ceph | false | |
+--------------------------------------+-----------+----------+------+-------------+----------+-------------+
[stack@controller devstack]$ cinder delete 29dd9434-7559-404b-9c61-9d943936c2bf
Request to delete volume 29dd9434-7559-404b-9c61-9d943936c2bf has been accepted.
[stack@controller devstack]$ cinder list
+----+--------+------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+----+--------+------+------+-------------+----------+-------------+
+----+--------+------+------+-------------+----------+-------------+

For me the volume got deleted; can you confirm where you see it as not deleted?

Revision history for this message
tanyy (tanyy1990) wrote :

Thanks Prateek Arora (parora).
Yes, the volume status was updated in the database, but you can still use this volume inside the instance.
And I found the same bug report: https://bugs.launchpad.net/nova/+bug/1485399

As Augustina Ragwitz (auggy) said:
I just verified this without Ceph to make sure there was no issue with attaching and detaching volumes generally and was unable to reproduce.

To clarify, the problem being reported looks like if you detach a Ceph volume while an instance is booting up, the instance doesn't register it as detached.

Marking as Confirmed to get into the triage queue so someone with Ceph can attempt to reproduce this.

And you can use this shell script to reproduce it (substitute the real instance and volume UUIDs):
#!/bin/bash
date
printf "begin to attach volume:\n"
time nova volume-attach instance-uuid volume-uuid
# sleep a few seconds to make sure the attach has succeeded
declare -i i=1
while [ $i -lt 5 ]
do
    printf "$i\n"
    i+=1
    sleep 1
done

printf "begin to reboot the instance:\n"
time nova reboot instance-uuid
# sleep 1 second so the instance status has time to return to ACTIVE
declare -i i=1
while [ $i -lt 2 ]
do
    printf "$i\n"
    i+=1
    sleep 1
done

date
printf "begin to detach the volume:\n"
time nova volume-detach instance-uuid volume-uuid
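
Save the script under any name (for example reproduce.sh, a hypothetical name), substitute the real instance and volume UUIDs, and run it with bash; afterwards check rbd status volumes/volume-uuid as described above to see whether the watcher is still registered.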

Revision history for this message
tanyy (tanyy1990) wrote :

Use the command line: rbd status volumes/volume-uuid and you will see it.
