Detach volume from instance sometimes fails silently (volume stays attached)

Bug #2055271 reported by Jan Jasek
Affects: OpenStack Compute (nova)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

Description
===========
From the Horizon UI perspective:
Detach a volume from a server: a success popup message appears, detaching starts (detaching loading bar), it takes a few minutes, the detach finishes, but the volume is still attached. This happens only sometimes, roughly once in 5 runs.
The volume status was also checked using the OpenStack SDK and the volume is still attached, so it is not only a Horizon issue.
See the video in the Logs & Configs section.

Steps to reproduce
==================
1) Create an instance.
2) Create a volume.
3) Created volume -> Manage Attachments -> Attach the volume to the created instance.
4) The volume is attached to the instance.
5) Created volume -> Manage Attachments -> Detach volume (a rough SDK sketch of these steps follows below).
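
For reference, a minimal openstacksdk sketch of the same flow (the real test drives steps 3-5 through the Horizon UI with Selenium; the cloud name, image, flavor, network and resource names below are placeholders, not values from the failing job):

import openstack

conn = openstack.connect(cloud="devstack")  # placeholder cloud entry

# 1) Create an instance
server = conn.create_server(
    name="detach-repro-vm",
    image="cirros-0.6.2-x86_64-disk",  # placeholder image
    flavor="m1.tiny",                  # placeholder flavor
    network="private",                 # may be required depending on the cloud
    wait=True,
)

# 2) Create a volume
volume = conn.create_volume(size=1, name="detach-repro-vol", wait=True)

# 3)-4) Attach the volume; it should go to "in-use"
conn.attach_volume(server, volume, wait=True)

# 5) Detach the volume
conn.detach_volume(server, volume, wait=True)

# Expected: "available" with no attachments; in the failing runs the volume
# is still "in-use" and attached after several minutes.
volume = conn.get_volume(volume.id)
print(volume.status, volume.attachments)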

Expected result
===============
6) The volume should be Available again and not attached.

Actual result
=============
6) The volume is still in-use and attached to the instance.

Environment
===========
Zuul

Logs & Configs
==============
https://zuul.opendev.org/t/openstack/build/f0ab36d3f776416db89f0ca92eb3be78/logs

Video recording from the test (at 0:30 the success popup message appears and the status changes from "In-use" to "Detaching"; at 4:02 the status changes from "Detaching" back to "In-use"):
https://46065918d90ec33ce999-76c305bffd22578a75e29d73d2b2fb28.ssl.cf1.rackcdn.com/910378/3/check/horizon-integration-pytest/f0ab36d/tox/test_reports/test_volumes.py%3A%3Atest_manage_volume_attachments%5Bnew_instance_demo0-1%5D/video.mp4

Revision history for this message
Amit Uniyal (auniyal) wrote :

Hey, the current nova master branch works as expected in the CLI and the dashboard. Can you please tell how you created the volume?

The attached video and logs show that this is an automated run.
From this log, the VM (80745088-1d69-45bc-901a-0d4a970fbfd8) was deleted immediately after creation at 2:55: https://46065918d90ec33ce999-76c305bffd22578a75e29d73d2b2fb28.ssl.cf1.rackcdn.com/910378/3/check/horizon-integration-pytest/f0ab36d/controller/logs/screen-n-cpu.txt

instance-id: 6a0a909d-4f57-4b22-9c45-eefcff748c9b
vol-id: 28772a87-dbf2-484a-8ce7-8a19e8a3115d

Nova logs: Nova did ask for the volume detach and retried up to 8 times.
WARNING nova.virt.libvirt.driver [None req-b726265d-ff18-4879-92c9-5f9875e5288b demo demo] Waiting for libvirt event about the detach of device vdb with device alias ua-28772a87-dbf2-484a-8ce7-8a19e8a3115d from instance 6a0a909d-4f57-4b22-9c45-eefcff748c9b is timed out.

DEBUG nova.virt.libvirt.driver [None req-b726265d-ff18-4879-92c9-5f9875e5288b demo demo] Failed to detach device vdb with device alias ua-28772a87-dbf2-484a-8ce7-8a19e8a3115d from instance 6a0a909d-4f57-4b22-9c45-eefcff748c9b from the live domain config. Libvirt did not report any error but the device is still in the config. {{(pid=98861) _detach_from_live_with_retry /opt/stack/nova/nova/virt/libvirt/driver.py:2599}}

ERROR nova.virt.libvirt.driver [None req-b726265d-ff18-4879-92c9-5f9875e5288b demo demo] Run out of retry while detaching device vdb with device alias ua-28772a87-dbf2-484a-8ce7-8a19e8a3115d from instance 6a0a909d-4f57-4b22-9c45-eefcff748c9b from the live domain config. Device is still attached to the guest.

nova.exception.DeviceDetachFailed: Device detach failed for vdb: Run out of retry while detaching device vdb with device alias ua-28772a87-dbf2-484a-8ce7-8a19e8a3115d from instance 6a0a909d-4f57-4b22-9c45-eefcff748c9b from the live domain config. Device is still attached to the guest.

There are no detach logs in c-api or c-vol. c-api does have attachment logs, but no request for detaching the volume.

I need to look deeper into the issue, maybe later, but I think this should be a Cinder bug.
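
(For reference, the "still attached" state can be confirmed from the volume side with a small openstacksdk polling loop like the sketch below; the cloud name is a placeholder and the volume ID is the one quoted above.)

import time

import openstack

conn = openstack.connect(cloud="devstack")  # placeholder cloud entry
VOL_ID = "28772a87-dbf2-484a-8ce7-8a19e8a3115d"  # vol-id from the logs above

# Poll for a few minutes; in the failing runs the status goes
# in-use -> detaching -> back to in-use and the attachment record remains.
deadline = time.time() + 300
while time.time() < deadline:
    vol = conn.get_volume(VOL_ID)
    print(vol.status, [a.get("server_id") for a in vol.attachments])
    if vol.status == "available" and not vol.attachments:
        break
    time.sleep(10)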

Revision history for this message
Jan Jasek (johnnyjj) wrote :

Hello Amit,
You are right, those tests are automated.
The instance and volume are created using the OpenStack SDK, and the clicking in Horizon is then driven by Selenium.

Fixture for create/delete Volume:

import pytest


@pytest.fixture
def new_volume_demo(volume_name, openstack_demo, config):
    # volume_name, openstack_demo and config are other test fixtures:
    # a list of volume names, an openstacksdk connection for the demo
    # user, and the test configuration.
    for vol in volume_name:
        volume = openstack_demo.create_volume(
            name=vol,
            image=config.image.images_list[0],
            size=1,
            wait=True,
        )
    # Yields the last created volume if several names are given.
    yield volume
    for vol in volume_name:
        openstack_demo.delete_volume(
            name_or_id=vol,
            wait=True,
        )

If you have any other questions or there is anything I can help with, feel free to ping me anytime.

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote (last edit ):

A couple of quick questions / comments:

- Is this also reproducible _outside_ of Horizon? If yes, and if it's not too much to ask, please outline the reproducer here with `openstack` command-line client.

- It's also strange that I don't see any of these device detach errors in the libvirtd log; maybe this indicates the problem is entirely in the Nova driver:

https://46065918d90ec33ce999-76c305bffd22578a75e29d73d2b2fb28.ssl.cf1.rackcdn.com/910378/3/check/horizon-integration-pytest/f0ab36d/controller/logs/libvirt/libvirt/libvirtd_log.txt

(Sorry, I'm buried in too many things; I'm slow here.)

Revision history for this message
Jan Jasek (johnnyjj) wrote (last edit ):

I tried to reproduce it in my own environment using the Horizon UI and also using the CLI, without any success.
I have run those automated tests maybe a hundred times (in the last weeks) from my local machine against my devstack deployment and I am not hitting that issue.
So we face this issue specifically in the Zuul environment. And as I mentioned before, it does not happen every time; it happens roughly once in 5 runs.

Yep, I think that if there were a detach error, it would also appear immediately in the UI as an error popup message after clicking the Detach Volume button. But there is a success message, so the detach started without any issue and then finished, yet the volume is still attached.
