If volume detach fails, it cannot be retried and the instance must be rebooted to detach

Bug #1633236 reported by melanie witt
This bug affects 1 person
Affects                     Status          Importance   Assigned to     Milestone
OpenStack Compute (nova)    Fix Released    Medium       melanie witt
  Newton                    Fix Committed   Medium       Lee Yarwood

Bug Description

If a volume detach fails at the libvirt driver level, the detach cannot be retried, and the volume cannot be detached until the instance is rebooted.

Currently, a volume detach at the libvirt driver level happens in two steps:

 1. Detach from the persistent domain (this will affect the instance upon next reboot)
 2. Detach from the transient domain (this will affect the running instance)
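The two steps above can be sketched with a toy model. This is a hypothetical illustration, not nova's code: `FakeDomain`, its `detach()` method and the `guest_busy` switch are invented here. They stand in for libvirt-python's `virDomain.detachDeviceFlags(xml, flags)`, which takes the `VIR_DOMAIN_AFFECT_CONFIG` (persistent) and `VIR_DOMAIN_AFFECT_LIVE` (transient) flags. The model assumes a live request either takes effect or is silently ignored, whereas a real guest handles it asynchronously.

```python
# Same names and values as libvirt's virDomainModificationImpact flags.
VIR_DOMAIN_AFFECT_LIVE = 1    # act on the running (transient) domain
VIR_DOMAIN_AFFECT_CONFIG = 2  # act on the persistent domain definition


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain, tracking disk targets only."""

    def __init__(self, disks, guest_busy=False):
        self.persistent = set(disks)  # definition used on the next boot
        self.live = set(disks)        # devices the running guest still has
        self.guest_busy = guest_busy  # True: guest ignores live detach requests

    def detach(self, target, flags):
        if flags & VIR_DOMAIN_AFFECT_CONFIG:
            # Editing the persistent definition takes effect immediately.
            self.persistent.discard(target)
        if flags & VIR_DOMAIN_AFFECT_LIVE and not self.guest_busy:
            # A live detach is only a request; a busy guest may ignore it.
            self.live.discard(target)


def detach_volume_current_order(dom, target):
    """Current order: persistent domain first, then transient domain."""
    dom.detach(target, VIR_DOMAIN_AFFECT_CONFIG)  # 1. affects next reboot
    dom.detach(target, VIR_DOMAIN_AFFECT_LIVE)    # 2. affects running instance


dom = FakeDomain(["vdb"])
detach_volume_current_order(dom, "vdb")
print(sorted(dom.persistent), sorted(dom.live))  # [] []
```

With a cooperative guest both views end up empty; the problem only appears when the guest ignores step 2.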

A detach from a transient domain is a request from the host to the guest to detach the volume, and the guest can choose to ignore it. For example, if a process in the guest has a file open on the volume, the guest might ignore the request because the volume is busy.

If this happens, a later attempt by the user to detach the volume fails with the error:

 libvirtError: invalid argument: no target device <device>

because the volume was already detached from the persistent domain during the first attempt. As a result, the volume can only be detached by rebooting the instance.
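The failure sequence can be reproduced in a toy model. Everything here is hypothetical: `FakeDomain`, `FakeLibvirtError` and `detach_volume_current_order` are illustrative names, not nova or libvirt APIs, and only the error text is copied from the message quoted above. The first attempt removes the disk from the persistent definition while the busy guest keeps it, so the retry trips over the missing persistent device.

```python
VIR_DOMAIN_AFFECT_LIVE = 1    # same value as libvirt's flag
VIR_DOMAIN_AFFECT_CONFIG = 2  # same value as libvirt's flag


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""


class FakeDomain:
    """Hypothetical domain model: disk targets in persistent and live views."""

    def __init__(self, disks, guest_busy=False):
        self.persistent = set(disks)
        self.live = set(disks)
        self.guest_busy = guest_busy  # True: guest ignores live detach requests

    def detach(self, target, flags):
        if flags & VIR_DOMAIN_AFFECT_CONFIG:
            if target not in self.persistent:
                # Mirrors the libvirt message for a missing persistent device.
                raise FakeLibvirtError(
                    "invalid argument: no target device %s" % target)
            self.persistent.discard(target)
        if flags & VIR_DOMAIN_AFFECT_LIVE and not self.guest_busy:
            self.live.discard(target)


def detach_volume_current_order(dom, target):
    dom.detach(target, VIR_DOMAIN_AFFECT_CONFIG)  # persistent first
    dom.detach(target, VIR_DOMAIN_AFFECT_LIVE)    # then the running guest


dom = FakeDomain(["vdb"], guest_busy=True)
detach_volume_current_order(dom, "vdb")  # guest ignores the live request
dom.guest_busy = False                   # the file on the volume is closed
try:
    detach_volume_current_order(dom, "vdb")  # user retries the detach
except FakeLibvirtError as exc:
    print(exc)  # invalid argument: no target device vdb
print(sorted(dom.live))  # ['vdb']: still attached to the running guest
```

The retry never reaches the live step, which is why only a reboot (which reloads the persistent definition) removes the disk.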

The behavior should be changed to detach from the transient domain first, retrying if necessary, and to detach from the persistent domain only after the detach from the transient domain has succeeded. This way, if the volume is busy and the guest ignores the detach request, the user can retry the detach later.
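The proposed ordering can be sketched in the same kind of toy model. The names `FakeDomain`, `DetachFailed`, `detach_volume_proposed_order`, the `attempts` count and the `wait` hook are all hypothetical, and checking `dom.live` stands in for re-reading the running domain's XML. The property that matters is that the persistent definition is only touched after the guest has actually released the disk, so a failed attempt leaves a state the user can retry from.

```python
VIR_DOMAIN_AFFECT_LIVE = 1    # same value as libvirt's flag
VIR_DOMAIN_AFFECT_CONFIG = 2  # same value as libvirt's flag


class DetachFailed(Exception):
    """Hypothetical error: the guest never released the device."""


class FakeDomain:
    """Hypothetical domain model; the guest ignores the first N live requests."""

    def __init__(self, disks, ignore_requests=0):
        self.persistent = set(disks)
        self.live = set(disks)
        self.ignore_requests = ignore_requests

    def detach(self, target, flags):
        if flags & VIR_DOMAIN_AFFECT_CONFIG:
            self.persistent.discard(target)
        if flags & VIR_DOMAIN_AFFECT_LIVE:
            if self.ignore_requests > 0:
                self.ignore_requests -= 1  # guest is busy; request ignored
            else:
                self.live.discard(target)


def detach_volume_proposed_order(dom, target, attempts=3, wait=lambda: None):
    """Transient domain first (with retries); persistent domain only on success."""
    for _ in range(attempts):
        dom.detach(target, VIR_DOMAIN_AFFECT_LIVE)
        if target not in dom.live:  # in practice: re-read the live domain XML
            break
        wait()                      # e.g. sleep before asking the guest again
    else:
        # Persistent definition untouched, so the user can retry later.
        raise DetachFailed(target)
    dom.detach(target, VIR_DOMAIN_AFFECT_CONFIG)


dom = FakeDomain(["vdb"], ignore_requests=3)
try:
    detach_volume_proposed_order(dom, "vdb")  # guest ignores all 3 requests
except DetachFailed as exc:
    print("detach failed for", exc)  # detach failed for vdb
print(sorted(dom.persistent), sorted(dom.live))  # ['vdb'] ['vdb']
detach_volume_proposed_order(dom, "vdb")  # later retry: guest cooperates
print(sorted(dom.persistent), sorted(dom.live))  # [] []
```

A real implementation would probably back off between attempts (for example with increasing sleeps) rather than retry immediately; the `wait` callback leaves that choice to the caller.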

Suggested steps to reproduce:

 1. Boot an instance and attach a volume
 2. Log in to the guest and open a file on that volume in a text editor
 3. Try to detach the volume using 'nova volume-detach' (the detach should fail)
 4. Exit the text editor on the guest
 5. Try to detach the volume again using 'nova volume-detach' (this should fail with the 'no target device' error)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/386257

Changed in nova:
status: Triaged → In Progress
melanie witt (melwitt)
tags: added: volumes
melanie witt (melwitt)
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/386257
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=63b2c8962697280f37fa888f1ab1d255757d1154
Submitter: Jenkins
Branch: master

commit 63b2c8962697280f37fa888f1ab1d255757d1154
Author: melanie witt <email address hidden>
Date: Wed Oct 12 07:37:41 2016 +0000

    Raise DeviceNotFound detaching volume from persistent domain

    Currently, a volume detach at the libvirt driver level happens in two
    steps:

      1. Detach from persistent domain (affect instance upon next reboot)
      2. Detach from live domain (affect running instance)

    A detach from a live domain is a request from the host to the guest,
    which the guest can choose to ignore. For example, if the guest
    has a file open on the volume by some process, it might ignore the
    request to detach that volume because the file is in use.

    If this scenario occurs, when a user tries a later request to detach
    the volume, it will fail with this error when the attempt to detach
    from the persistent domain is made:

      libvirtError: invalid argument: no target device <device>

    because the volume was detached from the persistent domain the first
    time. Because of this, the volume can only be detached by rebooting
    the instance.

    This handles the VIR_ERR_INVALID_ARG
    "invalid argument: no target device" error [1] from libvirt for the
    detach from persistent domain and raises DeviceNotFound. The libvirt
    driver handles DeviceNotFound for volume detach.

    Note: Our code is already handling the VIR_ERR_OPERATION_FAILED
    "operation failed: disk vdb not found" error [2] for the case of the
    detach from live domain.

    Closes-Bug: #1633236

    [1] https://github.com/libvirt/libvirt/blob/f9d57f2/src/qemu/qemu_driver.c#L8055-L8059
    [2] https://github.com/libvirt/libvirt/blob/f81b33b/src/qemu/qemu_hotplug.c#L2859-L2863

    Change-Id: I09230fc47b0950aa5a3db839a070613c9c817576
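The error handling described in the commit message can be sketched as a small translation layer. This is an illustration, not the merged nova code: `FakeLibvirtError` mimics the `get_error_code()` and `get_error_message()` accessors of `libvirt.libvirtError`, `DeviceNotFound` stands in for nova's exception, and `detach_device` with its `detach_fn` callback are hypothetical names. The sketch maps both the live-domain case (`VIR_ERR_OPERATION_FAILED`, "not found") and the persistent-domain case (`VIR_ERR_INVALID_ARG`, "no target device") to `DeviceNotFound`, and re-raises any other libvirt error.

```python
# Stand-ins for libvirt.VIR_ERR_INVALID_ARG / libvirt.VIR_ERR_OPERATION_FAILED;
# the values here are arbitrary placeholders.
VIR_ERR_INVALID_ARG = "VIR_ERR_INVALID_ARG"
VIR_ERR_OPERATION_FAILED = "VIR_ERR_OPERATION_FAILED"


class FakeLibvirtError(Exception):
    """Hypothetical stand-in mirroring libvirt.libvirtError's accessors."""

    def __init__(self, code, message):
        super().__init__(message)
        self._code = code
        self._message = message

    def get_error_code(self):
        return self._code

    def get_error_message(self):
        return self._message


class DeviceNotFound(Exception):
    """Stand-in for nova's DeviceNotFound exception."""


def detach_device(detach_fn, device, persistent=False, live=False):
    """Call detach_fn, mapping libvirt's 'device is gone' errors to DeviceNotFound."""
    try:
        detach_fn(device, persistent=persistent, live=live)
    except FakeLibvirtError as ex:
        code, msg = ex.get_error_code(), ex.get_error_message()
        if code == VIR_ERR_OPERATION_FAILED and "not found" in msg:
            # Live-domain detach: "operation failed: disk vdb not found"
            raise DeviceNotFound(device) from ex
        if code == VIR_ERR_INVALID_ARG and "no target device" in msg:
            # Persistent-domain detach: "invalid argument: no target device vdb"
            raise DeviceNotFound(device) from ex
        raise  # any other libvirt error propagates unchanged


def already_detached(device, persistent, live):
    # Simulates libvirt refusing a persistent detach of a missing disk.
    raise FakeLibvirtError(
        VIR_ERR_INVALID_ARG, "invalid argument: no target device %s" % device)


try:
    detach_device(already_detached, "vdb", persistent=True)
except DeviceNotFound as exc:
    print("DeviceNotFound:", exc)  # DeviceNotFound: vdb
```

Matching on both the error code and a message substring keeps unrelated `VIR_ERR_INVALID_ARG` failures from being misreported as a missing device.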

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/425114

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b3

This issue was fixed in the openstack/nova 15.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/425114
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b51231c638228f67ab130a7855b9143b202733f6
Submitter: Jenkins
Branch: stable/newton

commit b51231c638228f67ab130a7855b9143b202733f6
Author: melanie witt <email address hidden>
Date: Wed Oct 12 07:37:41 2016 +0000

    Raise DeviceNotFound detaching volume from persistent domain

    (cherry picked from commit 63b2c8962697280f37fa888f1ab1d255757d1154)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.4

This issue was fixed in the openstack/nova 14.0.4 release.
