Nova assumes that a volume is fully detached from the compute if the volume is not defined in the instance's libvirt definition

Bug #1727260 reported by Sahid Orentino
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   High        Sahid Orentino
Pike                      Fix Committed  High        Chris Friesen
Queens                    Fix Committed  High        Sahid Orentino

Bug Description

During a volume detach operation, Nova compute attempts to remove the volume from the instance's libvirt definition before it removes the storage LUN from the underlying compute host. If Nova finds that the volume is not present in the instance's libvirt definition, it ignores that error condition and returns after logging the warning "Ignoring DiskNotFound exception while detaching".

However, under certain failure scenarios the libvirt definition for the volume may already have been removed from the instance while the associated storage LUN on the compute host has not yet been fully cleaned up. In that case Nova returns without disconnecting the LUN from the host.
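The control flow described above can be illustrated with a small toy model. This is only a sketch and not Nova's code: `ToyHost`, `detach_volume_old`, and the local `DiskNotFound` class are hypothetical stand-ins, and the real logic is spread across the compute manager and the libvirt driver.

```python
class DiskNotFound(Exception):
    """Local stand-in for the exception named in the warning message."""


class ToyHost:
    """Hypothetical model that tracks guest devices and host LUNs separately."""

    def __init__(self):
        self.guest_devices = set()
        self.host_connections = set()

    def attach(self, volume_id):
        self.host_connections.add(volume_id)
        self.guest_devices.add(volume_id)

    def detach_from_guest(self, volume_id):
        if volume_id not in self.guest_devices:
            raise DiskNotFound(volume_id)
        self.guest_devices.remove(volume_id)

    def disconnect_from_host(self, volume_id):
        self.host_connections.discard(volume_id)


def detach_volume_old(host, volume_id):
    """Reported behaviour: a missing guest device skips host cleanup."""
    try:
        host.detach_from_guest(volume_id)
    except DiskNotFound:
        # Corresponds to "Ignoring DiskNotFound exception while detaching".
        return
    host.disconnect_from_host(volume_id)


host = ToyHost()
host.attach("vol-1")
# Simulate an earlier attempt that removed the guest device but failed
# before the host-side LUN was cleaned up.
host.guest_devices.remove("vol-1")

detach_volume_old(host, "vol-1")
leftover = sorted(host.host_connections)
print(leftover)  # the host connection is still present
```

In this model the retry returns early, so `vol-1` stays connected on the host even though the guest no longer references it.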

Changed in nova:
assignee: nobody → sahid (sahid-ferdjaoui)
Changed in nova:
status: New → In Progress
Revision history for this message
melanie witt (melwitt) wrote :

We have seen this downstream where an initial volume detach fails due to multipath "map in use" during the detach from the hypervisor host, after the volume was already detached from the guest. The volume remains connected in cinder (which is correct). However, when a second detach is tried, nova finds the volume already detached from the guest and assumes it was also successfully detached from the hypervisor host, which is not necessarily true. So it continues on to terminate the connection in cinder, which results in failed paths in multipathd.

Changed in nova:
importance: Undecided → High
tags: added: volumes
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/546655

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/515008
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ce531dd1b763704b9043ddde8e80ac99cd193660
Submitter: Zuul
Branch: master

commit ce531dd1b763704b9043ddde8e80ac99cd193660
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Oct 25 05:57:11 2017 -0400

    libvirt: disconnect volume from host during detach

    Under certain failure scenarios it may be that although the libvirt
    definition for the volume has been removed for the instance that the
    associated storage lun on the compute server may not have been fully
    cleaned up yet.

    In case users try an other attempt to detach volume we should not stop
    the process whether the device is not found in domain definition but
    try to disconnect the logical device from host.

    This commit makes the process to attempt a disconnect volume even if
    the device is not attached to the guest.

    Closes-Bug: #1727260
    Change-Id: I4182642aab3fd2ffb1c97d2de9bdca58982289d8
    Signed-off-by: Sahid Orentino Ferdjaoui <email address hidden>
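The behaviour this commit message describes, attempting the host-side disconnect even when the device is no longer attached to the guest, might look roughly like the following. It is a self-contained sketch under assumptions, not the merged patch: `ToyHost`, `detach_volume_fixed`, and the local `DeviceNotFound` class are hypothetical names used only for illustration.

```python
class DeviceNotFound(Exception):
    """Local stand-in for the error raised when the guest lacks the disk."""


class ToyHost:
    """Hypothetical model that tracks guest devices and host LUNs separately."""

    def __init__(self):
        self.guest_devices = set()
        self.host_connections = set()

    def attach(self, volume_id):
        self.host_connections.add(volume_id)
        self.guest_devices.add(volume_id)

    def detach_from_guest(self, volume_id):
        if volume_id not in self.guest_devices:
            raise DeviceNotFound(volume_id)
        self.guest_devices.remove(volume_id)

    def disconnect_from_host(self, volume_id):
        # discard() keeps the call idempotent if the LUN is already gone.
        self.host_connections.discard(volume_id)


def detach_volume_fixed(host, volume_id):
    """Always attempt host cleanup, whether or not the guest had the disk."""
    try:
        host.detach_from_guest(volume_id)
    except DeviceNotFound:
        # Already gone from the guest; fall through to host cleanup.
        pass
    host.disconnect_from_host(volume_id)


host = ToyHost()
host.attach("vol-1")
host.guest_devices.remove("vol-1")  # earlier, partially failed detach

detach_volume_fixed(host, "vol-1")
remaining = sorted(host.host_connections)
print(remaining)  # no host connections remain
```

The point of the design is that a retry converges: a second detach completes whatever cleanup the first attempt left unfinished.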

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/546655
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d6a072b5c5a3ff9de7f4b42cda517ead17efe561
Submitter: Zuul
Branch: stable/queens

commit d6a072b5c5a3ff9de7f4b42cda517ead17efe561
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Oct 25 05:57:11 2017 -0400

    libvirt: disconnect volume from host during detach

    Under certain failure scenarios it may be that although the libvirt
    definition for the volume has been removed for the instance that the
    associated storage lun on the compute server may not have been fully
    cleaned up yet.

    In case users try an other attempt to detach volume we should not stop
    the process whether the device is not found in domain definition but
    try to disconnect the logical device from host.

    This commit makes the process to attempt a disconnect volume even if
    the device is not attached to the guest.

    Closes-Bug: #1727260
    Change-Id: I4182642aab3fd2ffb1c97d2de9bdca58982289d8
    Signed-off-by: Sahid Orentino Ferdjaoui <email address hidden>
    (cherry picked from commit ce531dd1b763704b9043ddde8e80ac99cd193660)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.1

This issue was fixed in the openstack/nova 17.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/560690

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/560690
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=92bd7ea118fec4791b903871cdc95d0cd71583e2
Submitter: Zuul
Branch: stable/pike

commit 92bd7ea118fec4791b903871cdc95d0cd71583e2
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Oct 25 05:57:11 2017 -0400

    libvirt: disconnect volume from host during detach

    Under certain failure scenarios it may be that although the libvirt
    definition for the volume has been removed for the instance that the
    associated storage lun on the compute server may not have been fully
    cleaned up yet.

    In case users try an other attempt to detach volume we should not stop
    the process whether the device is not found in domain definition but
    try to disconnect the logical device from host.

    This commit makes the process to attempt a disconnect volume even if
    the device is not attached to the guest.

    Closes-Bug: #1727260
    Signed-off-by: Sahid Orentino Ferdjaoui <email address hidden>

    (cherry picked from commit ce531dd1b763704b9043ddde8e80ac99cd193660)
    (cherry picked from commit d6a072b5c5a3ff9de7f4b42cda517ead17efe561)

    Conflicts:
     nova/tests/unit/virt/libvirt/test_driver.py

    NOTE: The conflicts were due to the newer testcase mocking
    'nova.virt.libvirt.host.Host._get_domain' where the older code calls
    'nova.virt.libvirt.host.Host.get_domain', and also dealing with the
    fact that the older code doesn't pass 'encryption' to
    self._disconnect_volume().

    This latter issue means that we need to move the call to
    encryptor.detach_volume() to ensure it gets called if we hit
    exception.DeviceNotFound when detaching the device from
    the guest. This is similar to the original code proposed
    in https://review.openstack.org/#/c/515008/9/nova/virt/libvirt/driver.py
    but it requires special handling for the scenario where cryptsetup
    tries to destroy a dm-crypt device that has already been destroyed.

    Signed-off-by: Chris Friesen <email address hidden>
    Change-Id: I4182642aab3fd2ffb1c97d2de9bdca58982289d8
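The "special handling" mentioned in the note above, for the case where cryptsetup is asked to destroy a dm-crypt device that is already gone, could be sketched as tolerating one specific exit status. Everything here is an assumption made for illustration: `close_dm_crypt`, `CloseFailed`, and the injected `run` callable are hypothetical, and treating exit status 4 as "device does not exist" should be checked against the cryptsetup man page for the version in use.

```python
# Assumption: cryptsetup exits with 4 ("wrong device specified") when the
# mapping does not exist. Verify this for your cryptsetup version.
TOLERATED_EXIT_CODES = {0, 4}


class CloseFailed(Exception):
    """Raised when closing the mapping fails for an unexpected reason."""


def close_dm_crypt(dev_name, run):
    """Close a dm-crypt mapping, treating 'already gone' as success.

    `run` is a hypothetical callable that executes the command list and
    returns its exit status, which keeps this sketch testable without
    cryptsetup installed.
    """
    cmd = ["cryptsetup", "close", dev_name]
    code = run(cmd)
    if code not in TOLERATED_EXIT_CODES:
        raise CloseFailed(f"{' '.join(cmd)} exited with {code}")
    return code


# A fake runner that pretends the mapping was already destroyed.
already_gone = close_dm_crypt("crypt-vol-1", run=lambda cmd: 4)
print(already_gone)
```

Tolerating only the "missing device" status keeps the retry idempotent while still surfacing other failures, such as a busy device.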

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.2

This issue was fixed in the openstack/nova 16.1.2 release.
