detach_device_with_retry doesn't detach from live domain if persistent domain was already detached in the past

Bug #1707238 reported by melanie witt on 2017-07-28
This bug affects 1 person
Affects                   Status  Importance  Assigned to    Milestone
OpenStack Compute (nova)          High        melanie witt
Newton                            High        Tony Breeds
Ocata                             High        melanie witt

Bug Description

An earlier fix for a different bug [1] introduced a new one. The original bug was that a later attempt to detach a volume failed if the guest had been busy and ignored the first request to detach from the live domain. With that fix in place, a later attempt to detach a volume silently succeeds even though the device is still attached to the live domain.

This bug is serious because a volume can now end up attached to two live domains at once, and data corruption can occur. We should try to detach from the live domain even if the device was already detached from the persistent domain in the past.

[1] https://bugs.launchpad.net/nova/+bug/1633236

Sean Dague (sdague) on 2017-07-28
Changed in nova:
status: New → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/488545

Changed in nova:
status: Confirmed → In Progress

melanie witt (melwitt) wrote:

Here are the steps to reproduce the bug:

1. Create an instance.
$ nova boot --flavor m1.nano --image cirros-0.3.5-x86_64-disk repro

2. Create a volume.
$ cinder create --name repro 1

3. Attach the volume to the instance.
$ nova volume-attach repro 552e833a-5c56-45c0-a670-7cfddaa8112b

4. See the device on the domain.
$ virsh domblklist 1
Target Source
------------------------------------------------
vda /opt/stack/data/nova/instances/b626887f-aa9a-4d91-864b-ce4733f5cc24/disk
vdb /dev/sdc

5. Create XML to use for the detach from persistent domain only.
$ cat detach.xml
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/sdc'/>
  <backingStore/>
  <target dev='vdb' bus='virtio'/>
  <serial>552e833a-5c56-45c0-a670-7cfddaa8112b</serial>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>

6. Detach from persistent domain only.
$ virsh detach-device 1 detach.xml --config
Device detached successfully

7. Detach the volume via the Nova API.
$ nova volume-detach repro 552e833a-5c56-45c0-a670-7cfddaa8112b

Expected Result: Device is no longer attached to the live domain.
$ virsh domblklist 1
Target Source
------------------------------------------------
vda /opt/stack/data/nova/instances/b626887f-aa9a-4d91-864b-ce4733f5cc24/disk

Actual Result: Device is still attached to the live domain.
$ virsh domblklist 1
Target Source
------------------------------------------------
vda /opt/stack/data/nova/instances/b626887f-aa9a-4d91-864b-ce4733f5cc24/disk
vdb /dev/sdc
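
The comparison that `virsh domblklist` shows by eye in the steps above can also be scripted. The following is a minimal sketch, not part of the original report: the helper names `disk_targets`, `live_only_disks`, and `fetch_domain_xml` are hypothetical, and the sketch assumes the libvirt-python bindings are installed if `fetch_domain_xml` is used. It reports disks that are present on the live domain but missing from the persistent one, which is the state left behind after step 6.

```python
import xml.etree.ElementTree as ET


def disk_targets(domain_xml):
    """Return the set of <target dev=...> names for disks in a domain XML string."""
    root = ET.fromstring(domain_xml)
    return {
        target.get("dev")
        for target in root.findall("./devices/disk/target")
        if target.get("dev")
    }


def live_only_disks(live_xml, persistent_xml):
    """Return disks attached to the live domain but absent from the persistent one."""
    return disk_targets(live_xml) - disk_targets(persistent_xml)


def fetch_domain_xml(name, uri="qemu:///system"):
    """Fetch (live XML, persistent XML) for a running domain.

    Hypothetical helper: needs libvirt-python and access to the hypervisor.
    """
    import libvirt  # imported lazily so the helpers above work without it

    conn = libvirt.open(uri)
    try:
        dom = conn.lookupByName(name)
        # Flags 0 returns the live definition of a running domain;
        # VIR_DOMAIN_XML_INACTIVE returns the persistent definition.
        return dom.XMLDesc(0), dom.XMLDesc(libvirt.VIR_DOMAIN_XML_INACTIVE)
    finally:
        conn.close()
```

On a host in the "Actual Result" state, `live_only_disks(*fetch_domain_xml(name))` would be expected to contain `vdb`, where `name` is the libvirt name of the instance's domain.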

Reviewed: https://review.openstack.org/488545
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d39934ad6afb7e2729bb45235f363ada86012d15
Submitter: Jenkins
Branch: master

commit d39934ad6afb7e2729bb45235f363ada86012d15
Author: melanie witt <email address hidden>
Date: Fri Jul 28 17:06:53 2017 +0000

    Detach device from live domain even if not found on persistent

    In a past attempt to fix a bug [1], we started raising DeviceNotFound
    if a device wasn't found on the persistent domain. This was to address
    a scenario where the guest ignored the detach from the live domain
    because it was busy and we wanted to avoid failing a later detach
    request to the user (compute handles DeviceNotFound).

    Unfortunately, in the above case, a later detach request won't fail to
    the user but it also won't detach from the live domain. It sees the
    device already detached from the persistent domain and doesn't attempt
    to detach from the live domain.

    This is a serious problem because it's possible for a volume to be
    attached to two live domains and data corruption can occur.

    This adds an attempt to detach from the live domain even if we had
    already detached from the persistent domain in the past.

    Closes-Bug: #1707238

    [1] https://review.openstack.org/386257

    Change-Id: I8cd056fa17184a98c31547add0e9fb2d363d0908
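
The control flow described in this commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the nova implementation: `ToyGuest`, `detach_before_fix`, `detach_after_fix`, and `compute_detach` are hypothetical names, and `DeviceNotFound` here is a local stand-in for the exception named in the message. The model assumes that a device missing from the persistent config raises before the live config is touched, and that the caller treats `DeviceNotFound` as "already detached".

```python
class DeviceNotFound(Exception):
    """Local stand-in for the DeviceNotFound exception named in the commit message."""


class ToyGuest:
    """Hypothetical guest with separate persistent and live device sets."""

    def __init__(self, persistent, live):
        self.persistent = set(persistent)
        self.live = set(live)

    def detach(self, dev, persistent=False, live=False):
        # Assumption: the persistent config is checked first, so a device
        # missing there raises before the live config is modified.
        if persistent:
            if dev not in self.persistent:
                raise DeviceNotFound(dev)
            self.persistent.discard(dev)
        if live:
            if dev not in self.live:
                raise DeviceNotFound(dev)
            self.live.discard(dev)


def detach_before_fix(guest, dev):
    # Regression: DeviceNotFound from the persistent config propagates,
    # so the live config is never attempted.
    guest.detach(dev, persistent=True, live=True)


def detach_after_fix(guest, dev):
    try:
        guest.detach(dev, persistent=True, live=True)
    except DeviceNotFound:
        # Not found on the persistent config: still try the live config.
        guest.detach(dev, live=True)


def compute_detach(detach, guest, dev):
    # Models the caller that handles DeviceNotFound as "already detached".
    try:
        detach(guest, dev)
    except DeviceNotFound:
        pass
```

Starting from `ToyGuest(persistent=[], live=["vdb"])`, the state produced by step 6 of the reproduction, `compute_detach(detach_before_fix, guest, "vdb")` leaves `vdb` in `guest.live`, while `compute_detach(detach_after_fix, guest, "vdb")` removes it.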

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 16.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/491625
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=539d3bbb8aced7703914bb7ef0b72ac3a471c54e
Submitter: Jenkins
Branch: stable/ocata

commit 539d3bbb8aced7703914bb7ef0b72ac3a471c54e
Author: melanie witt <email address hidden>
Date: Fri Jul 28 17:06:53 2017 +0000

    Detach device from live domain even if not found on persistent

    In a past attempt to fix a bug [1], we started raising DeviceNotFound
    if a device wasn't found on the persistent domain. This was to address
    a scenario where the guest ignored the detach from the live domain
    because it was busy and we wanted to avoid failing a later detach
    request to the user (compute handles DeviceNotFound).

    Unfortunately, in the above case, a later detach request won't fail to
    the user but it also won't detach from the live domain. It sees the
    device already detached from the persistent domain and doesn't attempt
    to detach from the live domain.

    This is a serious problem because it's possible for a volume to be
    attached to two live domains and data corruption can occur.

    This adds an attempt to detach from the live domain even if we had
    already detached from the persistent domain in the past.

    Closes-Bug: #1707238

    [1] https://review.openstack.org/386257

    Change-Id: I8cd056fa17184a98c31547add0e9fb2d363d0908
    (cherry picked from commit d39934ad6afb7e2729bb45235f363ada86012d15)

Reviewed: https://review.openstack.org/491630
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dd925025543bfd1c826db728caf9832f44c1bacd
Submitter: Jenkins
Branch: stable/newton

commit dd925025543bfd1c826db728caf9832f44c1bacd
Author: melanie witt <email address hidden>
Date: Fri Jul 28 17:06:53 2017 +0000

    Detach device from live domain even if not found on persistent

    In a past attempt to fix a bug [1], we started raising DeviceNotFound
    if a device wasn't found on the persistent domain. This was to address
    a scenario where the guest ignored the detach from the live domain
    because it was busy and we wanted to avoid failing a later detach
    request to the user (compute handles DeviceNotFound).

    Unfortunately, in the above case, a later detach request won't fail to
    the user but it also won't detach from the live domain. It sees the
    device already detached from the persistent domain and doesn't attempt
    to detach from the live domain.

    This is a serious problem because it's possible for a volume to be
    attached to two live domains and data corruption can occur.

    This adds an attempt to detach from the live domain even if we had
    already detached from the persistent domain in the past.

    Closes-Bug: #1707238

    [1] https://review.openstack.org/386257

    Change-Id: I8cd056fa17184a98c31547add0e9fb2d363d0908
    (cherry picked from commit d39934ad6afb7e2729bb45235f363ada86012d15)
    (cherry picked from commit 539d3bbb8aced7703914bb7ef0b72ac3a471c54e)

This issue was fixed in the openstack/nova 15.0.7 release.

This issue was fixed in the openstack/nova 14.0.8 release.
