Detach is broken for multi-attached fs-based volumes

Bug #1888022 reported by Alex Deiter
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Alex Deiter
Milestone: —

Bug Description

Description: Detach is broken for multi-attached
LibvirtMountedFileSystemVolumeDriver-based volumes.

Steps to reproduce:
1. Deploy Devstack Master

2. Configure Nova to use KVM/Libvirt

3. Configure Cinder to use any LibvirtMountedFileSystemVolumeDriver-based volume driver, for example NFS:

cinder.conf:

[nfs]
volume_driver = cinder.volume.drivers.nfs.NfsDriver
volume_backend_name = nfs
nas_secure_file_operations = False
nfs_snapshot_support = True
nas_host = 10.3.35.41
nas_share_path = /volumes/pool1/nas

4. Create a volume type with enabled multi-attach feature:
$ cinder type-create multiattach
$ cinder type-key multiattach set multiattach="<is> True"
$ cinder type-key multiattach set volume_backend_name=nfs

5. Create a volume:
$ cinder create --volume-type nfs 1

6. Boot two Nova virtual machines:
$ nova boot --flavor m1.nano --image linux --nic none a
$ nova boot --flavor m1.nano --image linux --nic none b

7. Attach the volume to both VMs:
$ nova volume-attach ac100d66-e92d-40da-a765-fea72ae0af3c 31b702b9-423b-4402-8a6e-1c3dcf84f956
$ nova volume-attach 0843c96e-2cfe-49ca-a8eb-0d25f806ffeb 31b702b9-423b-4402-8a6e-1c3dcf84f956

8. Check the Nova compute (n-cpu) service log file:

Jul 16 22:40:36 openstack-master-lustre7 nova-compute[74494]: INFO nova.compute.manager [None req-bc029573-7eea-4f56-ba89-060c158f2f75 admin admin] [instance: ac100d66-e92d-40da-a765-fea72ae0af3c] Attaching volume 31b702b9-423b-4402-8a6e-1c3dcf84f956 to /dev/sdb

Jul 16 22:40:38 openstack-master-lustre7 nova-compute[74494]: DEBUG nova.virt.libvirt.guest [None req-bc029573-7eea-4f56-ba89-060c158f2f75 admin admin] attach device
xml: <disk type="file" device="disk">
                                                                <driver name="qemu" type="raw" cache="none" io="native"/>
                                                                <source file="/opt/stack/data/nova/mnt/0abe5ba79045d7dd179ddc8a4ff1991c/volume-31b702b9-423b-4402-8a6e-1c3dcf84f956"/>
                                                                <target dev="sdb" bus="scsi"/>
                                                                <serial>31b702b9-423b-4402-8a6e-1c3dcf84f956</serial>
                                                                <shareable/>
                                                                <address type="drive" controller="0" unit="1"/>
                                                              </disk>

Jul 16 22:40:47 openstack-master-lustre7 nova-compute[74494]: INFO nova.compute.manager [None req-d0d5238d-a617-48d4-a439-4c998eea21b5 admin admin] [instance: 0843c96e-2cfe-49ca-a8eb-0d25f806ffeb] Attaching volume 31b702b9-423b-4402-8a6e-1c3dcf84f956 to /dev/sdb

Jul 16 22:40:48 openstack-master-lustre7 nova-compute[74494]: DEBUG nova.virt.libvirt.guest [None req-d0d5238d-a617-48d4-a439-4c998eea21b5 admin admin] attach device xml: <disk type="file" device="disk">
                                                                <driver name="qemu" type="raw" cache="none" io="native"/>
                                                                <source file="/opt/stack/data/nova/mnt/0abe5ba79045d7dd179ddc8a4ff1991c/volume-31b702b9-423b-4402-8a6e-1c3dcf84f956"/>
                                                                <target dev="sdb" bus="scsi"/>
                                                                <serial>31b702b9-423b-4402-8a6e-1c3dcf84f956</serial>
                                                                <shareable/>
                                                                <address type="drive" controller="0" unit="1"/>
                                                              </disk>

9. Check the mountpoint:
$ mount -t nfs4
10.3.35.41:/volumes/pool1/nas on /opt/stack/data/nova/mnt/0abe5ba79045d7dd179ddc8a4ff1991c type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.3.196.7,local_lock=none,addr=10.3.35.41)

Looks good and works as expected.

10. Detach the volume from VM a

11. Detach the volume from VM b

12. Check the mountpoint:
$ mount -t nfs4
10.3.35.41:/volumes/pool1/nas on /opt/stack/data/nova/mnt/0abe5ba79045d7dd179ddc8a4ff1991c type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.3.196.7,local_lock=none,addr=10.3.35.41)

It is still mounted, but it should be unmounted once the volume has been detached from all VMs!

13. Check the Nova compute (n-cpu) service log file:

Jul 16 22:46:43 openstack-master-lustre7 nova-compute[74494]: INFO nova.virt.libvirt.driver [None req-ec46e195-645e-4ad7-a495-3b5fd9578935 admin admin] [instance: ac100d66-e92d-40da-a765-fea72ae0af3c] Detected multiple connections on this host for volume: 31b702b9-423b-4402-8a6e-1c3dcf84f956, skipping target disconnect.

Jul 16 22:46:43 openstack-master-lustre7 nova-compute[74494]: INFO nova.virt.libvirt.driver [None req-048bb510-7012-405e-af7b-2b39772f5ad0 admin admin] [instance: 0843c96e-2cfe-49ca-a8eb-0d25f806ffeb] Detected multiple connections on this host for volume: 31b702b9-423b-4402-8a6e-1c3dcf84f956, skipping target disconnect.

This looks like an error in the detach logic.

Root cause:
nova/compute/manager.py calls:
  self.driver.destroy(context, instance, network_info, block_device_info)
which is implemented in nova/virt/libvirt/driver.py.
destroy(self, context, instance, network_info, block_device_info=None) calls:
  self.cleanup(context, instance, network_info, block_device_info), which calls
  self._cleanup(context, instance, network_info, block_device_info=block_device_info), which in turn calls
  self._disconnect_volume(context, connection_info, instance, encryption=None)

_disconnect_volume then checks:

if self._should_disconnect_target(context, connection_info, instance)

The _should_disconnect_target(context, connection_info, instance) function
checks the volume's multi-attach property and the number of attachments.
If len(attachments) > 1 and the volume is also attached to another instance running on the current host, the volume disconnect is skipped.
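
The effective decision can be paraphrased by the sketch below. This is only an illustration of the behaviour described above, not the real _should_disconnect_target implementation, and the attachment fields used here are assumptions:

# Paraphrase of the decision described above; illustration only, not the
# actual nova/virt/libvirt/driver.py code. Attachment fields are assumed.
def should_disconnect(volume_is_multiattach, attachments, this_host):
    # Single-attach volumes are always disconnected on detach.
    if not volume_is_multiattach or len(attachments) <= 1:
        return True
    # Attachments of the same volume served by instances on this host.
    local = [a for a in attachments if a.get('attached_host') == this_host]
    # If another attachment on this host still uses the connection, Nova
    # logs "skipping target disconnect" and keeps the connection alive.
    return len(local) <= 1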

This logic is correct for true block device drivers (iSCSI/FC),
but it does not work as expected for LibvirtMountedFileSystemVolumeDriver-based drivers.

All LibvirtMountedFileSystemVolumeDriver-based drivers use _HostMountStateManager from nova/virt/libvirt/volume/mount.py. This class manages the mounts and unmounts itself and keeps track of how many attachments are using a particular mountpoint.
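
Conceptually, that bookkeeping amounts to per-mountpoint reference counting, roughly as in the sketch below (an illustration of the idea only, with made-up names; the real _HostMountStateManager is considerably more involved):

# Minimal illustration of per-mountpoint reference counting; names are
# made up and this is not the real _HostMountStateManager code.
import collections
import subprocess

class MountRefCounter(object):
    def __init__(self):
        self._users = collections.defaultdict(set)

    def mount(self, export, mountpoint, attachment_id):
        if not self._users[mountpoint]:
            # First user of this mountpoint: actually mount the share.
            subprocess.check_call(['mount', export, mountpoint])
        self._users[mountpoint].add(attachment_id)

    def umount(self, mountpoint, attachment_id):
        self._users[mountpoint].discard(attachment_id)
        if not self._users[mountpoint]:
            # Last user is gone: only now unmount the share.
            subprocess.check_call(['umount', mountpoint])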

So if we skip the _should_disconnect_target() check for LibvirtMountedFileSystemVolumeDriver-based volumes, _HostMountStateManager will handle multi-attached volumes correctly on its own: the share is only unmounted once the last attachment on the host has gone away.
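
The shape of the change is roughly the sketch below. This is only an outline of the idea, not the merged patch (see the review linked in the comments for the real code); the _get_volume_driver helper name and the nova.virt.libvirt.volume.fs module path are assumptions:

# Outline of the proposed fix in the libvirt driver; illustrative only,
# not the merged patch. Helper name and module path are assumed.
from nova.virt.libvirt.volume import fs

def _disconnect_volume(self, context, connection_info, instance,
                       encryption=None):
    vol_driver = self._get_volume_driver(connection_info)  # assumed helper
    # fs-based drivers refcount their mounts via _HostMountStateManager,
    # so they must always be allowed to run their own disconnect logic.
    if (isinstance(vol_driver, fs.LibvirtMountedFileSystemVolumeDriver)
            or self._should_disconnect_target(context, connection_info,
                                              instance)):
        vol_driver.disconnect_volume(connection_info, instance)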

Please take a look at my patch.

Thank you!

Alex Deiter (deiter)
Changed in nova:
assignee: nobody → Alex Deiter (deiter)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/741712

Changed in nova:
status: New → In Progress
Revision history for this message
Alex Deiter (deiter) wrote :

Tempest results for the proposed patch:

$ grep -i multiattach report.txt
{6} tempest.api.compute.admin.test_volumes_negative.UpdateMultiattachVolumeNegativeTest.test_multiattach_rw_volume_update_failure [67.618377s] ... ok
{7} tempest.api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach [230.591590s] ... ok
{7} tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_boot_from_multiattach_volume [25.522699s] ... ok
{7} tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_list_get_volume_attachments_multiattach [79.578258s] ... ok
{7} tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_resize_server_with_multiattached_volume [107.087478s] ... ok
{7} tempest.api.compute.volumes.test_attach_volume.AttachVolumeMultiAttachTest.test_snapshot_volume_backed_multiattach [40.782358s] ... ok

$ tail report.txt
======
Totals
======
Ran: 266 tests in 0.0776 sec.
 - Passed: 261
 - Skipped: 5
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 6381.8061 sec.

Alex Deiter (deiter)
description: updated
Changed in nova:
assignee: Alex Deiter (deiter) → Lee Yarwood (lyarwood)
Alex Deiter (deiter)
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Alex Deiter (deiter)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/741712
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=806575cfd5327f96e62462f484118d06d17cbe8d
Submitter: Zuul
Branch: master

commit 806575cfd5327f96e62462f484118d06d17cbe8d
Author: Alex Deiter <email address hidden>
Date: Fri Jul 17 20:38:55 2020 +0000

    Detach is broken for multi-attached fs-based volumes

    Fixed an issue with detaching multi-attached fs-based volumes.
    Volume drivers using _HostMountStateManager are a special case.
    _HostMountStateManager ensures that the compute node only attempts
    to mount a single mountpoint in use by multiple attachments once,
    and that it is not unmounted until it is no longer in use by any
    attachments. So we can skip the multiattach check for volume drivers
    that are based on LibvirtMountedFileSystemVolumeDriver.

    Closes-Bug: 1888022
    Change-Id: Ia91b63c0676f42ad8a7a0d16e6870bafc2ee7675

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/796263

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/796936

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/796936
Reason: stable/train branch of nova projects' have been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/796263
Reason: stable/ussuri branch of openstack/nova transitioned to End of Life and is about to be deleted. To be able to do that, all open patches need to be abandoned.
