detach multiattach volume disconnects innocent bystander

Bug #1752115 reported by Steve Noyes
Affects                     Status          Importance   Assigned to
OpenStack Compute (nova)    Fix Released    High         John Griffith
nova (Queens series)        Fix Committed   High         Matt Riedemann

Bug Description

Detaching a multi-attached lvm volume from one server causes the other server to lose connectivity to the volume. I found this while developing a new tempest test for this scenario.

- create 2 instances on the same host, both simple instances with ephemeral disks
- create a multi-attach lvm volume, attach to both instances
- check that you can re-read the partition table from inside each instance (via ssh):

   $ sudo blockdev --rereadpt /dev/vdb

  This succeeds on both instances (no output or error message is returned).

- detach the volume from one of the instances
- recheck connectivity. The expected result is that the command will now fail in the instance where
  the volume was detached. But it also fails on the instance where the volume is still supposedly
  attached:

   $ sudo blockdev --rereadpt /dev/vdb
   BLKRRPART: Input/output error

Cinder and nova still report that the volume is attached correctly:

$ cinder show 2cf26a15-8937-4654-ba81-70cbcb97a238 | grep attachment
| attachment_ids | ['f5876aff-5b5b-45a0-a020-515ca339eae4']

$ nova show vm1 | grep attached
| os-extended-volumes:volumes_attached | [{"id": "2cf26a15-8937-4654-ba81-70cbcb97a238", "delete_on_termination": false}] |

cinder version:

:/opt/stack/cinder$ git show
commit 015b1053990f00d1522c1074bcd160b4b57a5801
Merge: 856e636 481535e
Author: Zuul <email address hidden>
Date: Thu Feb 22 14:00:17 2018 +0000

Revision history for this message
Steve Noyes (steve-noyes) wrote :

This is the tempest test wip - https://review.openstack.org/#/c/548356

Revision history for this message
Steve Noyes (steve-noyes) wrote :

The disconnect is happening in a nova-compute call to os-brick. The call path is:

block_device.driver_detach >
libvirt.driver.detach_volume >
libvirt.driver._disconnect_volume >
os_brick.vol_driver.disconnect_volume

which disconnects the volume.
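A minimal sketch of the faulty path described above. The classes and attachment bookkeeping are hypothetical stand-ins for nova and os-brick; only the call ordering (detach unconditionally tearing down the host connection) mirrors the comment:

```python
class FakeBrickConnector:
    """Stands in for the os-brick volume connector on one compute host."""

    def __init__(self):
        self.connected = False

    def connect_volume(self):
        self.connected = True

    def disconnect_volume(self):
        # Tears down the host-level device (e.g. the iSCSI session),
        # regardless of how many guests are still using it.
        self.connected = False


class FakeLibvirtDriver:
    """Hypothetical stand-in for the libvirt driver's attach/detach."""

    def __init__(self, connector):
        self.connector = connector
        self.attachments = set()  # instance uuids using the volume

    def attach_volume(self, instance_uuid):
        if not self.attachments:
            self.connector.connect_volume()
        self.attachments.add(instance_uuid)

    def detach_volume(self, instance_uuid):
        self.attachments.discard(instance_uuid)
        # Bug: no multiattach check before disconnecting the host device.
        self.connector.disconnect_volume()


connector = FakeBrickConnector()
driver = FakeLibvirtDriver(connector)
driver.attach_volume("vm1")
driver.attach_volume("vm2")
driver.detach_volume("vm2")
# vm1 still holds an attachment, but the host connection is gone:
print(driver.attachments, connector.connected)  # {'vm1'} False
```

This is why the remaining instance sees BLKRRPART I/O errors: the detach from vm2 removed the one host connection both guests depended on.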

The os-brick version is 2.3.0.

nova version:

nova$ git show
commit 84a661d343a01c4da606ddadf3e9e765a8ed5041
Merge: cac6ab8 9abccf8
Author: Zuul <email address hidden>
Date: Thu Feb 22 20:58:23 2018 +0000

Changed in cinder:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → John Griffith (john-griffith)
Revision history for this message
John Griffith (john-griffith) wrote :

Looking into this, the problem appears to be that Nova calls the os-brick initiator's disconnect_volume method unconditionally. Brick currently has no way to interrogate whether a connection is still in use, and I'm not sure that something like that could be added in this case.

My first thought was that it would be logical to check for multiattach on the same host here:
  https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1249
by using objects.BlockDeviceMapping.get_by_volume(). However, it turns out that call is another special case that isn't allowed when a volume has multiattach=True (I haven't figured out why that restriction exists yet, but I'm looking).

no longer affects: cinder
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/548427

Changed in nova:
assignee: nobody → John Griffith (john-griffith)
status: New → In Progress
Revision history for this message
Ildiko Vancsa (ildiko-vancsa) wrote :

I'm a little confused here, as this seems to be a case of the shared-target issue we solved at the global level but not per host, where it also applies. It was known that LVM exports one target per host. So is this a case we forgot to cover when we introduced the shared_targets flag and its mechanisms, or is it a completely separate issue?

It was also stated at some point that brick would not clean up if there is still an active connection to a volume, or something along those lines.

Revision history for this message
John Griffith (john-griffith) wrote :

You're correct about the shared_targets flag; however, that was added specifically to handle locking around create/delete of targets on backends that use shared targets. That's actually a different problem: it means a single backend has multiple volumes that are all doing data transfer across a single 'shared' target.

In other words, the flag has a different meaning here. We could possibly abuse it to solve this problem, but a few things would fall apart if we did. Given that the proposed solution works and I don't see any major drawbacks, I'd like to stick with the plan, so to speak.
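For contrast, a rough sketch of what shared_targets is for: serializing target create/delete for volumes on the same backend. The lock keying and helper names are hypothetical, not nova's actual implementation:

```python
import threading

# Per-backend locks: shared_targets=True signals that connect/disconnect for
# any volume on that backend must not race, because many volumes ride one
# target. It says nothing about per-host attachment counts.
_backend_locks = {}


def _lock_for(service_uuid):
    return _backend_locks.setdefault(service_uuid, threading.Lock())


def guarded_disconnect(volume, do_disconnect):
    """Serialize teardown when the backend presents shared targets."""
    if volume.get("shared_targets"):
        with _lock_for(volume["service_uuid"]):
            do_disconnect()
    else:
        do_disconnect()


calls = []
guarded_disconnect({"shared_targets": True, "service_uuid": "backend-1"},
                   lambda: calls.append("locked"))
guarded_disconnect({"shared_targets": False},
                   lambda: calls.append("unlocked"))
```

The lock only prevents concurrent session create/delete against one backend; it cannot tell whether another instance on this host still needs the connection, which is the separate problem this bug is about.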

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
tags: added: multiattach volumes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/549411

Changed in nova:
assignee: John Griffith (john-griffith) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → John Griffith (john-griffith)
Changed in nova:
assignee: John Griffith (john-griffith) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/550220

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/550221

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/549411
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d2941bfd165055348dd584b630d4e631ef05e328
Submitter: Zuul
Branch: master

commit d2941bfd165055348dd584b630d4e631ef05e328
Author: Matt Riedemann <email address hidden>
Date: Sat Mar 3 06:11:12 2018 -0500

    Pass user context to virt driver when detaching volume

    We need this in a later change to pull volume attachment
    information from cinder for the volume being detached so
    that we can do some attachment counting for multiattach
    volumes being detached from instances on the same host.

    Change-Id: I751fcb7532679905c4279744919c6cce84a11eb4
    Related-Bug: #1752115

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/548427
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=139426d51416313e8b14a335695adc7165bc4848
Submitter: Zuul
Branch: master

commit 139426d51416313e8b14a335695adc7165bc4848
Author: John Griffith <email address hidden>
Date: Wed Feb 28 00:56:29 2018 +0000

    Check for multiattach before removing connections

    With the addition of multiattach we need to ensure that we
    don't make brick calls to remove connections on detach volume
    if that volume is attached to another Instance on the same
    node.

    This patch adds a new helper method (_should_disconnect_target)
    to the virt driver that will inform the caller if the specified
    volume is attached multiple times to the current host.

    The general strategy for this call is to fetch a current reference
    of the specified volume and then:
    1. Check if that volume has >1 active attachments
    2. Fetch the attachments for the volume and extract the server_uuids
       for each of the attachments.
    3. Check the server_uuids against a list of all known server_uuids
       on the current host. Increment a connection_count for each item
       found.

    If the connection_count is >1 we return `False` indicating that the
    volume is being used by more than one attachment on the host and
    we therefore should NOT destroy the connection.

    *NOTE*
    This scenario is very different than the `shared_targets`
    case (for which we supply a property on the Volume object). The
    `shared_targets` scenario is specifically for Volume backends that
    present >1 Volumes using a single Target. This mechanism is meant
    to provide a signal to consumers that locking is required for the
    creation and deletion of initiator/target sessions.

    Closes-Bug: #1752115

    Change-Id: Idc5cecffa9129d600c36e332c97f01f1e5ff1f9f
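The counting strategy described in the commit message can be sketched as follows. This is a simplified stand-in: the volume/attachment dictionaries and the function signature are hypothetical, not nova's actual objects or the real _should_disconnect_target signature:

```python
def should_disconnect_target(volume, attachments, instances_on_host):
    """Return True only when it is safe to tear down the host connection.

    volume: dict with 'attachment_count' (stand-in for the Cinder volume)
    attachments: iterable of {'server_uuid': ...} for this volume
    instances_on_host: set of server uuids known to this compute host
    """
    # Step 1: a volume with <=1 active attachments can always disconnect.
    if volume["attachment_count"] <= 1:
        return True
    # Steps 2-3: count how many of the volume's attachments belong to
    # instances on the current host.
    connection_count = sum(
        1 for a in attachments if a["server_uuid"] in instances_on_host
    )
    # If >1 local attachments remain, another instance on this host still
    # needs the target, so do NOT destroy the connection.
    return connection_count <= 1


attachments = [{"server_uuid": "vm1"}, {"server_uuid": "vm2"}]
# Both attachments are on this host: keep the connection alive.
print(should_disconnect_target({"attachment_count": 2},
                               attachments, {"vm1", "vm2"}))  # False
# Only one attachment is local; the other lives elsewhere: safe to disconnect.
print(should_disconnect_target({"attachment_count": 2},
                               attachments, {"vm1"}))  # True
```

Applied to the reproducer above, detaching from vm2 would find vm1's attachment still present on the host and skip the os-brick disconnect_volume call.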

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → John Griffith (john-griffith)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/550220
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b006a5ba91d3d8c7c7bfec7d3df6bb1da0aa3cf4
Submitter: Zuul
Branch: stable/queens

commit b006a5ba91d3d8c7c7bfec7d3df6bb1da0aa3cf4
Author: Matt Riedemann <email address hidden>
Date: Sat Mar 3 06:11:12 2018 -0500

    Pass user context to virt driver when detaching volume

    We need this in a later change to pull volume attachment
    information from cinder for the volume being detached so
    that we can do some attachment counting for multiattach
    volumes being detached from instances on the same host.

    Change-Id: I751fcb7532679905c4279744919c6cce84a11eb4
    Related-Bug: #1752115
    (cherry picked from commit d2941bfd165055348dd584b630d4e631ef05e328)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/550221
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=370c10453f95cfc42465e6f4b19d96d468b440e0
Submitter: Zuul
Branch: stable/queens

commit 370c10453f95cfc42465e6f4b19d96d468b440e0
Author: John Griffith <email address hidden>
Date: Wed Feb 28 00:56:29 2018 +0000

    Check for multiattach before removing connections

    With the addition of multiattach we need to ensure that we
    don't make brick calls to remove connections on detach volume
    if that volume is attached to another Instance on the same
    node.

    This patch adds a new helper method (_should_disconnect_target)
    to the virt driver that will inform the caller if the specified
    volume is attached multiple times to the current host.

    The general strategy for this call is to fetch a current reference
    of the specified volume and then:
    1. Check if that volume has >1 active attachments
    2. Fetch the attachments for the volume and extract the server_uuids
       for each of the attachments.
    3. Check the server_uuids against a list of all known server_uuids
       on the current host. Increment a connection_count for each item
       found.

    If the connection_count is >1 we return `False` indicating that the
    volume is being used by more than one attachment on the host and
    we therefore should NOT destroy the connection.

    *NOTE*
    This scenario is very different than the `shared_targets`
    case (for which we supply a property on the Volume object). The
    `shared_targets` scenario is specifically for Volume backends that
    present >1 Volumes using a single Target. This mechanism is meant
    to provide a signal to consumers that locking is required for the
    creation and deletion of initiator/target sessions.

    Closes-Bug: #1752115

    Change-Id: Idc5cecffa9129d600c36e332c97f01f1e5ff1f9f
    (cherry picked from commit 139426d51416313e8b14a335695adc7165bc4848)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.1

This issue was fixed in the openstack/nova 17.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.
