Concurrent requests to attach the same non-multiattach volume to multiple instances can succeed

Bug #1762687 reported by Lee Yarwood
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: Matt Riedemann
Nominated for Queens by Matt Riedemann

Bug Description

Description
===========

Discovered this by chance yesterday. At first glance this appears to be due to a lack of locking within c-api when we initially create the attachments, and no additional validation when we later update the attachments with the connector during the attach flow. Reporting this against both nova and cinder for now.

$ nova volume-attach 77c092c2-9664-42b2-bf71-277b1bfad707 b4240f39-da7a-4372-b4ca-15a0c6121ac8 & nova volume-attach 91a7c490-5a9a-4048-a109-b1159b7f0e79 b4240f39-da7a-4372-b4ca-15a0c6121ac8
[1] 24949
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| device | /dev/vdb |
| id | b4240f39-da7a-4372-b4ca-15a0c6121ac8 |
| serverId | 91a7c490-5a9a-4048-a109-b1159b7f0e79 |
| volumeId | b4240f39-da7a-4372-b4ca-15a0c6121ac8 |
+----------+--------------------------------------+
+----------+--------------------------------------+
| Property | Value |
+----------+--------------------------------------+
| device | /dev/vdb |
| id | b4240f39-da7a-4372-b4ca-15a0c6121ac8 |
| serverId | 77c092c2-9664-42b2-bf71-277b1bfad707 |
| volumeId | b4240f39-da7a-4372-b4ca-15a0c6121ac8 |
+----------+--------------------------------------+
$ cinder show b4240f39-da7a-4372-b4ca-15a0c6121ac8
+--------------------------------+----------------------------------------------------------------------------------+
| Property | Value |
+--------------------------------+----------------------------------------------------------------------------------+
| attached_servers | ['91a7c490-5a9a-4048-a109-b1159b7f0e79', '77c092c2-9664-42b2-bf71-277b1bfad707'] |
| attachment_ids | ['31b8c16f-07d0-4f0c-95d8-c56797a270dc', 'a7eb9cb1-b7be-44e3-a176-3c6989459aaa'] |
| availability_zone | nova |
| bootable | false |
| consistencygroup_id | None |
| created_at | 2018-04-09T19:02:46.000000 |
| description | None |
| encrypted | False |
| id | b4240f39-da7a-4372-b4ca-15a0c6121ac8 |
| metadata | attached_mode : rw |
| migration_status | None |
| multiattach | False |
| name | None |
| os-vol-host-attr:host | test.example.com@lvmdriver-1#lvmdriver-1 |
| os-vol-mig-status-attr:migstat | None |
| os-vol-mig-status-attr:name_id | None |
| os-vol-tenant-attr:tenant_id | fe3128ecf4704369ae3f7ede03f6bc29 |
| replication_status | None |
| size | 1 |
| snapshot_id | None |
| source_volid | None |
| status | in-use |
| updated_at | 2018-04-09T19:04:24.000000 |
| user_id | 57293b0839da449580ce7008c8734c1c |
| volume_type | lvmdriver-1 |
+--------------------------------+----------------------------------------------------------------------------------+

$ ll /dev/disk/by-path/ip-192.168.122.79:3260-iscsi-iqn.2010-10.org.openstack:volume-b4240f39-da7a-4372-b4ca-15a0c6121ac8-lun-0
lrwxrwxrwx. 1 root root 9 Apr 9 15:04 /dev/disk/by-path/ip-192.168.122.79:3260-iscsi-iqn.2010-10.org.openstack:volume-b4240f39-da7a-4372-b4ca-15a0c6121ac8-lun-0 -> ../../sdc

$ sudo virsh domblklist 77c092c2-9664-42b2-bf71-277b1bfad707
Target Source
------------------------------------------------
vda /opt/stack/data/nova/instances/77c092c2-9664-42b2-bf71-277b1bfad707/disk
vdb /dev/sdc

$ sudo virsh domblklist 91a7c490-5a9a-4048-a109-b1159b7f0e79
Target Source
------------------------------------------------
vda /opt/stack/data/nova/instances/91a7c490-5a9a-4048-a109-b1159b7f0e79/disk
vdb /dev/sdc

Steps to reproduce
==================
$ nova volume-attach $instance_1_uuid $volume_uuid & nova volume-attach $instance_2_uuid $volume_uuid

Expected result
===============
Only one of the requests succeeds.

Actual result
=============
Both requests succeed.

Environment
===========
1. Exact version of OpenStack you are running. See the following
  list for all releases: http://docs.openstack.org/releases/

$ cd ../nova
$ git describe --tags
17.0.0.0rc1-656-gebb4817ce3

$ cd ../cinder/
$ git describe --tags
12.0.0.0rc1-365-ga8a9dda30

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?

Libvirt + KVM

3. Which storage type did you use?
   (For example: Ceph, LVM, GPFS, ...)
   What's the version of that?

LVM + iSCSI

4. Which networking type did you use?
   (For example: nova-network, Neutron with OpenVSwitch, ...)

N/A

Revision history for this message
Lee Yarwood (lyarwood) wrote :

FWIW this does reproduce on stable/queens but not stable/pike.

Revision history for this message
Ildiko Vancsa (ildiko-vancsa) wrote :

We removed some constraints on the Cinder side because we had issues with live migration. I believe this might be a consequence of that. :/

Revision history for this message
Matt Riedemann (mriedem) wrote :

Right, this has been an issue since Queens because, with the new-style volume attachment flow, a single volume and instance can have multiple attachments at the same time (for things like live migration, as Ildiko mentioned). We fixed one side effect of this in Queens with this change:

https://review.openstack.org/#/c/525622/

However, that relies on querying the BDM table for the instance and volume in question, which can race if you're attaching the same volume to the same instance in concurrent requests.

We can't 'lock' in the nova-api service, not without a DLM like etcd or something anyway. We also can't rely on Cinder for the uniqueness constraint here because we manage attachments using distinct attachment records. So I think this ultimately comes down to nova having to apply a unique constraint in the BDM table on instance_uuid and volume_id, which we discussed at the PTG:

https://etherpad.openstack.org/p/nova-ptg-rocky (line 575)

And this is also semi-duplicate of bug 1427060 in that respect.
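
For illustration, such a constraint might be declared like this in SQLAlchemy (a sketch with assumed column names, not nova's actual schema or migration; 'deleted' is folded into the constraint so nova's soft-deleted rows don't block a later re-attach):

    # Illustrative sketch only -- not nova's actual schema or migration.
    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            UniqueConstraint)

    metadata = MetaData()

    block_device_mapping = Table(
        'block_device_mapping', metadata,
        Column('id', Integer, primary_key=True),
        Column('instance_uuid', String(36), nullable=False),
        Column('volume_id', String(36)),
        Column('deleted', Integer, default=0),
        # At most one live BDM per (volume, instance) pair; 'deleted'
        # keeps previously soft-deleted attachments from tripping it.
        UniqueConstraint('volume_id', 'instance_uuid', 'deleted',
                         name='uniq_bdm0volume_id0instance_uuid0deleted'),
    )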

Revision history for this message
Matt Riedemann (mriedem) wrote :

I missed that this was the same volume attached to more than one instance, so a unique constraint on the BDM table for volume_id and instance_uuid won't fix that.

Revision history for this message
Matt Riedemann (mriedem) wrote :

So we have this query method:

https://github.com/openstack/nova/blob/0c441e636ba9d287909584b6ddf15eab5d479f0e/nova/objects/block_device.py#L327

That would at least fail if there is more than one BDM for a given volume, which we could use to detect a second attachment of a multiattach=False volume. But it doesn't eliminate the race before the BDMs are created; it just makes the window a bit smaller.

I was wondering if a multiattach column on the bdm table could somehow be used as part of a constraint, but I don't think it could: with a normal multiattach volume we could have two BDMs for the same volume attached to two separate instances, and the multiattach value would be the same in both rows, so that's essentially the same as having a unique constraint on volume_id/instance_uuid.
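
To make that window concrete, a toy check-then-create race (generic Python, not nova code) — both requests pass the "already attached?" check before either inserts:

    import threading

    attachments = []                    # stands in for the BDM table
    race_window = threading.Barrier(2)  # force both threads into the gap

    def attach(instance_uuid, volume_id):
        # Step 1: the "is anyone already attached?" check.
        if any(vol == volume_id for _, vol in attachments):
            raise Exception('volume already attached')
        race_window.wait()  # both requests pass the check first...
        # Step 2: ...then both insert, leaving two attachments.
        attachments.append((instance_uuid, volume_id))

    t1 = threading.Thread(target=attach, args=('instance-1', 'vol-1'))
    t2 = threading.Thread(target=attach, args=('instance-2', 'vol-1'))
    t1.start(); t2.start(); t1.join(); t2.join()
    print(attachments)  # two rows for the same non-multiattach volume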

Revision history for this message
Matt Riedemann (mriedem) wrote :

(10:54:09 AM) mriedem: the solution to fix the race with attaching the same volume to the same instance concurrently is a unique constraint on the bdms table over the volume_id and instance_uuid columns
(10:54:16 AM) mriedem: but that doesn't fix lyarwood's new bug
(10:55:11 AM) mriedem: you can't put a unique constraint on just the volume_id column since that would break multiattach
(10:55:35 AM) mriedem: you almost need a conditional constraint, where volume_id must be unique if multiattach=False
(10:55:46 AM) mriedem: but there is no such thing as a conditional unique constraint is there?
(10:56:10 AM) mriedem: and jaypibbles ran off
(10:56:29 AM) dansmith: mriedem: I think you need an active=$id column to do that
(10:56:39 AM) dansmith: that's why we have deleted=$id I think
(10:56:47 AM) mriedem: https://en.wikipedia.org/wiki/Check_constraint

Revision history for this message
Matt Riedemann (mriedem) wrote :

Apparently MySQL doesn't support check constraints, so we can't rely on that:

http://docs.sqlalchemy.org/en/latest/core/constraints.html#check-constraint
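
A per-row CHECK couldn't express this anyway, since uniqueness is a cross-row property; the closest real mechanism is a partial unique index, which PostgreSQL supports but MySQL does not. A sketch with illustrative names:

    # PostgreSQL-only sketch; MySQL has no partial indexes.
    from sqlalchemy import (Boolean, Column, Index, Integer, MetaData,
                            String, Table)

    metadata = MetaData()

    bdm = Table(
        'block_device_mapping', metadata,
        Column('id', Integer, primary_key=True),
        Column('instance_uuid', String(36)),
        Column('volume_id', String(36)),
        Column('multiattach', Boolean, default=False),
    )

    # volume_id must be unique, but only for rows where multiattach is
    # false -- the "conditional unique constraint" from the IRC log.
    Index('uniq_volume_id_if_not_multiattach',
          bdm.c.volume_id,
          unique=True,
          postgresql_where=bdm.c.multiattach.is_(False))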

Revision history for this message
Matt Riedemann (mriedem) wrote :

I wonder why the 2nd attachment create on the cinder side doesn't fail; maybe Cinder has the same race. Cinder shouldn't allow more than one attachment record on the same volume for different instances if the volume is multiattach=False.

Revision history for this message
Matt Riedemann (mriedem) wrote :

This is the validation logic in cinder-api when creating an attachment record:

https://github.com/openstack/cinder/blob/d9242027c1235b63bcff15c2afd392bb9c28c2d7/cinder/volume/api.py#L2084-L2106

I wonder if the vref at this point is stale:

https://github.com/openstack/cinder/blob/d9242027c1235b63bcff15c2afd392bb9c28c2d7/cinder/volume/api.py#L2097

Because if the instance we're trying to attach to does not match the instance already attached, this check should trip:

   if attachment.instance_uuid != instance_uuid:
       override = False
       break

And then we should fail. I can push a Cinder patch for this, but don't have a reliable recreate at the moment.
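
Roughly, the fix would be to refresh the volume before walking its attachments, so the list isn't stale. A sketch of that check (approximate, with names mirroring cinder's volume/api.py; not an actual patch):

    # Approximate sketch of the fallback check, not the verbatim code.
    def can_override_reserve(vref, instance_uuid):
        # The conditional status update failed, so another request may
        # have raced us; re-read the volume so volume_attachment is
        # current (a stale vref is exactly the suspected bug).
        vref.refresh()
        for attachment in vref.volume_attachment:
            if attachment.instance_uuid != instance_uuid:
                # multiattach=False volume already attached (or being
                # attached) to a different instance: fail the request.
                return False
        return True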

Jay Bryant (jsbryant)
Changed in cinder:
importance: Undecided → High
Revision history for this message
John Griffith (john-griffith) wrote :

I'm not sure but I think our race may be here:
https://github.com/openstack/cinder/blob/master/cinder/volume/api.py#L2084

Initially the goal was to get rid of the intermediate states for attach (on the Cinder side) so a volume would go to 'in-use' straight away, but now that we've added reserve back into the mix, I suspect your volume is in an 'attaching' or 'reserved' state for a brief period when the second call is made. The check referenced above only looks at 'in-use', 'downloading', etc., NOT 'reserved' or 'attaching', so I suspect that may be the problem.
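
To make that concrete, a toy sketch of the gap (illustrative, not cinder's actual code): a volume that is briefly 'reserved' by the first request sails past the conflict check entirely.

    # Illustrative sketch, not cinder code: the conflict check only
    # fires for a subset of statuses, so 'reserved' slips through.
    CONFLICT_STATUSES = ('in-use', 'downloading')  # no 'reserved',
                                                   # no 'attaching'

    def second_attach_allowed(volume, instance_uuid):
        if (volume['status'] in CONFLICT_STATUSES
                and not volume['multiattach']):
            # Only here do we compare against existing attachments.
            return all(a['instance_uuid'] == instance_uuid
                       for a in volume['attachments'])
        # status == 'reserved' (first request mid-flight): no check.
        return True

    vol = {'status': 'reserved', 'multiattach': False,
           'attachments': [{'instance_uuid': 'instance-1'}]}
    print(second_attach_allowed(vol, 'instance-2'))  # True -- the bug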

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/560074

Changed in cinder:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem)
Changed in nova:
status: New → Invalid
no longer affects: nova
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/560153

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/560074
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=dd93b0c7190681620c3eed9937462901da42c7c4
Submitter: Zuul
Branch: master

commit dd93b0c7190681620c3eed9937462901da42c7c4
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 10 12:28:05 2018 -0400

    Refresh volume when checking for conflicting attachments

    We should only be able to create more than one attachment to the
    same volume if it's (1) multiattach=True or (2) it's multiattach=False
    AND the attachments are to the same instance. It is not valid to
    attach more than one instance to the same multiattach=False volume.

    The _attachment_reserve method is checking for this if the
    conditional update to the volume status fails, but it does
    not refresh the volume before checking the attachments. Since
    we could be racing to create attachments concurrently, when
    the request that failed the conditional update actually pulled
    the volume out of the DB, it might not have had any attachments,
    so we need to refresh it before checking the list of attachments
    to see if the instance we're trying to attach (reserve the volume)
    is the same as what's already attached to the volume.

    Change-Id: Iee78555163bbcbb5ff3e0ba008d7c87a3aedfb0f
    Closes-Bug: #1762687

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/queens)

Reviewed: https://review.openstack.org/560153
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=470e1bd1e42bfeb4da59e9048894be142f4cd25d
Submitter: Zuul
Branch: stable/queens

commit 470e1bd1e42bfeb4da59e9048894be142f4cd25d
Author: Matt Riedemann <email address hidden>
Date: Tue Apr 10 12:28:05 2018 -0400

    Refresh volume when checking for conflicting attachments

    We should only be able to create more than one attachment to the
    same volume if it's (1) multiattach=True or (2) it's multiattach=False
    AND the attachments are to the same instance. It is not valid to
    attach more than one instance to the same multiattach=False volume.

    The _attachment_reserve method is checking for this if the
    conditional update to the volume status fails, but it does
    not refresh the volume before checking the attachments. Since
    we could be racing to create attachments concurrently, when
    the request that failed the conditional update actually pulled
    the volume out of the DB, it might not have had any attachments,
    so we need to refresh it before checking the list of attachments
    to see if the instance we're trying to attach (reserve the volume)
    is the same as what's already attached to the volume.

    Change-Id: Iee78555163bbcbb5ff3e0ba008d7c87a3aedfb0f
    Closes-Bug: #1762687
    (cherry picked from commit dd93b0c7190681620c3eed9937462901da42c7c4)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 13.0.0.0b1

This issue was fixed in the openstack/cinder 13.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.2

This issue was fixed in the openstack/cinder 12.0.2 release.
