Swap volume of multiattached volume will corrupt data

Bug #1775418 reported by Matthew Booth on 2018-06-06
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
High
Unassigned
Nominated for Queens by Matt Riedemann
OpenStack Compute (nova)
High
Matt Riedemann
Queens
High
Matt Riedemann
Rocky
Undecided
Matt Riedemann
Stein
Undecided
Matt Riedemann

Bug Description

We currently permit the following:

Create multiattach volumes a and b
Create servers 1 and 2
Attach volume a to servers 1 and 2
swap_volume(server 1, volume a, volume b)

In fact, we have a tempest test which tests exactly this sequence: api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach

The problem is that writes from server 2 during the copy operation on server 1 will continue to hit the underlying storage, but as server 1 doesn't know about them they won't be reflected on the copy on volume b. This will lead to an inconsistent copy, and therefore data corruption on volume b.

Also, this whole flow makes no sense for a multiattached volume because even if we managed a consistent copy all we've achieved is forking our data between the 2 volumes. The purpose of this call is to allow the operator to move volumes. We need a fundamentally different approach for multiattached volumes.

In the short term we should at least prevent data corruption by preventing swap volume of a multiattached volume. This would also cause the above tempest test to fail, but as I don't believe it's possible to implement the test safely this would be correct.

Matt Riedemann (mriedem) wrote :

As mentioned in the mailing list, I think this is also something to be controlled in Cinder during retype or volume live migration since that would be a fast fail for this scenario:

http://lists.openstack.org/pipermail/openstack-dev/2018-June/131234.html

Otherwise cinder calls swap volume in nova, which will fail back to cinder, and then cinder has to rollback; it's just easier to fail fast in the cinder API.

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
importance: Undecided → High
tags: added: libvirt multiattach volumes
Eric Harney (eharney) on 2018-06-07
Changed in cinder:
importance: Undecided → High
Douglas Viroel (dviroel) on 2019-02-27
Changed in cinder:
assignee: nobody → Douglas (viroel)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Lee Yarwood (lyarwood)
Douglas Viroel (dviroel) on 2019-04-03
Changed in cinder:
assignee: Douglas Viroel (dviroel) → nobody
Matt Riedemann (mriedem) on 2019-04-09
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Lee Yarwood (lyarwood)
Matt Riedemann (mriedem) on 2019-05-20
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Lee Yarwood (lyarwood)
Changed in nova:
assignee: Lee Yarwood (lyarwood) → Stephen Finucane (stephenfinucane)
Matt Riedemann (mriedem) on 2019-05-22
Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → Matt Riedemann (mriedem)

Reviewed: https://review.opendev.org/572790
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5a1d159d142997bb4288d4bf86d4e144334905cd
Submitter: Zuul
Branch: master

commit 5a1d159d142997bb4288d4bf86d4e144334905cd
Author: Matt Riedemann <email address hidden>
Date: Wed Jun 6 10:32:37 2018 -0400

    Block swap volume on volumes with >1 rw attachment

    If we're swapping from a multiattach volume that has more than one
    read/write attachment, another server on the secondary attachment could
    be writing to the volume which is not getting copied into the volume to
    which we're swapping, so we could have data loss during the swap.

    This change does volume read/write attachment counting for the volume
    we're swapping from and if there is more than one read/write attachment
    on the volume, the swap volume operation fails with a 400 BadRequest
    error.

    Depends-On: https://review.openstack.org/573025/
    Closes-Bug: #1775418
    Change-Id: Icd7fcb87a09c35a13e4e14235feb30a289d22778

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/662331
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9b21d1067a071ab17758068dfca5cd2ebd29d868
Submitter: Zuul
Branch: stable/stein

commit 9b21d1067a071ab17758068dfca5cd2ebd29d868
Author: Matt Riedemann <email address hidden>
Date: Wed Jun 6 10:32:37 2018 -0400

    Block swap volume on volumes with >1 rw attachment

    If we're swapping from a multiattach volume that has more than one
    read/write attachment, another server on the secondary attachment could
    be writing to the volume which is not getting copied into the volume to
    which we're swapping, so we could have data loss during the swap.

    This change does volume read/write attachment counting for the volume
    we're swapping from and if there is more than one read/write attachment
    on the volume, the swap volume operation fails with a 400 BadRequest
    error.

    Depends-On: https://review.openstack.org/573025/
    Closes-Bug: #1775418
    Change-Id: Icd7fcb87a09c35a13e4e14235feb30a289d22778
    (cherry picked from commit 5a1d159d142997bb4288d4bf86d4e144334905cd)

Reviewed: https://review.opendev.org/662333
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c19d602f9be536ee8412bac0f0951a995424f25e
Submitter: Zuul
Branch: stable/rocky

commit c19d602f9be536ee8412bac0f0951a995424f25e
Author: Matt Riedemann <email address hidden>
Date: Wed Jun 6 10:32:37 2018 -0400

    Block swap volume on volumes with >1 rw attachment

    If we're swapping from a multiattach volume that has more than one
    read/write attachment, another server on the secondary attachment could
    be writing to the volume which is not getting copied into the volume to
    which we're swapping, so we could have data loss during the swap.

    This change does volume read/write attachment counting for the volume
    we're swapping from and if there is more than one read/write attachment
    on the volume, the swap volume operation fails with a 400 BadRequest
    error.

    Conflicts:
          nova/tests/unit/compute/test_compute_api.py

    NOTE(mriedem): The conflict is due to Stein change
    I7d5bddc0aa1833cda5f4bcebe5e03bdd447f641a changing the
    decorators on the _test_snapshot_and_backup method.

    Depends-On: https://review.openstack.org/573025/
    Closes-Bug: #1775418
    Change-Id: Icd7fcb87a09c35a13e4e14235feb30a289d22778
    (cherry picked from commit 5a1d159d142997bb4288d4bf86d4e144334905cd)
    (cherry picked from commit 9b21d1067a071ab17758068dfca5cd2ebd29d868)

Reviewed: https://review.opendev.org/662340
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=58a140487c8127da727fa7e4fb56892f8c162536
Submitter: Zuul
Branch: stable/queens

commit 58a140487c8127da727fa7e4fb56892f8c162536
Author: Matt Riedemann <email address hidden>
Date: Wed Jun 6 10:32:37 2018 -0400

    Block swap volume on volumes with >1 rw attachment

    If we're swapping from a multiattach volume that has more than one
    read/write attachment, another server on the secondary attachment could
    be writing to the volume which is not getting copied into the volume to
    which we're swapping, so we could have data loss during the swap.

    This change does volume read/write attachment counting for the volume
    we're swapping from and if there is more than one read/write attachment
    on the volume, the swap volume operation fails with a 400 BadRequest
    error.

    Conflicts:
          api-ref/source/os-volume-attachments.inc

    NOTE(mriedem): The conflict is due to not having change
    I5fc4d0ba3bb1c49dfaba2b2eed056441509cb9da in Queens. Also, the
    uuidsentinel has to be imported in the test_volumes module because
    change I9acc2e4d6c57ea0f45ba161116993d9f1a0e1e9f is not in Queens.

    Depends-On: https://review.openstack.org/573025/
    Closes-Bug: #1775418
    Change-Id: Icd7fcb87a09c35a13e4e14235feb30a289d22778
    (cherry picked from commit 5a1d159d142997bb4288d4bf86d4e144334905cd)
    (cherry picked from commit 9b21d1067a071ab17758068dfca5cd2ebd29d868)
    (cherry picked from commit c19d602f9be536ee8412bac0f0951a995424f25e)

This issue was fixed in the openstack/nova 19.0.1 release.

This issue was fixed in the openstack/nova 18.2.1 release.

This issue was fixed in the openstack/nova 17.0.11 release.

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers