NVMe-oF: Connects wrong volume to instance

Bug #1964388 reported by Gorka Eguileor
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
os-brick
Fix Released
Medium
Gorka Eguileor

Bug Description

For NVMe-oF Cinder drivers that use the initial format of the connector properties, don't provide the nguid information, and share the subsystem, we can end up with the wrong volume attached to an instance.

This is caused by a race condition between mapping one or more volumes on Cinder-Volume and attaching a previous one on Nova dure to the auto scans that happen on nvme when receiving the AER message.

Tags: nvme
Changed in os-brick:
importance: Undecided → Medium
Changed in os-brick:
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/os-brick/+/836060

Changed in os-brick:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.opendev.org/c/openstack/os-brick/+/836060
Committed: https://opendev.org/openstack/os-brick/commit/4c21b40df20ef497c7a3c82ffcf32fe21621034c
Submitter: "Zuul (22348)"
Branch: master

commit 4c21b40df20ef497c7a3c82ffcf32fe21621034c
Author: Gorka Eguileor <email address hidden>
Date: Tue Jun 7 18:26:13 2022 +0200

    NVMe-oF: Consolidate code paths

    We currently have 2 different connection information formats: The
    initial one and the new one.

    Within the new format we have 2 variants: Replicated and Not replicated.

    Currently the nvmeof connector has multiple code paths and we end up
    finding bugs that affect one path and not the other, and bugs that are
    present in both may end up getting fixed in only one of them.

    This patch consolidates both formats by converting the connection
    information into a common internal representation, thus reducing the
    number of code paths.

    Thanks to this consolidation the Cinder drivers are less restricted on
    how they can identify a volume in the connection information. They are
    no longer forced to pass the NVMe UUID (in case they cannot get that
    information or the backend doesn't support it) and they can provide
    other information instead (nguid or namespace id).

    This connection properties format consolidation is explained in the new
    NVMeOFConnProps class docstring.

    By consolidating the code paths a number of bugs get fixed
    automatically, because they were broken in one path but not in the
    other. As examples, the old code path didn't have rescans but the new
    one did, and the old code path had device flush but the new one didn't.

    Given the big refactoring needed to consolidate everything this patch
    also accomplishes the following things:

    - Uses Portal, Target, and NVMeOFConnProps classes instead of using the
      harder to read and error prone dictionaries and tuples.

    - Adds method annotations.

    - Documents most methods (exect the raid ones which I'm not familiar
      with)

    - Adds the connector to the docs.

    - Makes method signatures conform with Cinder team standards: no more
      static methods passing self and no more calling class methods using
      the class name instead of self.

    Closes-Bug: #1964590
    Closes-Bug: #1964395
    Closes-Bug: #1964388
    Closes-Bug: #1964385
    Closes-Bug: #1964380
    Closes-Bug: #1964383
    Closes-Bug: #1965954
    Closes-Bug: #1903032
    Related-Bug: #1961102
    Change-Id: I6c2a952f7e286f3e3890e3f2fcb2fdd1063f0c17

Changed in os-brick:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 6.1.0

This issue was fixed in the openstack/os-brick 6.1.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.