Detaching multiple NVMe-oF volumes may leave the subsystem in connecting state

Bug #2035375 reported by Gorka Eguileor
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Gorka Eguileor
Milestone: (none)

Bug Description

When detaching multiple NVMe-oF volumes from the same host we may end up with an NVMe subsystem in "connecting" state, and we'll see a bunch of nvme errors in dmesg.

This happens on storage systems that share the same subsystem for multiple volumes because Nova has not been updated to support the tri-state "shared_targets" option that groups the detach and unmap of volumes to prevent race conditions.

This is related to the issue mentioned in an os-brick commit message: https://review.opendev.org/c/openstack/os-brick/+/836062/12//COMMIT_MSG
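
To illustrate why grouping detach operations matters, here is a minimal sketch that serializes every detach touching the same shared subsystem behind one lock. All names (`serialize_on`, `detach_all`, the `"subsys-1"` key) are hypothetical and not taken from Nova or os-brick, and a thread lock stands in for whatever host-wide lock the real code uses; the sleep is only a placeholder for the actual disconnect work.

```python
import contextlib
import threading
import time
from collections import defaultdict

# One lock per shared subsystem (hypothetical key, e.g. a backend identifier).
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


@contextlib.contextmanager
def serialize_on(key):
    """Allow only one detach at a time for volumes sharing `key`."""
    with _locks_guard:
        lock = _locks[key]
    with lock:
        yield


def detach_all(volumes, key):
    """Detach volumes concurrently; return the peak number of
    detaches that were inside the guarded section at once."""
    active = 0
    max_active = 0
    counter_lock = threading.Lock()

    def detach(volume):
        nonlocal active, max_active
        with serialize_on(key):
            with counter_lock:
                active += 1
                max_active = max(max_active, active)
            time.sleep(0.01)  # placeholder for the real disconnect work
            with counter_lock:
                active -= 1

    threads = [threading.Thread(target=detach, args=(v,)) for v in volumes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active
```

With the guard in place, `detach_all(["vol-a", "vol-b", "vol-c"], "subsys-1")` never has more than one detach running against the shared subsystem at a time, which is the property the unguarded detach path lacks.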

Tags: volumes
Gorka Eguileor (gorka)
Changed in nova:
assignee: nobody → Gorka Eguileor (gorka)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/895192

Changed in nova:
status: New → In Progress
melanie witt (melwitt)
tags: added: volumes
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/895192
Committed: https://opendev.org/openstack/nova/commit/18163761d02fc02d5484f91bf52cd4f25536f95e
Submitter: "Zuul (22348)"
Branch: master

commit 18163761d02fc02d5484f91bf52cd4f25536f95e
Author: Gorka Eguileor <email address hidden>
Date: Tue Sep 12 20:53:15 2023 +0200

    Fix guard for NVMeOF volumes

    When detaching multiple NVMe-oF volumes from the same host we may end
    up with an NVMe subsystem in "connecting" state, and we'll see a bunch
    of nvme errors in dmesg.

    This happens on storage systems that share the same subsystem for
    multiple volumes because Nova has not been updated to support the
    tri-state "shared_targets" option that groups the detach and unmap of
    volumes to prevent race conditions.

    This is related to the issue mentioned in an os-brick commit message [1]

    For the guard_connection method of os-brick to work as expected for
    NVMe-oF volumes we need to use microversion 3.69 when retrieving the
    cinder volume.

    In microversion 3.69 we started reporting 3 states for shared_targets:
    True, False, and None.

    - True is to guard iSCSI volumes and will only be used if the iSCSI
      initiator running on the host doesn't have the manual scans feature.

    - False is that no target/subsystem is being shared so no guard is
      necessary.

    - None is to force guarding, and it's currently used for NVMe-oF volumes
      when sharing the subsystem.

    [1]: https://review.opendev.org/c/openstack/os-brick/+/836062/12//COMMIT_MSG

    Closes-Bug: #2035375
    Change-Id: I4def1c0f20118d0b8eb7d3bbb09af2948ffd70e1
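
The three shared_targets states listed in the commit message above can be read as a small decision function. The sketch below is only an illustration of that description: `needs_guard` and its `iscsi_supports_manual_scans` parameter are hypothetical names, not the os-brick or Nova implementation.

```python
from typing import Optional


def needs_guard(shared_targets: Optional[bool],
                iscsi_supports_manual_scans: bool = False) -> bool:
    """Decide whether attach/detach should be serialized.

    Hypothetical helper mirroring the tri-state described above:
      None  -> always guard (NVMe-oF volumes sharing a subsystem)
      False -> never guard (nothing is shared)
      True  -> guard only if the iSCSI initiator lacks manual scans
    """
    if shared_targets is None:
        return True
    if shared_targets is False:
        return False
    return not iscsi_supports_manual_scans
```

A caller that only ever sees True or False can never reach the `None` branch, so it has no way to learn that a shared NVMe-oF subsystem must always be guarded.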

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 30.0.0.0rc1

This issue was fixed in the openstack/nova 30.0.0.0rc1 release candidate.
