Only one NVMe connection may be functional on Compute node

Bug #1792313 reported by Hamdy Khader
Affects                    Status        Importance  Assigned to   Milestone
OpenStack Compute (nova)   Fix Released  Undecided   Hamdy Khader
os-brick                   Fix Released  Undecided   Hamdy Khader

Bug Description

Only one NVMe connection can be active and working at a time on a given Compute node.
When I try to attach a second volume to a second VM on the same Compute node, the operation fails.

The Nova log shows an error when executing "nvme list" (non-zero exit code).

Reproduction:
1. Deploy TripleO master on 3 nodes:
- 1st host: Controller and NVMf target as the Cinder backend
- 2nd and 3rd hosts: Compute nodes
2. Spawn VM-1, create a volume, and attach it to VM-1.
3. Spawn VM-2 on the same Compute node, create a new volume, and attach it to VM-2.

Hamdy Khader (hamdyk)
Changed in os-brick:
assignee: nobody → Hamdy Khader (hamdyk)
status: New → Fix Committed
Hamdy Khader (hamdyk)
Changed in nova:
assignee: nobody → Hamdy Khader (hamdyk)
status: New → In Progress
Changed in os-brick:
status: Fix Committed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/602351
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a1325b4c76ba5ad42a09370d6250338cfb3de533
Submitter: Zuul
Branch: master

commit a1325b4c76ba5ad42a09370d6250338cfb3de533
Author: Hamdy Khader <email address hidden>
Date: Thu Sep 13 16:24:54 2018 +0300

    Set defult value of num_nvme_discover_tries=5

    Discovering newly connected devices in the initiator side can be slow
    when there are old connections, increasing the retries is important
    to discover the new connected device.

    Change-Id: I62a8162bf96d51f7252cfefbffc1a46010a3a612
    Closes-Bug: #1792313

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/608683

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.openstack.org/602332
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=fab367cc265593ac7b46feca8cb13da1385a3d03
Submitter: Zuul
Branch: master

commit fab367cc265593ac7b46feca8cb13da1385a3d03
Author: Hamdy Khader <email address hidden>
Date: Thu Sep 13 14:38:11 2018 +0300

    Retry executing command "nvme list" when fail

    Only one NVMe connection can be active on the same Compute node, when
    initiator side has more than one connection then new connection is made,
    results of the new connected device is not instantaneous.

    Fixing this issue requires retry and sleep when executing command
    "nvme list" to retrieve the newly connected device.

    Change-Id: I6b70140be7023770b83603de723d6d2de3ebb747
    Closes-Bug: #1792313
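
The retry-and-sleep approach described in this commit message can be illustrated with a minimal sketch. This is not the os-brick implementation: the function name `list_nvme_devices`, its parameters, and the linear back-off are all hypothetical choices made for illustration. The only thing taken from the report is that `nvme list` may exit non-zero shortly after a new connection is made, so the call is repeated with a pause between attempts.

```python
import subprocess
import time


def list_nvme_devices(run=None, attempts=5, delay=1.0, sleep=time.sleep):
    """Return the output of `nvme list`, retrying while the command fails.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not the os-brick API.
    """
    if run is None:
        def run():
            # check=True raises CalledProcessError on a non-zero exit code.
            result = subprocess.run(
                ["nvme", "list"], capture_output=True, text=True, check=True
            )
            return result.stdout

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run()
        except (subprocess.CalledProcessError, OSError) as exc:
            last_error = exc
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                sleep(delay * attempt)
    raise last_error
```

Passing `run` and `sleep` as arguments is a design choice that lets the retry logic be exercised without an NVMe initiator or real delays; on a Compute node the defaults would invoke the actual `nvme list` command.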

Changed in os-brick:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/613774

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (stable/rocky)

Reviewed: https://review.openstack.org/613774
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=a8215ba452c5d4c5f815cd22f49fa2e20122fb31
Submitter: Zuul
Branch: stable/rocky

commit a8215ba452c5d4c5f815cd22f49fa2e20122fb31
Author: Hamdy Khader <email address hidden>
Date: Thu Sep 13 14:38:11 2018 +0300

    Retry executing command "nvme list" when fail

    Only one NVMe connection can be active on the same Compute node, when
    initiator side has more than one connection then new connection is made,
    results of the new connected device is not instantaneous.

    Fixing this issue requires retry and sleep when executing command
    "nvme list" to retrieve the newly connected device.

    Change-Id: I6b70140be7023770b83603de723d6d2de3ebb747
    Closes-Bug: #1792313
    (cherry picked from commit fab367cc265593ac7b46feca8cb13da1385a3d03)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/608683
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3d4e4a4df64910a5d3d95c12e9acd0ddba70843f
Submitter: Zuul
Branch: stable/rocky

commit 3d4e4a4df64910a5d3d95c12e9acd0ddba70843f
Author: Hamdy Khader <email address hidden>
Date: Thu Sep 13 16:24:54 2018 +0300

    Set defult value of num_nvme_discover_tries=5

    Discovering newly connected devices in the initiator side can be slow
    when there are old connections, increasing the retries is important
    to discover the new connected device.

    Change-Id: I62a8162bf96d51f7252cfefbffc1a46010a3a612
    Closes-Bug: #1792313
    (cherry picked from commit a1325b4c76ba5ad42a09370d6250338cfb3de533)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 2.6.2

This issue was fixed in the openstack/os-brick 2.6.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.1.0

This issue was fixed in the openstack/nova 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 2.5.5

This issue was fixed in the openstack/os-brick 2.5.5 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.
