os_brick.exception.VolumeDeviceNotFound

Bug #2047580 reported by Dave West
Affects: os-brick
Status: Fix Committed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Running OpenStack 2023.2 with Cinder and a Pure Storage FlashArray as the backend, using NVMe/TCP as the protocol. I am able to create a blank volume on the FlashArray through Horizon, but if I mark that volume as bootable and create an instance from it, or create a volume from an image, the operation fails.

The most prevalent error I get is:

raise exception.VolumeDeviceNotFound(device=target.nqn)
2023-12-24 18:22:36.440 53820 ERROR oslo_messaging.rpc.server os_brick.exception.VolumeDeviceNotFound: Volume device not found at nqn.2010-06.com.purestorage:flasharray
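For context, this exception is the generic "gave up waiting" path: after connecting to the portal, the connector polls for a namespace device belonging to the target subsystem and raises VolumeDeviceNotFound if none appears in time. A minimal sketch of that retry-then-raise pattern (the names and retry parameters here are illustrative, not the actual os-brick internals):

```python
import time


class VolumeDeviceNotFound(Exception):
    """Mirrors os_brick.exception.VolumeDeviceNotFound for illustration."""

    def __init__(self, device):
        super().__init__("Volume device not found at %s" % device)
        self.device = device


def wait_for_device(find_device, target_nqn, retries=5, interval=2):
    """Poll for the namespace device; raise if it never shows up.

    find_device is a callable that returns the device path (e.g.
    /dev/nvme0n1) or None while the namespace is still settling.
    """
    for _attempt in range(retries):
        device = find_device(target_nqn)
        if device is not None:
            return device
        time.sleep(interval)
    # Same shape as the traceback in the bug: the NQN is reported as
    # the "device" that could not be found.
    raise VolumeDeviceNotFound(device=target_nqn)
```

In the failure above, the device is actually present on the host (see the `nvme list` output below), so the polling either looks for it the wrong way or never runs because the connect step already failed.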

When the workflow kicks off, I can see the volume mounted on the host.

nvme list
Node          Generic     SN                Model                    Namespace  Usage              Format       FW Rev
------------  ----------  ----------------  -----------------------  ---------  -----------------  -----------  ------
/dev/nvme0n1  /dev/ng0n1  0EC012D099B841EB  Pure Storage FlashArray  0x3f       1.07 GB / 1.07 GB  512 B + 0 B  6.5.1

ls /dev/disk/by-id/nvm*
/dev/disk/by-id/nvme-Pure_Storage_FlashArray_0EC012D099B841EB
/dev/disk/by-id/nvme-Pure_Storage_FlashArray_0EC012D099B841EB_63
/dev/disk/by-id/nvme-eui.003244296e0f444d24a9379600011caa

I tried turning on debug and verbose logging to see the commands being sent, to help understand where the process is failing, but I was unable to find them in the output.
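For reference, DEBUG logging for the services involved is controlled by oslo.log in each service's own config file (e.g. nova.conf on the compute node and cinder.conf for cinder-volume); without it, os-brick's connector tracing is not emitted:

```ini
[DEFAULT]
# Enables DEBUG-level logging, including os-brick connector calls
debug = True
```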

Revision history for this message
Simon Dodsley (simon-dodsley) wrote :

After looking at the logs I see a number of errors like:

ERROR os_brick.initiator.connectors.nvmeof [None req-09f091dc-0934-4609-8e7e-cfdca70d5fea 268785dd82794839ba2ff15fc962c0a6 a17cd616f3094b89bbb0a201a8a8a9b2 - - - -] Could not connect to Portal tcp at 10.136.194.68:4420 (ctrl: None): exit_code: 1, stdout: "", stderr: "already connected": oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.

These look to be the root cause: the Pure Cinder driver is doing exactly what it is supposed to do with regard to managing the volume and host connections for the FlashArray, so the failure is on the host-side connector.
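The "already connected" failure is `nvme connect` exiting non-zero because the subsystem is already attached, which the connector then propagates as a hard error. A hedged sketch of idempotent handling, i.e. treating that specific stderr as success (hypothetical helper, not the actual os-brick fix):

```python
def connect_portal(run_cmd, target_nqn, address, port):
    """Run `nvme connect`, treating 'already connected' as success.

    run_cmd is a callable standing in for a command runner: it takes an
    argument list and returns (exit_code, stdout, stderr).
    """
    code, _out, err = run_cmd([
        "nvme", "connect", "-t", "tcp",
        "-n", target_nqn, "-a", address, "-s", str(port),
    ])
    if code == 0:
        return True
    # nvme-cli v2 exits non-zero when the controller already exists;
    # for attach purposes that is not a failure.
    if "already connected" in (err or "").lower():
        return True
    raise RuntimeError("nvme connect failed (%d): %s" % (code, err))
```

With handling like this, a pre-existing connection is simply reused instead of aborting the attach and cascading into the VolumeDeviceNotFound error above.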

Revision history for this message
Simon Dodsley (simon-dodsley) wrote :

Also note that this is being run on RHEL 9 using openstack-ansible as the deployment toolset.

Revision history for this message
Gorka Eguileor (gorka) wrote :

There are at least two known issues with the NVMe-oF connector in os-brick when used with nvme CLI v2, for which we have patches [1][2]. There is also a possible race condition fix [3] and two hostnqn creation fixes [4][5].

I cannot tell if they fix this issue because the attached logs are not in DEBUG mode, so there is not enough os-brick information to know what's happening.

[1]: https://review.opendev.org/c/openstack/os-brick/+/895194/2?usp=related-change
[2]: https://review.opendev.org/c/openstack/os-brick/+/895195/2?usp=related-change
[3]: https://review.opendev.org/c/openstack/os-brick/+/895193/2?usp=related-change
[4]: https://review.opendev.org/c/openstack/os-brick/+/895202/1?usp=related-change
[5]: https://review.opendev.org/c/openstack/os-brick/+/895203/2?usp=related-change

Revision history for this message
Dave West (dwest576) wrote :

Thanks for the update. Do you have a date for when these patches will merge?

Revision history for this message
Dave West (dwest576) wrote :

FYI - the nvmeof-wait-reconnecting topic fixes this issue.

Changed in os-brick:
status: New → Fix Committed