Comment 0 for bug 1803692

James Dingwall (a-james-launchpad) wrote :

I have a system containing two identical nvme devices. When booting a trusty PXE image with kernel 4.4.0-38-generic both devices are detected and available:

# nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x8086
ssvid : 0x8086
sn : BTHH82250N1X1P0E
mn : INTEL SSDPEKKF010T8L
fr : L08P
...

# nvme id-ctrl /dev/nvme1
NVME Identify Controller:
vid : 0x8086
ssvid : 0x8086
sn : BTHH82250N261P0E
mn : INTEL SSDPEKKF010T8L
fr : L08P
...

# dmesg | grep nvme
[ 5.106516] nvme0n1: p1 p2 p3 p4
[ 5.106615] nvme1n1: p1 p2

After booting a bionic PXE image based on 4.15.0-38-generic, only the first nvme device is enabled. The second is detected but then disabled because both devices report the same nqn:

nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2017-12.org.nvmeexpress:uuid:11111111-2222-3333-4444-555555555555).
nvme nvme1: Removing after probe failure status: -22
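
For anyone checking a fleet for the same symptom, a minimal Python sketch that pulls the affected controller and the reported nqn out of kernel log text may help. It assumes the message wording is exactly as quoted above, which may differ in other kernel versions; the function name `find_duplicate_subnqn` is just an illustrative choice.

```python
import re

# Pattern for the warning quoted in this report. The wording is an assumption
# taken from the 4.15 log lines above and may change between kernel versions.
DUPLICATE_SUBNQN = re.compile(
    r"nvme (?P<ctrl>nvme\d+): ignoring ctrl due to duplicate subnqn "
    r"\((?P<nqn>[^)]+)\)"
)


def find_duplicate_subnqn(log_text):
    """Return a list of (controller, nqn) pairs, one per matching warning."""
    return [
        (match.group("ctrl"), match.group("nqn"))
        for match in DUPLICATE_SUBNQN.finditer(log_text)
    ]


# Sample input: the two lines quoted in this report.
SAMPLE_LOG = (
    "nvme nvme1: ignoring ctrl due to duplicate subnqn "
    "(nqn.2017-12.org.nvmeexpress:uuid:11111111-2222-3333-4444-555555555555).\n"
    "nvme nvme1: Removing after probe failure status: -22\n"
)

for ctrl, nqn in find_duplicate_subnqn(SAMPLE_LOG):
    print(ctrl, nqn)
```

In practice the text could come from saved `dmesg` output; only the first of the two log lines matches, since the second does not mention the nqn.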

The nqn string comes from the device firmware rather than being generated by Linux, and there does not seem to be an operation in nvme-cli to change it. (It is also questionable whether the firmware value is correct according to section 7.9 of https://nvmexpress.org/wp-content/uploads/NVM-Express-1_3a-20171024_ratified.pdf. My reading of the specification is that the string should start with nqn.2014-08.org.nvmexpress:uuid: followed by a random UUID, and I assume a different UUID for each device.)
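
To make that reading of section 7.9 concrete, here is a rough Python sketch that checks whether a string has the UUID-based form. The prefix is spelled `nvmexpress`, as in the specification and the nvmexpress.org domain; the helper name `is_uuid_nqn` is hypothetical, and the check covers only the UUID-based format, not the other nqn format the specification allows.

```python
import re

# UUID-based NQN as I read section 7.9: a fixed prefix followed by a UUID in
# the usual 8-4-4-4-12 hexadecimal layout. Treat this as an interpretation of
# the specification, not an authoritative validator.
UUID_NQN = re.compile(
    r"nqn\.2014-08\.org\.nvmexpress:uuid:"
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_uuid_nqn(nqn):
    """Return True if nqn matches the UUID-based format described above."""
    return UUID_NQN.fullmatch(nqn) is not None


# The value reported by the firmware in this bug uses a different prefix.
firmware_nqn = (
    "nqn.2017-12.org.nvmeexpress:uuid:11111111-2222-3333-4444-555555555555"
)
print(is_uuid_nqn(firmware_nqn))
```

Under this reading the firmware value fails the check because of its prefix alone, independently of the fact that both devices share the same UUID.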

The Windows 10 installation provided on the system did not have any problems operating with both devices.

The kernel nvme driver history suggests that in 4.4 the driver did not check or validate the nqn, whereas the newer driver does, and that is what exposes the problem.

Our typical installation is a zpool mirror across two devices, so this is preventing us from moving from trusty to bionic.

This is a report of a similar issue: https://ask.fedoraproject.org/en/question/128422/one-of-two-identical-m2-nvme-drives-disabling-due-to-same-nqn/

It may be worth noting that if an nvme device does not provide an nqn, one seems to be generated from the device serial number, so a system with two Samsung MZVLB256HAHQ devices works fine.
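
A small Python sketch of why a serial-based fallback avoids the collision: the layout below (prefix, vendor IDs, then space-padded serial and model) is my assumption about what the driver does and has not been verified against the kernel source. The function name `fallback_nqn`, the vendor IDs, and the serial numbers are placeholders for illustration only.

```python
def fallback_nqn(vid, ssvid, serial, model):
    """Build an illustrative serial-based identifier.

    The format is an assumed layout, not a confirmed copy of the kernel's.
    The serial and model are padded with spaces to 20 and 40 characters,
    matching the sizes of the sn and mn fields in Identify Controller data.
    """
    return "nqn.2014.08.org.nvmexpress:{:04x}{:04x}{:<20}{:<40}".format(
        vid, ssvid, serial, model
    )


# Two devices of the same model with different (made-up) serial numbers.
first = fallback_nqn(0x144D, 0x144D, "SERIAL-A", "MZVLB256HAHQ")
second = fallback_nqn(0x144D, 0x144D, "SERIAL-B", "MZVLB256HAHQ")

# Identical models still yield distinct identifiers because the serial differs.
print(first != second)
```

The point is only that any scheme incorporating the serial number gives each device its own identifier, whereas a fixed firmware-supplied string does not.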