Comment 34 for bug 1651602

Revision history for this message
Mike Pontillo (mpontillo) wrote : Re: [2.1.1] MAAS has nvme0n1 set as boot disk, curtin fails

After further troubleshooting with cgregan, we've further narrowed this down.

We ran the following script on the node that was having trouble:

https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b

Unlike all the other devices MAAS works with, the Intel NVMe device reports a serial number that cannot be found anywhere in /dev/disk/by-id/*. When curtin is supplied a serial number, it uses a heuristic to find the device as follows:

http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/commands/block_meta.py#L270

http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/block/__init__.py#L601

So arguably, this is a bug in the Intel NVMe serial number; the way it populates /dev/disk/* leaves much to be desired.

This is *arguably* a bug in curtin (and maybe MAAS, since we knowingly use the serial number even though `udevadm` can tell us that the serial cannot be found anywhere in /dev/disk/by-id/*), in that we could do a better job dealing with devices backed by not-so-robust kernel drivers. But I think we shouldn't encourage bad behavior on the part of driver writers, so I'm on the fence about whether or not we should fix it.

But mostly, I would argue that this is a bug in the Intel NVMe driver. The way they expose the device to userland is non-standard and arguably broken. When we ran `udevadm info -q all -n nvme0n1` on the device, we got the following pseudo-output:

nvme0n1:
P: /devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
N: nvme0n1
S: SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
S: disk/by-id/nvme-INTEL
E: DEVLINKS=/dev/disk/by-id/nvme-INTEL /dev/SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: DEVNAME=/dev/nvme0n1
E: DEVPATH=/devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
E: DEVTYPE=disk
E: ID_SERIAL=INTEL SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: ID_SERIAL_SHORT=CVMDxxxxxxxxxxxxxx
E: MAJOR=259
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=xxxxxxx

You can see by the lines that start with "S:" and the "DEVLINKS=" line that the way this device is exposed is very non-standard. One would expect /dev/disk/by-id/* to contain a DEVLINK containing the serial number. Instead they expose a 'nvme-INTEL' link, which is (IMHO) a critical bug, because anyone expecting the things in /dev/disk/by-id/* to be unique will be in for a big surprise when they add a second NVMe device to a machine.