curtin

Bug #1830913
Comment #6

Dmitrii Shcherbakov (dmitriis) wrote on 2019-05-30:

> separate machines which have different controller serial numbers
> I believe that the controller is going to be the *same* on the commissioned node as when it is deployed

Yes, for 1 controller per server.

For the 1 server/2+ controllers case, each controller (even of the same model) will have a different serial number. If the original controller through which a namespace was accessible is removed from a server, the expectation with multi-path is that another controller will provide access to it.

I believe this may be problematic in the following case:

1) 2 controllers are present in a server;
2) the server is commissioned and deployed (udev rules tied to WWN and SERIAL are generated for discovered namespaces);
3) the server is shut down and one card is removed from it (the one with the serial written to a udev rule);
4) the server is brought back up and namespaces are rediscovered through a different controller with a different serial => the by-dname rules do not match because the serial number differs.
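
To make the failure in step 4 concrete, here is a minimal sketch of the matching logic. It does not use curtin or udev code: the property names (ID_WWN, ID_SERIAL_SHORT), the rule layout, and all values are assumptions made up for illustration. The WWN-only rule is included only for comparison.

```python
def rule_matches(rule, props):
    """Return True when every key required by the rule equals the device property."""
    return all(props.get(key) == value for key, value in rule.items())


# Hypothetical rule generated at deploy time, tied to both WWN and serial.
rule_wwn_serial = {
    "ID_WWN": "eui.0000000000000001",
    "ID_SERIAL_SHORT": "CTRL-A-SERIAL",
}
# Hypothetical rule that keys on the namespace identifier only.
rule_wwn_only = {"ID_WWN": "eui.0000000000000001"}

# The same namespace as seen through controller A (before the card is removed)...
seen_via_a = {
    "DEVNAME": "/dev/nvme0n1",
    "ID_WWN": "eui.0000000000000001",
    "ID_SERIAL_SHORT": "CTRL-A-SERIAL",
}
# ...and through controller B (after the card is removed).
seen_via_b = {
    "DEVNAME": "/dev/nvme0n1",
    "ID_WWN": "eui.0000000000000001",
    "ID_SERIAL_SHORT": "CTRL-B-SERIAL",
}

print(rule_matches(rule_wwn_serial, seen_via_a))  # True
print(rule_matches(rule_wwn_serial, seen_via_b))  # False
print(rule_matches(rule_wwn_only, seen_via_b))    # True
```

The namespace identifier is unchanged in both cases; only the serial reported by the controller differs, and that alone is enough to make the combined rule miss.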

Theoretically, just rebooting the server without removing a controller might yield this situation as well, depending on the order in which the controllers are enumerated.

Does it look like a sane scenario or am I missing something obvious?

As for the implementation, in the 1.3 NVMe spec controllers have a CMIC field that indicates whether an NVM subsystem can contain multiple controllers, and a subsystem NQN (SUBNQN), which is used to group controllers under a subsystem in sysfs.

https://github.com/torvalds/linux/commit/180de0070048340868c7bc841fc12e75556bb629
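
A rough sketch of how those two fields could be used, assuming the CMIC bit layout from the 1.3 spec (bit 0: multiple ports, bit 1: multiple controllers, bit 2: SR-IOV virtual function). The controller records, their field names, and the NQN values are hypothetical; this is not the kernel's implementation.

```python
from collections import defaultdict

# CMIC bit positions as described in the NVMe 1.3 Identify Controller data.
CMIC_MULTI_PORT = 1 << 0   # subsystem may contain more than one port
CMIC_MULTI_CTRL = 1 << 1   # subsystem may contain two or more controllers
CMIC_SRIOV_VF = 1 << 2     # controller is associated with an SR-IOV virtual function


def may_have_multiple_controllers(cmic):
    """Return True when the CMIC value allows several controllers in the subsystem."""
    return bool(cmic & CMIC_MULTI_CTRL)


def group_by_subsystem(controllers):
    """Group controller names by subsystem NQN (hypothetical record layout)."""
    groups = defaultdict(list)
    for ctrl in controllers:
        groups[ctrl["subnqn"]].append(ctrl["name"])
    return dict(groups)


# Hypothetical controllers: two share a subsystem, one stands alone.
controllers = [
    {"name": "nvme0", "serial": "CTRL-A-SERIAL", "cmic": 0b010,
     "subnqn": "nqn.2019-05.example:subsys-1"},
    {"name": "nvme1", "serial": "CTRL-B-SERIAL", "cmic": 0b010,
     "subnqn": "nqn.2019-05.example:subsys-1"},
    {"name": "nvme2", "serial": "CTRL-C-SERIAL", "cmic": 0b000,
     "subnqn": "nqn.2019-05.example:subsys-2"},
]

print(group_by_subsystem(controllers))
print([may_have_multiple_controllers(c["cmic"]) for c in controllers])
```

Grouping on the subsystem NQN rather than on the controller serial keeps both paths to a shared namespace under one key, which is the property the multi-path case above depends on.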

> consider it a udev/kernel bug if nvme0c33n1 were to ever show up in a value in the udev database

Right, agreed.