> So, the algorithm outline would be something like this:

To clarify my understanding, 'ABORT' just means to not use the nvme 'format' command, and to fall back to some other erase method. Also, for reference, I'm looking at this nvme spec:
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf

>
> (1) Check id-ctrl for "oacs" - if bit 1 is not set, ABORT.

yep

>
> (2a) Check id-ctrl for oacs bit 3 (namespace management support).
> (2b) Check id-ns for "nn".

You mean check id-ctrl for nn, but this field doesn't actually refer to the count of namespaces; it refers to the current maximum namespace number. So if you have just a single namespace, but its NSID is 0x10, then this field would show 0x10. Sec 6.1.3 "Allocated and Unallocated NSID Types" (and a few following sections) shows how you can have multiple namespaces, but only the 'allocated' and 'active' ones are actually available in the system (and have anything in them). At least, that's my reading of the spec...

> (2c) Check id-ctrl for "fna" bit 1 ( *per-namespace* secure erasing support ).
> -> If "fna" bit 1 is set *and* ("oacs" bit 3 is set *and* "nn" > 1), ABORT

Well, I think it would be more like:

  if (oacs(bit3) == 1)
    if (fna(bit1) == 1)
      if (count(nvme list-ns /dev/nvmeX) > 1)
        ABORT

The nvme 'list-ns' command issues the identify command with CNS set to 0x02 (active namespace id list). You can also use --all to send CNS 0x10 (allocated namespace id list), but I think active is probably what we care about here. See spec section 5.15.1, specifically figure 244.

Also, I think this situation - an nvme controller that *does* support namespaces, and *does* support the format command, but *doesn't* support per-namespace formatting - seems *really* unlikely. But yeah, per the spec it's possible.

> (unsafe, risking to erase all user's namespaces - see [0] below).
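For what it's worth, here is a rough Python sketch of the check I described for steps (1) and (2). It assumes the "oacs" and "fna" values have already been parsed from 'nvme id-ctrl' and that the active namespace count came from 'nvme list-ns'; the function and constant names are made up for illustration, not taken from any existing code.

```python
# Bit positions as discussed in this thread (NVMe 1.4 Identify Controller).
OACS_FORMAT_NVM = 1 << 1   # oacs bit 1: Format NVM command supported
OACS_NS_MGMT = 1 << 3      # oacs bit 3: namespace management supported
FNA_ERASE_ALL_NS = 1 << 1  # fna bit 1: secure erase applies to all namespaces


def can_use_format(oacs: int, fna: int, active_ns_count: int) -> bool:
    """Return False where the outline says ABORT, True otherwise.

    Hypothetical helper: the caller is assumed to have parsed oacs and fna
    from id-ctrl and counted the entries of the active namespace list.
    """
    # Step (1): no Format NVM support at all.
    if not (oacs & OACS_FORMAT_NVM):
        return False
    # Step (2): namespaces are supported, the erase is not per-namespace,
    # and more than one namespace is active.
    if (oacs & OACS_NS_MGMT) and (fna & FNA_ERASE_ALL_NS) and active_ns_count > 1:
        return False
    return True
```

Using the active namespace count instead of "nn" means a single namespace with a high NSID does not trigger the ABORT.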
>
> (3) Check "fna" bit 2: if set, we're going to use crypto erase, faster and less
> degrading to device; if not set, we're going to use regular user erase.

yep.

>
> (4) Check id-ns "flbas" to determine the previously used LBA setting.
>
> (5) Execute "nvme format" command with the previously gathered "--ses" option (2 to
> crypto erase or 1 to regular user erase), "--lbaf" option (from id-ns flbas) and
> with a timeout to be determined (see [1] below).

yep, sounds correct.

>
> (6) If any step (1)-(5) fails, fallback to zeroing the nvme device.

There is also the "sanitize" operation, see spec section 8.15, although this is also optional. And, if neither format nor sanitize are available, there is also the "write zeros" command, spec section 6.17 (unfortunately, this is *also* optional, bit 3 in the ONCS field... seems like everything is optional in the spec...). If write zeros is supported, it also (again, optionally) supports discard.

So the 'nvme write-zeroes' command, if the drive supports it, will likely be much faster than actually writing zero-blocks to the entire drive (and probably much better for the drive, too). And using the -d param (to deallocate/discard the blocks) should help restore nvme performance by 'freeing up' all the drive's blocks for the firmware to use (e.g. like the 'fstrim' command).

>
> [0] Formatting with multiple namespaces is risky - certain drives will erase data in all
> namespaces. For safety reasons, we could fallback to zeroing the device in case more than 1
> namespace is configured, even if fna bit 1 is clear (a buggy device firmware would cause a
> irreversible data loss in this case). Opinions are welcome!

If maas is managing the entire box, then I'm not sure why maas would ever want to erase *only* specific namespace(s), not the entire nvme (i.e. all namespaces).
The current maas web ui I'm looking at has, during Release, the options:

  "Erase disks before releasing"
  "Use secure erase"
  "Use quick erase"

(side note: not sure why it's possible to select *both* secure *and* quick erase...)

If those are the only user-visible erase config choices, then I guess it depends on whether "erase disks..." erases *all* system disks on release, or *only* the disks in the "used" section of the system storage config. If only "used" disks are erased, then it does matter if the secure erase wipes other namespaces... but if maas erases *all* disks, then wiping all namespaces is correct.

>
> [1] I think 20/30 minutes as a timeout for nvme formatting is more than enough, I'd like
> to gather opinions here.

I'm pretty sure all the approaches should be quite fast, with the only exception being the actual final fallback of writing zeroed blocks to every lba on the drive. I really, really hope that there is no nvme drive out there that doesn't support *any* of the other methods - I think at *minimum* all nvme drives should support write-zero with discard. I'd be really surprised to see a drive not support write-zero/discard, as (I think) that would mean 'fstrim' would not work on it, and its lifetime would be significantly reduced...
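To make the ordering for steps (3)-(6) and the fallbacks concrete, here is a rough Python sketch of how the methods could be ranked. It only builds a plan and does not run any nvme command. All names are hypothetical. Two things here are my own assumptions from reading the spec, not something established in this thread: that the low three bits of the id-ctrl "sanicap" field indicate sanitize support, and that the LBA format index is the low four bits of "flbas".

```python
FNA_CRYPTO_ERASE = 1 << 2    # fna bit 2: crypto erase supported
ONCS_WRITE_ZEROES = 1 << 3   # oncs bit 3: Write Zeroes command supported
SANICAP_ANY_METHOD = 0x7     # assumption: low 3 sanicap bits = sanitize methods


def format_args(fna: int, flbas: int) -> dict:
    """Pick the --ses and --lbaf values for 'nvme format' (steps 3-5)."""
    ses = 2 if fna & FNA_CRYPTO_ERASE else 1  # 2 = crypto erase, 1 = user erase
    lbaf = flbas & 0xF                        # assumption: low 4 bits hold the index
    return {"ses": ses, "lbaf": lbaf}


def erase_plan(format_ok: bool, fna: int, flbas: int,
               sanicap: int, oncs: int) -> list:
    """Return erase methods to try, most preferred first.

    format_ok is the result of the step (1)/(2) checks, decided by the caller.
    """
    plan = []
    if format_ok:
        plan.append(("format", format_args(fna, flbas)))
    if sanicap & SANICAP_ANY_METHOD:
        plan.append(("sanitize", {}))
    if oncs & ONCS_WRITE_ZEROES:
        # Deallocate/discard while zeroing, as with the -d param.
        plan.append(("write-zeroes", {"deallocate": True}))
    # Last resort: write zeroed blocks to every lba on the drive.
    plan.append(("zero-fill", {}))
    return plan
```

The caller would walk the list and stop at the first method that succeeds, so the slow zero-fill only runs when everything else is unsupported or has failed.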