Comment 15 for bug 1835954

Guilherme G. Piccoli (gpiccoli) wrote :

Hi Dan, thanks *a lot* for your input. Very good points; I'll address some of them below. Sorry for my delay in responding:

"to clarify my understanding, the 'ABORT' just means to not use the nvme 'format' command"
-> Exactly

"you mean check id-ctrl for nn, but this field doesn't actually refer to the count of namespaces"
-> Agreed, I misunderstood the spec! Thanks for pointing that out.

"well, i think it would be more like:
if (oacs(bit3) == 1)
  if (fna(bit1) == 1)
    if (count(nvme list-ns /dev/nvmeX) > 1)
      ABORT"
"if maas is managing the entire box, then i'm not sure why maas would ever want to erase *only* specific namespace(s), not the entire nvme (i.e. all namespaces)."
"if that's the only user-visible erase config choices, then I guess it depends if "erase disks..." does erase *all* system disks on release, or *only* the disks in the "used" section of the system storage config. If only "used" disks are erased, then it does matter if the secure erase wipes other namespaces...but if maas erases *all* disks, then wiping all namespaces is correct."

-> After your comments, Igor's comments and checking the MAAS code in detail, I'd say we should forget about namespaces heheh
MAAS basically checks the lsblk output and erases all disks, so a namespace would show up as a disk there. No need to be over-careful with that; let's just erase all namespaces.
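
For the record, the check quoted above could be sketched roughly as below. This is only an illustration of the quoted pseudocode, not MAAS code: the function and constant names are hypothetical, and the OACS/FNA values are assumed to have been parsed already (e.g. from the id-ctrl output) into integers.

```python
# Hypothetical helper mirroring the quoted pseudocode; not part of MAAS.
OACS_BIT3 = 1 << 3  # the OACS bit named in the quoted check
FNA_BIT1 = 1 << 1   # the FNA bit named in the quoted check


def should_abort_format(oacs: int, fna: int, namespace_count: int) -> bool:
    """Return True when the 'nvme format' path should be skipped (ABORT).

    All three conditions from the quoted check must hold: both bits are
    set and the controller reports more than one namespace.
    """
    return (
        bool(oacs & OACS_BIT3)
        and bool(fna & FNA_BIT1)
        and namespace_count > 1
    )
```

With a single namespace the function returns False regardless of the bits, which matches the intent of the quoted check.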

"(side note: not sure why it's possible to select *both* secure *and* quick erase...)"
-> Odd, right? There's some documentation in the MAAS code:

' If both --secure-erase and --quick-erase are specified and the drive does NOT have a secure erase feature, maas-wipe will behave as if only --quick-erase was specified. If --secure-erase is specified and --quick-erase is NOT specified and the drive does NOT have a secure erase feature, maas-wipe will behave as if --secure-erase was NOT specified, i.e. will overwrite the whole disk with null bytes. This can be very slow.'
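
The fallback rules in that text can be read as a small decision function. The sketch below is an interpretation of the documentation only, not the actual maas-wipe code: the function name and return strings are made up, and two cases the text does not spell out are assumptions (secure erase is used when it is requested and available; with neither flag, the whole disk is zeroed).

```python
# Hypothetical decision helper; names and return values are illustrative.
def choose_wipe_method(secure_erase: bool, quick_erase: bool,
                       has_secure_erase: bool) -> str:
    """Pick a wipe method from the two flags and the drive capability."""
    if secure_erase and has_secure_erase:
        # Assumption: secure erase wins whenever it is requested and possible.
        return "secure"
    if quick_erase:
        # Covers both flags given on a drive without secure erase.
        return "quick"
    # Only --secure-erase on a drive without the feature: overwrite the
    # whole disk with null bytes. Assumed to also apply with no flags.
    return "zero"
```

So passing both flags amounts to "secure erase if possible, otherwise quick erase", which would explain why selecting both is allowed.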

"There is also the "sanitize" operation, see spec section 8.15, although this is also optional."
-> Optional and pretty *rare* to find. It may also be slow, as you commented later, and not all nvme-cli versions support it. I'd rather not mess with sanitize if possible.

"[...] there is also the "write zeros" command, spec section 6.17 (unfortunately, this is *also* optional, bit 3 in the ONCS field...seems like everything is optional in the spec...). If write zeros is supported, it also (again, optionally) supports discard. So the 'nvme write-zeroes' command, if the drive supports it, will likely be much faster than actually writing zero-blocks to the entire drive (and, probably much better for the drive, too). And using the -d param (to deallocate/discard the blocks) should help restore nvme performance by 'freeing up' all the drive's blocks for the firmware to use (e.g. like the 'fstrim' command)."

-> Awesome idea! We can implement that in the zeroing function in MAAS, so if the NVMe device does not support secure erase, we fall back to write-zeroes, which is much faster and healthier for the HW. Also, on the same topic, "quick erase" writes 2MB at the beginning/end of the disk; I might change that. I guess what matters most for quick erase is clearing the partition table, so I may add a "wipefs -a" there and write a few more zeroes at the beginning of the disk; this should be enough. "Quick erase" should perhaps be the default, to prevent issues in subsequent deployments, but I guess this is not a discussion for this LP =)
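
A minimal sketch of the two pieces discussed here, assuming the ONCS value has already been parsed into an integer: a test for the write-zeroes bit (bit 3 of ONCS, as quoted above) and a "quick erase" that zeroes the first and last 2MB of a device. The names are hypothetical and this is not the MAAS implementation; the wipefs step is left out.

```python
import os

ONCS_WRITE_ZEROES = 1 << 3  # bit 3 of ONCS, per the quoted comment


def supports_write_zeroes(oncs: int) -> bool:
    """True if the controller advertises the optional Write Zeroes command."""
    return bool(oncs & ONCS_WRITE_ZEROES)


def quick_wipe(path: str, span: int = 2 * 1024 * 1024) -> None:
    """Zero the first and last `span` bytes of a device or file."""
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)  # seek returns the new offset
        head = min(span, size)
        dev.seek(0)
        dev.write(b"\0" * head)
        # Start of the tail region, never overlapping the head already zeroed.
        tail_start = max(size - span, head)
        if tail_start < size:
            dev.seek(tail_start)
            dev.write(b"\0" * (size - tail_start))
        dev.flush()
        os.fsync(dev.fileno())
```

The idea would be to try secure erase first, then write-zeroes when `supports_write_zeroes()` is true, and only then fall back to plain zeroing.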