nvme-cli: fguid is printed as binary data and causes MAAS to fail erasing NVME disks
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS | Fix Released | High | Unassigned | ||
maas-images | Triaged | Medium | Unassigned | ||
nvme-cli (Ubuntu) | Fix Released | Undecided | Unassigned | ||
Jammy | Fix Released | Medium | Matthew Ruffell | ||
Bug Description
[Impact]
When a user releases a MAAS-deployed system that has "erase disks on release" enabled, erasing NVME disks fails on Jammy.
Traceback (most recent call last):
File "/tmp/user_
main()
File "/tmp/user_
disk_info = get_disk_info()
File "/tmp/user_
return {kname: get_disk_
File "/tmp/user_
return {kname: get_disk_
File "/tmp/user_
return get_nvme_
File "/tmp/user_
output = output.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 385: invalid start byte
This is due to maas_wipe.py running "nvme id-ctrl <device>" and parsing the results. This should be human readable data, in string format, so utf-8 should be appropriate for MAAS to use.
Instead, the "fguid" field is being printed as binary data, and is not parsable as utf-8.
e.g. From comment #8.
The user sees:
`fguid : 2.`
on closer inspection, the hex is:
0x32,0x89,0x82,0x2E
Note that the output is cut off early, likely because the next byte is 0x00 and is being interpreted as a terminating null byte.
Fix nvme-cli such that we print out the fguid as a correct utf-8 string, so MAAS works as intended.
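The failure mode above can be reproduced in isolation. The following is a minimal sketch, not the maas_wipe.py code: the 16-byte field is hypothetical (only the first four bytes come from this report, and the zero padding is an assumption), and `try_decode` is an illustrative helper name.

```python
# Hypothetical 16-byte fguid field. Only the first four bytes are taken from
# the report; the trailing zero bytes are an assumption.
raw_fguid = bytes([0x32, 0x89, 0x82, 0x2E]) + bytes(12)

# Printing a raw buffer as a C string stops at the first NUL byte, which
# would explain why only four bytes reach the output.
printed = raw_fguid.split(b"\x00", 1)[0]

# One line of output as a parser would receive it (layout is illustrative).
line = b"fguid     : " + printed + b"\n"


def try_decode(data: bytes) -> str:
    """Decode as strict UTF-8, reporting the failure instead of raising."""
    try:
        return data.decode()
    except UnicodeDecodeError as exc:
        return f"decode failed: {exc.reason} at byte {exc.start}"


result = try_decode(line)

# A lenient decode substitutes U+FFFD for the invalid bytes. This only shows
# that the rest of the output is still readable; it is not the fix here.
lenient = line.decode("utf-8", errors="replace")
```

The strict decode fails on 0x89, which is not a valid UTF-8 start byte, matching the error in the traceback.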
[Testcase]
Deploy Jammy onto a system that has a NVME device.
$ sudo apt install nvme-cli
Run the 'id-ctrl' command and look at the fguid entry:
$ sudo nvme id-ctrl /dev/nvme1n1 | grep fguid
fguid :
Because the UUID is all zeros, its first byte was interpreted as a terminating null byte, so nothing was printed for the fguid.
There is a test package available in the following ppa:
https:/
If you install the test package, the fguid will be printed as a proper string:
$ sudo nvme id-ctrl /dev/nvme1n1 | grep fguid
fguid : 00000000-
Also check that json output works as expected:
$ sudo nvme id-ctrl -o json /dev/nvme1n1 | grep fguid
"fguid" : "00000000-
Additionally, test that the new package allows a MAAS-deployed system to
be released correctly with the erase option enabled, as maas_wipe.py should now
complete successfully.
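The json check in this testcase could be scripted rather than inspected by eye. This is a rough sketch under assumptions: `fguid_is_uuid` is a hypothetical helper, and the sample document is invented for illustration (real 'id-ctrl -o json' output contains many more fields, and the all-zero value is assumed from the description above).

```python
import json
import uuid

# Invented sample, standing in for the output of `nvme id-ctrl -o json`.
sample = '{"fguid": "00000000-0000-0000-0000-000000000000"}'


def fguid_is_uuid(id_ctrl_json: str) -> bool:
    """Return True if the JSON document has an fguid that parses as a UUID."""
    data = json.loads(id_ctrl_json)
    value = data.get("fguid")
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


print(fguid_is_uuid(sample))
```

Note that `uuid.UUID` also accepts a few alternative spellings (for example without hyphens), so this checks that the value is UUID-like rather than enforcing one exact layout.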
[Where problems could occur]
We are changing the output of the 'id-ctrl' subcommand. No other subcommands are changed. Users who for some reason rely on the broken, incomplete binary data that is currently printed might be impacted. Users doing a hard diff of the command output will see the output change to reflect the actual fguid, and might need to update their expected output. The fguid is now supplied in the json output for 'id-ctrl', which might affect programs parsing the json object.
There are no workarounds, and if a regression were to occur, it would only affect the 'id-ctrl' subcommand, and not change anything else.
[Other info]
Upstream bug:
https:/
This was fixed in the below commit in version 2.2, found in mantic and later:
commit 78b7ad235507ddd
From: Pierre Labat <email address hidden>
Date: Fri, 26 Aug 2022 17:02:08 -0500
Subject: nvme-print: Print fguid as a UUID
Link: https:/
The commit required a minor backport. In later versions, a major refactor occurred that changed nvme_uuid_
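For illustration only, the idea behind the fix (rendering the 16-byte field as a UUID string instead of printing the raw bytes) can be sketched in Python. This is not the nvme-cli implementation, which is written in C; `format_fguid` is a hypothetical name, and the byte order used by `uuid.UUID(bytes=...)` is an assumption that may not match what nvme-cli prints for non-zero values.

```python
import uuid


def format_fguid(raw: bytes) -> str:
    """Render a 16-byte identifier as a canonical hyphenated UUID string."""
    if len(raw) != 16:
        raise ValueError("fguid must be exactly 16 bytes")
    # Assumption: bytes are taken in order (big-endian fields).
    return str(uuid.UUID(bytes=raw))


# An all-zero field becomes a printable string instead of an empty one.
print(format_fguid(bytes(16)))
# → 00000000-0000-0000-0000-000000000000
```

Because every byte is rendered as two hexadecimal digits, the result is always plain ASCII and therefore always decodable as utf-8, regardless of the field's contents.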
description: | updated |
summary: |
- Failed to wipe Micron 7400 MTFDKBA960TDZ during machine release
+ nvme-cli: fguid is printed as binary data and causes MAAS to fail
+ erasing NVME disks |
tags: | added: sts |
Changed in maas: | |
status: | Triaged → Fix Released |
Changed in maas: | |
milestone: | 3.5.x → 3.5.1 |
Hi maasuser,
Can you run the following commands and post the output?
> nvme id-ctrl $disk
> nvme id-ns $disk