GPGPU devices not fully named in the PCI Devices tab
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Expired | Low | Unassigned |
lxd (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
I have a machine with two NVIDIA A100 GPUs installed.
Looking at the GPU section of PCI Devices, I can see both cards, but they are not named (see screenshot).
The lshw data gathered during commissioning has the PCI ID data needed to discover the name:
<hints>
<hint name="icon" value="display" />
<hint name="pci.class" value="0x302" />
<hint name="pci.device" value="0x20F1" />
<hint name="pci.
<hint name="pci.
<hint name="pci.vendor" value="0x10DE" />
</hints>
and 10de:20f1 resolves to the nVidia Corporation A100 GPU.
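To illustrate how a vendor:device pair such as 10de:20f1 is resolved to a name, here is a minimal sketch of a lookup against data in the pci.ids text format (vendor lines at column zero, device lines indented by one tab, ID and name separated by two spaces). The function name `lookup_pci_name` and the `SAMPLE` excerpt are hypothetical and only for illustration; this is not how MAAS or lshw implement the lookup.

```python
from typing import Iterable, Optional


def _norm(hex_id: str) -> str:
    """Lowercase a hex ID and drop an optional 0x prefix (0x10DE -> 10de)."""
    hex_id = hex_id.lower()
    return hex_id[2:] if hex_id.startswith("0x") else hex_id


def lookup_pci_name(lines: Iterable[str], vendor_id: str, device_id: str) -> Optional[str]:
    """Return 'Vendor Device' for a vendor:device pair, or None if unknown."""
    vendor_id, device_id = _norm(vendor_id), _norm(device_id)
    current_vendor = None  # name of the vendor block we are inside, if it matches
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("C "):
            break  # device class section starts; vendor list is over
        if line.startswith("\t\t"):
            continue  # subsystem entries are not needed here
        if line.startswith("\t"):
            if current_vendor is None:
                continue  # device belongs to a vendor we are not looking for
            dev, _, name = line.strip().partition("  ")
            if dev.lower() == device_id:
                return f"{current_vendor} {name.strip()}"
        else:
            ven, _, name = line.partition("  ")
            current_vendor = name.strip() if ven.lower() == vendor_id else None
    return None


# Illustrative excerpt in pci.ids format (assumed, not copied from any release).
SAMPLE = (
    "# sample excerpt\n"
    "10de  NVIDIA Corporation\n"
    "\t20f1  GA100 [A100 PCIe 40GB]\n"
)

print(lookup_pci_name(SAMPLE.splitlines(), "0x10DE", "0x20F1"))
```

On Ubuntu the database is normally installed as /usr/share/misc/pci.ids, so passing `open("/usr/share/misc/pci.ids", encoding="utf-8", errors="replace")` as `lines` would show whether the local copy knows the device. A `None` result for a valid ID is consistent with the unnamed entries described above.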
I think the issue is described here:
https:/
Focal ships a pci.ids file that is two years out of date, which I think is why commissioning can't identify the A100 GPUs.
The version of PCI IDs in Jammy does have the strings for identifying the A100 (and any other hardware added to the database since March 2020).
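One way to check how old a given pci.ids file is: the file begins with comment lines that include a `Version:` field. The sketch below reads that header. The function name `pci_ids_version` and the sample header are hypothetical, and the exact header layout is an assumption based on the upstream file format.

```python
import re
from typing import Iterable, Optional


def pci_ids_version(lines: Iterable[str]) -> Optional[str]:
    """Return the value of the '# Version:' header line, or None if absent."""
    for line in lines:
        if not line.strip():
            continue  # blank lines can appear between header comments
        if not line.startswith("#"):
            break  # first data line reached; the header is over
        match = re.match(r"#\s*Version:\s*(\S+)", line)
        if match:
            return match.group(1)
    return None


# Illustrative header (assumed layout and value, not taken from a real release).
SAMPLE_HEADER = [
    "#\n",
    "#\tList of PCI ID's\n",
    "#\n",
    "#\tVersion: 2020.03.20\n",
    "#\tDate:    2020-03-20 03:15:02\n",
]

print(pci_ids_version(SAMPLE_HEADER))
```

Running this against the pci.ids file on a Focal commissioning image and on a Jammy system would make the age gap between the two databases explicit.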
So while fixing the above bug will resolve this in the long term, MAAS should have a way to keep an up-to-date pci.ids database, to avoid lapses like this without relying on SRUs to previous LTS releases.
Looking forward, assuming my suspicions are correct and the outdated pci.ids file is the culprit, MAAS will continue failing to identify some hardware until after 22.04 (or perhaps 22.04.1?), when we can finally use Jammy to commission hardware.
Screenshot showing unidentified NVIDIA GPUs