Nvidia fabric-manager-535 version incompatible with Nvidia Driver
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
fabric-manager-535 (Ubuntu) | New | Undecided | Unassigned | | |
Bug Description
(On a server with an NVIDIA HGX platform with 8 A100 GPUs, running Ubuntu 22.04.4 LTS)
The new version of the cuda-drivers-
```
# apt-cache policy cuda-drivers-
Candidate: 535.161.
Version table:
535.
500 http://
500 http://
# apt-cache policy nvidia-driver-535
nvidia-driver-535:
Installed: (none)
Candidate: 535.171.
Version table:
535.
500 http://
500 http://
```
They are incompatible: systemctl status on the failed nvidia-
```
nv-fabricmanage
```
I also ran into this. To get versions of nvidia-fabricmanager and nvidia-driver with a matching GPU driver interface version, I had to install nvidia-driver-550-server (550.54.15), which matched nvidia-fabricmanager (550.54.15). nvidia-driver-550 was at 550.67, which didn't match.
For your case, driver version 535, the following two packages match:
- nvidia-driver-535-server (535.161.08-0ubuntu2.22.04.1)
- nvidia-fabricmanager-535 (535.161.08-0ubuntu3.22.04.1)
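
As a rough illustration of what "match" means here, the sketch below compares only the upstream part of two package version strings (everything before the Ubuntu revision suffix). The helper names `upstream_version` and `versions_match` are made up for this example, and it assumes the version strings carry no epoch prefix.

```shell
#!/bin/sh
# Hypothetical helper: strip the Debian/Ubuntu revision (text after the last "-")
# so that only the upstream driver version remains.
upstream_version() {
    printf '%s\n' "${1%-*}"
}

# Hypothetical helper: succeeds when both version strings share one upstream version.
versions_match() {
    [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

# Version strings quoted in this report for the 535 packages.
driver="535.161.08-0ubuntu2.22.04.1"
fabric="535.161.08-0ubuntu3.22.04.1"

if versions_match "$driver" "$fabric"; then
    echo "match: $(upstream_version "$driver")"
else
    echo "MISMATCH: $(upstream_version "$driver") vs $(upstream_version "$fabric")"
fi
```

On a live system the two strings could instead come from `dpkg-query -W -f='${Version}' <package>` for each package.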
Here are my matching packages as installed.
```
$ apt-cache policy nvidia-driver-550-server
nvidia-driver-550-server:
  Installed: 550.54.15-0ubuntu0.22.04.2
  Candidate: 550.54.15-0ubuntu0.22.04.2
  Version table:
 *** 550.54.15-0ubuntu0.22.04.2 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages
        500 http://archive.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages
        100 /var/lib/dpkg/status
$ apt-cache policy nvidia-fabricmanager-550
nvidia-fabricmanager-550:
  Installed: 550.54.15-0ubuntu0.22.04.1
  Candidate: 550.54.15-0ubuntu0.22.04.1
  Version table:
 *** 550.54.15-0ubuntu0.22.04.1 500
        500 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages
        500 http://archive.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages
        100 /var/lib/dpkg/status
```
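
If you want to check this from a script, a small filter can pull the installed version out of `apt-cache policy` output. This is only a sketch: `installed_version` is a name invented here, and it assumes English field labels, so on a system with a non-English locale (such as the French output earlier in this report) apt-cache would need to run under `LC_ALL=C`.

```shell
#!/bin/sh
# Hypothetical helper: read `apt-cache policy <package>` output on stdin and
# print the value of the "Installed:" field (assumes English labels).
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# Sample input modeled on the nvidia-fabricmanager-550 output in this report.
# On a live system, something like:
#   LC_ALL=C apt-cache policy nvidia-fabricmanager-550 | installed_version
sample='nvidia-fabricmanager-550:
  Installed: 550.54.15-0ubuntu0.22.04.1
  Candidate: 550.54.15-0ubuntu0.22.04.1'

printf '%s\n' "$sample" | installed_version
```

For a package that is not installed, the filter prints `(none)`, which is what apt-cache reports in that field.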