This bug was fixed in the package nvidia-graphics-drivers-535-server - 535.104.12-0ubuntu0.22.04.1
---------------
nvidia-graphics-drivers-535-server (535.104.12-0ubuntu0.22.04.1) jammy; urgency=medium
* New upstream release (LP: #2037266):
- Fixed an issue where the NVSwitch driver would not correctly
retrain NVLinks on init on HGX 8 H100 if they had faulted
earlier (such as due to GPU resets). This would result in links
being down and CUDA workloads failing with a "system not yet
initialized" error. The issue was introduced in the 535.86.10
driver and is fixed in 535.104.12 and later drivers.
-- Alberto Milone <email address hidden> Mon, 25 Sep 2023 16:30:44 +0000
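The version range named in the entry above can be checked with a short script. This is a minimal sketch, not part of the package: the helper names `parse_version` and `in_affected_range` are hypothetical, and it assumes that every 535-series version from 535.86.10 up to (but not including) 535.104.12 carries the bug, which the changelog does not state for each intermediate release. It also says nothing about whether the machine is an HGX 8 H100 system, the only platform the entry mentions.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted driver version such as '535.104.12' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


# Bounds taken from the changelog entry: introduced in 535.86.10,
# fixed in 535.104.12 and later.
INTRODUCED_IN = parse_version("535.86.10")
FIXED_IN = parse_version("535.104.12")


def in_affected_range(version: str) -> bool:
    """Return True if the version lies in [535.86.10, 535.104.12).

    Assumption: plain numeric ordering of the dotted components is a
    reasonable stand-in for release ordering within this range.
    """
    return INTRODUCED_IN <= parse_version(version) < FIXED_IN


if __name__ == "__main__":
    for candidate in ("535.54.03", "535.86.10", "535.104.12"):
        print(candidate, in_affected_range(candidate))
```

On a live system the installed driver version can be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to `in_affected_range`.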