Update the 535 driver series - 25/09/2023

Bug #2037266 reported by Alberto Milone
This bug affects 1 person
Affects                                      Status        Importance  Assigned to     Milestone
nvidia-graphics-drivers-535 (Ubuntu)         Fix Released  High        Alberto Milone
  Focal                                      Fix Released  High        Alberto Milone
  Jammy                                      Fix Released  High        Alberto Milone
  Lunar                                      Fix Released  High        Alberto Milone
nvidia-graphics-drivers-535-server (Ubuntu)  Fix Released  High        Alberto Milone
  Focal                                      Fix Released  High        Alberto Milone
  Jammy                                      Fix Released  High        Alberto Milone
  Lunar                                      Fix Released  High        Alberto Milone

Bug Description

[Impact]
These releases provide both bug fixes and new features, and we would like to
make sure all of our users have access to these improvements.

See the changelog entry below for a full list of changes and bugs.

[Test Case]
The following development and SRU process was followed:
https://wiki.ubuntu.com/NVidiaUpdates

The certification test suite must pass on a range of hardware:
https://git.launchpad.net/plainbox-provider-sru/tree/units/sru.pxu

The QA team that executed the tests is responsible for attaching the artifacts and console output of the relevant run to the bug. Members of the NVIDIA maintainers team will not mark the bug ‘verification-done’ until this has happened.

[Regression Potential]
In order to mitigate the regression potential, the results of the
aforementioned system level tests are attached to this bug.

[Discussion]

[Changelog]

535 (535.113.01):
https://www.nvidia.com/Download/driverResults.aspx/211711/en-us/

535-server (535.104.12):
https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-104-12/index.html

Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-535 (Ubuntu Focal):
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-535 (Ubuntu Jammy):
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-535 (Ubuntu Lunar):
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-535-server (Ubuntu):
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-535-server (Ubuntu Focal):
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-535-server (Ubuntu Jammy):
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-535-server (Ubuntu Lunar):
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-535 (Ubuntu Focal):
status: New → In Progress
Changed in nvidia-graphics-drivers-535 (Ubuntu Jammy):
status: New → In Progress
Changed in nvidia-graphics-drivers-535 (Ubuntu Lunar):
status: New → In Progress
Changed in nvidia-graphics-drivers-535-server (Ubuntu):
status: New → In Progress
Changed in nvidia-graphics-drivers-535-server (Ubuntu Focal):
status: New → In Progress
Changed in nvidia-graphics-drivers-535-server (Ubuntu Jammy):
status: New → In Progress
Changed in nvidia-graphics-drivers-535-server (Ubuntu Lunar):
status: New → In Progress
Changed in nvidia-graphics-drivers-535 (Ubuntu Focal):
importance: Undecided → High
Changed in nvidia-graphics-drivers-535 (Ubuntu Jammy):
importance: Undecided → High
Changed in nvidia-graphics-drivers-535 (Ubuntu Lunar):
importance: Undecided → High
Changed in nvidia-graphics-drivers-535-server (Ubuntu):
importance: Undecided → High
Changed in nvidia-graphics-drivers-535-server (Ubuntu Focal):
importance: Undecided → High
Changed in nvidia-graphics-drivers-535-server (Ubuntu Jammy):
importance: Undecided → High
Changed in nvidia-graphics-drivers-535-server (Ubuntu Lunar):
importance: Undecided → High
description: updated
Francis Ginther (fginther) wrote :

I've completed the CUDA-based testing for these drivers. This includes:

 * amd64 testing of the DKMS packages on bionic, focal, jammy and lunar
 * arm64 testing of the DKMS packages on focal and jammy
 * amd64 testing of the LRM packages against the generic kernel on bionic, focal, jammy and lunar
 * amd64 testing of the LRM packages against the nvidia kernel on jammy

This matches the testing done for prior driver releases.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-535-server - 535.104.12-0ubuntu0.23.04.1

---------------
nvidia-graphics-drivers-535-server (535.104.12-0ubuntu0.23.04.1) lunar; urgency=medium

  * New upstream release (LP: #2037266):
    - Fixed an issue where the NVSwitch driver would not retrain
      NVLinks on init correctly on HGX 8 H100, in case they faulted
      earlier (such as due to GPU resets). This would result in links
      being down and CUDA workloads failing with a "system not yet
      initialized" error. The issue was introduced in the 535.86.10
      driver and fixed in 535.104.12 and later drivers.

 -- Alberto Milone <email address hidden> Mon, 25 Sep 2023 16:18:16 +0000

Changed in nvidia-graphics-drivers-535-server (Ubuntu Lunar):
status: In Progress → Fix Released
Andy Whitcroft (apw) wrote : Update Released

The verification of the Stable Release Update for nvidia-graphics-drivers-535-server has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-535 - 535.113.01-0ubuntu0.23.04.1

---------------
nvidia-graphics-drivers-535 (535.113.01-0ubuntu0.23.04.1) lunar; urgency=medium

  * New upstream release (LP: #2037266):
    - Fixed a bug that could cause GPU memory utilization to be
      reported incorrectly for Multi-Instance GPU (MIG) partitions on
      Grace Hopper systems.
    - Fixed a bug that intermittently caused the display to freeze
      when resuming from suspend on some Ada GPUs.

 -- Alberto Milone <email address hidden> Mon, 25 Sep 2023 09:32:34 +0000

Changed in nvidia-graphics-drivers-535 (Ubuntu Lunar):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-535-server - 535.104.12-0ubuntu0.22.04.1

---------------
nvidia-graphics-drivers-535-server (535.104.12-0ubuntu0.22.04.1) jammy; urgency=medium

  * New upstream release (LP: #2037266):
    - Fixed an issue where the NVSwitch driver would not retrain
      NVLinks on init correctly on HGX 8 H100, in case they faulted
      earlier (such as due to GPU resets). This would result in links
      being down and CUDA workloads failing with a "system not yet
      initialized" error. The issue was introduced in the 535.86.10
      driver and fixed in 535.104.12 and later drivers.

 -- Alberto Milone <email address hidden> Mon, 25 Sep 2023 16:30:44 +0000

Changed in nvidia-graphics-drivers-535-server (Ubuntu Jammy):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-535 - 535.113.01-0ubuntu0.22.04.1

---------------
nvidia-graphics-drivers-535 (535.113.01-0ubuntu0.22.04.1) jammy; urgency=medium

  * New upstream release (LP: #2037266):
    - Fixed a bug that could cause GPU memory utilization to be
      reported incorrectly for Multi-Instance GPU (MIG) partitions on
      Grace Hopper systems.
    - Fixed a bug that intermittently caused the display to freeze
      when resuming from suspend on some Ada GPUs.

 -- Alberto Milone <email address hidden> Mon, 25 Sep 2023 09:45:05 +0000

Changed in nvidia-graphics-drivers-535 (Ubuntu Jammy):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-535-server - 535.104.12-0ubuntu0.20.04.1

---------------
nvidia-graphics-drivers-535-server (535.104.12-0ubuntu0.20.04.1) focal; urgency=medium

  * New upstream release (LP: #2037266):
    - Fixed an issue where the NVSwitch driver would not retrain
      NVLinks on init correctly on HGX 8 H100, in case they faulted
      earlier (such as due to GPU resets). This would result in links
      being down and CUDA workloads failing with a "system not yet
      initialized" error. The issue was introduced in the 535.86.10
      driver and fixed in 535.104.12 and later drivers.

 -- Alberto Milone <email address hidden> Mon, 25 Sep 2023 16:29:52 +0000

Changed in nvidia-graphics-drivers-535-server (Ubuntu Focal):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-535 - 535.113.01-0ubuntu0.20.04.1

---------------
nvidia-graphics-drivers-535 (535.113.01-0ubuntu0.20.04.1) focal; urgency=medium

  * New upstream release (LP: #2037266):
    - Fixed a bug that could cause GPU memory utilization to be
      reported incorrectly for Multi-Instance GPU (MIG) partitions on
      Grace Hopper systems.
    - Fixed a bug that intermittently caused the display to freeze
      when resuming from suspend on some Ada GPUs.

 -- Alberto Milone <email address hidden> Mon, 25 Sep 2023 10:02:47 +0000

Changed in nvidia-graphics-drivers-535 (Ubuntu Focal):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-535-server - 535.104.12-0ubuntu2

---------------
nvidia-graphics-drivers-535-server (535.104.12-0ubuntu2) mantic; urgency=medium

  * debian/rules:
    - Add override_dh_installsystemd.
    - Pass in --no-stop-on-upgrade --no-restart-after-upgrade to
      dh_installsystemd (LP: #2025640).

 -- Alberto Milone <email address hidden> Fri, 29 Sep 2023 19:12:51 +0000
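The entry above adds an override_dh_installsystemd target that passes two standard debhelper options to dh_installsystemd. A minimal debian/rules sketch of such an override might look like the following. It assumes a plain dh-based rules file; the actual packaging for these drivers contains many more targets and is not reproduced here.

```make
#!/usr/bin/make -f
# Sketch only: assumes a minimal dh-based debian/rules; the real
# nvidia-graphics-drivers-535-server packaging is more involved.

%:
	dh $@

# Replace the default dh_installsystemd step. The two options are
# intended to keep the package's systemd units from being stopped
# before, or restarted after, a package upgrade (see dh_installsystemd(1)).
override_dh_installsystemd:
	dh_installsystemd --no-stop-on-upgrade --no-restart-after-upgrade
```

Per dh_installsystemd(1), --no-stop-on-upgrade skips stopping the unit during the upgrade, while --no-restart-after-upgrade turns off the restart-after-upgrade behaviour that recent debhelper compat levels enable by default. The exact maintainer-script snippets generated depend on the compat level in use.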

Changed in nvidia-graphics-drivers-535-server (Ubuntu):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-535 - 535.113.01-0ubuntu3

---------------
nvidia-graphics-drivers-535 (535.113.01-0ubuntu3) mantic; urgency=medium

  * debian/rules:
    - Add override_dh_installsystemd.
    - Pass --no-restart-after-upgrade to dh_installsystemd.

nvidia-graphics-drivers-535 (535.113.01-0ubuntu2) mantic; urgency=medium

  * debian/rules:
    - Pass in --no-stop-on-upgrade to dh_installsystemd.
      Prevent dh_installsystemd from stopping services (LP: #2025640).

 -- Alberto Milone <email address hidden> Fri, 29 Sep 2023 18:07:07 +0000

Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: In Progress → Fix Released