amdgpu hangs on DCN 3.5 at bootup: RIP: 0010:dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]

Bug #2066233 reported by You-Sheng Yang
This bug affects 1 person
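| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| HWE Next | New | Undecided | Unassigned | |
| linux (Ubuntu) | Status tracked in Oracular | | | |
| linux (Ubuntu Noble) | Fix Released | High | You-Sheng Yang | |
| linux (Ubuntu Oracular) | Fix Released | Undecided | Unassigned | |
| linux-oem-6.8 (Ubuntu) | Status tracked in Oracular | | | |
| linux-oem-6.8 (Ubuntu Noble) | Fix Released | High | You-Sheng Yang | |
| linux-oem-6.8 (Ubuntu Oracular) | Invalid | Undecided | Unassigned | |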

Bug Description

[SRU Justification]

BugLink: https://bugs.launchpad.net/bugs/2066233

[Impact]

A newer VBIOS for DCN 3.5 bumps the version of the IntegratedInfo table from 2.2 to 2.3. The new version uses the same structure, but version 2.3 is missing from the construct_integrated_info() parser, so the driver hits a NULL pointer dereference at boot.

```
Call Trace:
<TASK>
? show_regs+0x72/0x90
? __die+0x25/0x80
? page_fault_oops+0x154/0x4c0
? ttm_bo_kmap+0x11d/0x310 [ttm]
? dma_resv_wait_timeout+0x48/0xe0
? do_user_addr_fault+0x30e/0x6e0
? exc_page_fault+0x84/0x1b0
? asm_exc_page_fault+0x27/0x30
? dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]
? dcn35_clk_mgr_construct+0x15a/0x2210 [amdgpu]
? dcn35_hwseq_create+0x23/0x470 [amdgpu]
```

[Fix]

The fix landed upstream in v6.9-rc7: 9a35d205f466 ("drm/amd/display: Atom Integrated System Info v2_2 for DCN35")
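
As a rough illustration of the failure and the fix, the Python model below mimics a version dispatch that returns nothing for an unrecognised table revision. It is not the driver code: `IntegratedInfo`, `supported_minors`, and the consumer `clk_mgr_construct` are hypothetical stand-ins, and only the two revisions named in this report (2.2 and 2.3) are modelled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class IntegratedInfo:
    """Stand-in for the parsed table; the real structure has many more fields."""
    minor: int


def construct_integrated_info(
    major: int, minor: int, supported_minors: FrozenSet[int]
) -> Optional[IntegratedInfo]:
    """Return parsed info, or None when the table revision is not recognised."""
    if major == 2 and minor in supported_minors:
        # Recognised revisions share one layout, so one parser serves them all.
        return IntegratedInfo(minor=minor)
    return None  # models the pointer that stays NULL in the driver


def clk_mgr_construct(info: Optional[IntegratedInfo]) -> int:
    """Hypothetical consumer that uses the table without checking for None."""
    return info.minor


BEFORE_FIX = frozenset({2})     # 2.3 is not recognised
AFTER_FIX = frozenset({2, 3})   # 2.3 is routed to the same parser as 2.2
```

With `BEFORE_FIX`, a 2.3 table yields `None` and the consumer fails on first use, which is the Python analogue of the NULL dereference in the trace. With `AFTER_FIX`, the same table parses like a 2.2 table.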

[Test Case]

With the fix applied, amdgpu should initialize successfully, with no NULL pointer dereference dump at boot.
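
One way to check this is to scan the kernel log for the crash signature. The sketch below is only an assumption about how a tester might automate it: `has_dcn35_crash` is a hypothetical helper, and it matches the symbol and module name from the call trace in this report while ignoring the offsets, which vary per build.

```python
import re

# Symbol and module taken from the call trace in this report.
CRASH_SIGNATURE = re.compile(
    r"dcn35_clk_mgr_construct\+0x[0-9a-f]+/0x[0-9a-f]+ \[amdgpu\]"
)


def has_dcn35_crash(kernel_log: str) -> bool:
    """Return True if any line of the log contains the crash signature."""
    return any(CRASH_SIGNATURE.search(line) for line in kernel_log.splitlines())
```

The function takes the log as text, for example the captured output of `dmesg`, and a fixed kernel is expected to return `False` on an affected machine.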

[Where problems could occur]

None expected. The fix only adds support for a new revision that carries the same data.

[Other Info]

Although the fix landed in v6.9-rc7, every older kernel version with planned support for the new VBIOS update should also be fixed. So far the chip vendor has nominated linux/noble and linux-oem-6.8/noble.
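
The "older than v6.9-rc7" condition can be sketched as a comparison of upstream version tags. This is a minimal illustration with hypothetical helper names. It only understands tags of the form `6.8` or `v6.9-rc7`, and it knows nothing about stable or distribution backports, so a kernel it flags may already carry the patch.

```python
import re


def parse_upstream(version: str):
    """Turn 'v6.9-rc7' or '6.8' into a sortable tuple."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)(?:-rc(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognised upstream version: {version!r}")
    major, minor, rc = m.groups()
    # A final release sorts after every release candidate of the same version.
    return (int(major), int(minor), int(rc) if rc is not None else float("inf"))


FIX_VERSION = parse_upstream("v6.9-rc7")


def predates_fix(version: str) -> bool:
    """Return True if the upstream base is older than the release with the fix."""
    return parse_upstream(version) < FIX_VERSION
```

For the two nominated kernels, both based on 6.8, `predates_fix("6.8")` is true, which is why they need the backport.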

========== original bug report ==========

A newer VBIOS for DCN 3.5 bumps the version of the IntegratedInfo table from 2.2 to 2.3. The new version uses the same structure, but version 2.3 is missing from the construct_integrated_info() parser, so the driver hits a NULL pointer dereference at boot.

[Thu May 9 18:02:38 2024] Call Trace:
[Thu May 9 18:02:38 2024] <TASK>
[Thu May 9 18:02:38 2024] ? show_regs+0x72/0x90
[Thu May 9 18:02:38 2024] ? __die+0x25/0x80
[Thu May 9 18:02:38 2024] ? page_fault_oops+0x154/0x4c0
[Thu May 9 18:02:38 2024] ? ttm_bo_kmap+0x11d/0x310 [ttm]
[Thu May 9 18:02:38 2024] ? dma_resv_wait_timeout+0x48/0xe0
[Thu May 9 18:02:38 2024] ? do_user_addr_fault+0x30e/0x6e0
[Thu May 9 18:02:38 2024] ? exc_page_fault+0x84/0x1b0
[Thu May 9 18:02:38 2024] ? asm_exc_page_fault+0x27/0x30
[Thu May 9 18:02:38 2024] ? dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]
[Thu May 9 18:02:38 2024] ? dcn35_clk_mgr_construct+0x15a/0x2210 [amdgpu]
[Thu May 9 18:02:38 2024] ? dcn35_hwseq_create+0x23/0x470 [amdgpu]
...

The fix landed upstream in v6.9-rc7: 9a35d205f466 ("drm/amd/display: Atom Integrated System Info v2_2 for DCN35")

You-Sheng Yang (vicamo)
tags: added: amd oem-priority originate-from-2065426
Changed in linux (Ubuntu Oracular):
status: New → Fix Released
Changed in linux (Ubuntu Noble):
status: New → In Progress
Changed in linux-oem-6.8 (Ubuntu Noble):
status: New → In Progress
importance: Undecided → High
assignee: nobody → You-Sheng Yang (vicamo)
Changed in linux-oem-6.8 (Ubuntu Oracular):
status: New → Invalid
You-Sheng Yang (vicamo) wrote :
description: updated
You-Sheng Yang (vicamo) wrote :
description: updated
Timo Aaltonen (tjaalton) wrote :

oracular has 6.8 still, how is this fixed there?

You-Sheng Yang (vicamo) wrote :
Changed in linux (Ubuntu Noble):
importance: Undecided → High
assignee: nobody → You-Sheng Yang (vicamo)
Stefan Bader (smb)
Changed in linux (Ubuntu Noble):
status: In Progress → Fix Committed
LEE KUAN-YING (kyyc0426)
Changed in linux-oem-6.8 (Ubuntu Noble):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-6.8/6.8.0-1007.7 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-oem-6.8' to 'verification-done-noble-linux-oem-6.8'. If the problem still exists, change the tag 'verification-needed-noble-linux-oem-6.8' to 'verification-failed-noble-linux-oem-6.8'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-oem-6.8-v2 verification-needed-noble-linux-oem-6.8
You-Sheng Yang (vicamo)
tags: added: verification-done-noble-linux-oem-6.8
removed: verification-needed-noble-linux-oem-6.8
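
The tag change that the bot asks for amounts to swapping one tag for another. The sketch below shows that swap on a plain list of tags. `record_verification` is a hypothetical helper for illustration, not a Launchpad API.

```python
from typing import List


def record_verification(
    tags: List[str], series: str, package: str, passed: bool
) -> List[str]:
    """Return a new tag list with the 'needed' tag swapped for 'done' or 'failed'."""
    suffix = f"{series}-{package}"
    needed = f"verification-needed-{suffix}"
    outcome = f"verification-{'done' if passed else 'failed'}-{suffix}"
    if needed not in tags:
        raise ValueError(f"{needed} is not set on this bug")
    # Replace in place so the order of the other tags is preserved.
    return [outcome if tag == needed else tag for tag in tags]
```

Calling it with `series="noble"`, `package="linux-oem-6.8"`, and `passed=True` replaces `verification-needed-noble-linux-oem-6.8` with `verification-done-noble-linux-oem-6.8` and leaves the other tags untouched.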
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-6.8 - 6.8.0-1007.7

---------------
linux-oem-6.8 (6.8.0-1007.7) noble; urgency=medium

  * noble/linux-oem-6.8: 6.8.0-1007.7 -proposed tracker (LP: #2068142)

  * Packaging resync (LP: #1786013)
    - [Packaging] Replace fs/cifs with fs/smb in inclusion list

  * Panels show garbage or flickering when i915.psr2 enabled (LP: #2069993)
    - SAUCE: drm/i915/display/psr: add a psr2 disable quirk table
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x4d_0x10_0x93_0x15
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x30_0xe4_0x8b_0x07
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x30_0xe4_0x78_0x07
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x30_0xe4_0x8c_0x07
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x06_0xaf_0x9a_0xf9
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x4d_0x10_0x8f_0x15
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x06_0xaf_0xa3_0xc3

  * FPS of glxgear with fullscreen is too low on MTL platform (LP: #2069380)
    - drm/i915: Bypass LMEMBAR/GTTMMADR for MTL stolen memory access

  * amdgpu hangs on DCN 3.5 at bootup: RIP:
    0010:dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu] (LP: #2066233)
    - drm/amd/display: Atom Integrated System Info v2_2 for DCN35

  [ Ubuntu: 6.8.0-36.36 ]

  * noble/linux: 6.8.0-36.36 -proposed tracker (LP: #2068150)
  * CVE-2024-26924
    - netfilter: nft_set_pipapo: do not free live element

  [ Ubuntu: 6.8.0-35.35 ]

  * noble/linux: 6.8.0-35.35 -proposed tracker (LP: #2065886)
  * CVE-2024-21823
    - VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
    - dmaengine: idxd: add a new security check to deal with a hardware erratum
    - dmaengine: idxd: add a write() method for applications to submit work

 -- Kuan-Ying Lee <email address hidden> Wed, 26 Jun 2024 14:17:06 +0800

Changed in linux-oem-6.8 (Ubuntu Noble):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.8.0-40.40 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux' to 'verification-done-noble-linux'. If the problem still exists, change the tag 'verification-needed-noble-linux' to 'verification-failed-noble-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-v2 verification-needed-noble-linux
You-Sheng Yang (vicamo)
tags: added: verification-done-noble-linux
removed: verification-needed-noble-linux
You-Sheng Yang (vicamo) wrote :

Verified linux/noble version 6.8.0-40.40.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-ibm-gt-tdx/6.8.0-1009.10+tdx1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-ibm-gt-tdx' to 'verification-done-noble-linux-ibm-gt-tdx'. If the problem still exists, change the tag 'verification-needed-noble-linux-ibm-gt-tdx' to 'verification-failed-noble-linux-ibm-gt-tdx'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-ibm-gt-tdx-v2 verification-needed-noble-linux-ibm-gt-tdx
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.8.0-40.40

---------------
linux (6.8.0-40.40) noble; urgency=medium

  * noble/linux: 6.8.0-40.40 -proposed tracker (LP: #2072201)

  * FPS of glxgear with fullscreen is too low on MTL platform (LP: #2069380)
    - drm/i915: Bypass LMEMBAR/GTTMMADR for MTL stolen memory access

  * a critical typo in the code managing the ASPM settings for PCI Express
    devices (LP: #2071889)
    - PCI/ASPM: Restore parent state to parent, child state to child

  * [UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive
    throughput degradation for PCI-related network workloads (LP: #2071471)
    - [Config] Set IOMMU_DEFAULT_DMA_STRICT=n and IOMMU_DEFAULT_DMA_LAZY=yes for
      s390x

  * UBSAN: array-index-out-of-bounds in
    /build/linux-D15vQj/linux-6.5.0/drivers/md/bcache/bset.c:1098:3
    (LP: #2039368)
    - bcache: fix variable length array abuse in btree_iter

  * Mute/mic LEDs and speaker no function on EliteBook 645/665 G11
    (LP: #2071296)
    - ALSA: hda/realtek: fix mute/micmute LEDs don't work for EliteBook 645/665
      G11.

  * failed to enable IPU6 camera sensor on kernel >= 6.8: ivsc_ace
    intel_vsc-5db76cf6-0a68-4ed6-9b78-0361635e2447: switch camera to host
    failed: -110 (LP: #2067364)
    - mei: vsc: Don't stop/restart mei device during system suspend/resume
    - SAUCE: media: ivsc: csi: don't count privacy on as error
    - SAUCE: media: ivsc: csi: add separate lock for v4l2 control handler
    - SAUCE: media: ivsc: csi: remove privacy status in struct mei_csi
    - SAUCE: mei: vsc: Enhance IVSC chipset stability during warm reboot
    - SAUCE: mei: vsc: Enhance SPI transfer of IVSC rom
    - SAUCE: mei: vsc: Utilize the appropriate byte order swap function
    - SAUCE: mei: vsc: Prevent timeout error with added delay post-firmware
      download

  * failed to probe camera sensor on Dell XPS 9315: ov01a10 i2c-OVTI01A0:00:
    failed to check hwcfg: -22 (LP: #2070251)
    - ACPI: utils: Make acpi_handle_path() not static
    - ACPI: property: Ignore bad graph port nodes on Dell XPS 9315
    - ACPI: property: Polish ignoring bad data nodes
    - ACPI: scan: Ignore camera graph port nodes on all Dell Tiger, Alder and
      Raptor Lake models

  * Update amd_sfh for AMD strix series (LP: #2058331)
    - HID: amd_sfh: Increase sensor command timeout
    - HID: amd_sfh: Improve boot time when SFH is available
    - HID: amd_sfh: Extend MP2 register access to SFH
    - HID: amd_sfh: Set the AMD SFH driver to depend on x86

  * RFIM and SAGV Linux Support for G10 models (LP: #2070158)
    - drm/i915/display: Add meaningful traces for QGV point info error handling
    - drm/i915/display: Extract code required to calculate max qgv/psf gv point
    - drm/i915/display: extract code to prepare qgv points mask
    - drm/i915/display: Disable SAGV on bw init, to force QGV point recalculation
    - drm/i915/display: handle systems with duplicate psf gv points
    - drm/i915/display: force qgv check after the hw state readout

  * Update amd-pmf for AMD strix series (LP: #2058330)
    - platform/x86/amd/pmf: Differentiate PMF ACPI versions
    - platform/x86/amd/pmf: Disable debugf...

Changed in linux (Ubuntu Noble):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.8.0-1012.14 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-azure' to 'verification-done-noble-linux-azure'. If the problem still exists, change the tag 'verification-needed-noble-linux-azure' to 'verification-failed-noble-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-azure-v2 verification-needed-noble-linux-azure
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-raspi/6.8.0-1009.10 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-raspi' to 'verification-done-noble-linux-raspi'. If the problem still exists, change the tag 'verification-needed-noble-linux-raspi' to 'verification-failed-noble-linux-raspi'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-raspi-v2 verification-needed-noble-linux-raspi
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-riscv-6.8/6.8.0-40.40.1~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-riscv-6.8' to 'verification-done-jammy-linux-riscv-6.8'. If the problem still exists, change the tag 'verification-needed-jammy-linux-riscv-6.8' to 'verification-failed-jammy-linux-riscv-6.8'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-riscv-6.8-v2 verification-needed-jammy-linux-riscv-6.8
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-raspi-realtime/6.8.0-2008.8 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-raspi-realtime' to 'verification-done-noble-linux-raspi-realtime'. If the problem still exists, change the tag 'verification-needed-noble-linux-raspi-realtime' to 'verification-failed-noble-linux-raspi-realtime'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-raspi-realtime-v2 verification-needed-noble-linux-raspi-realtime
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-lowlatency/6.8.0-40.40.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-lowlatency' to 'verification-done-noble-linux-lowlatency'. If the problem still exists, change the tag 'verification-needed-noble-linux-lowlatency' to 'verification-failed-noble-linux-lowlatency'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-lowlatency-v2 verification-needed-noble-linux-lowlatency
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-ibm-6.8/6.8.0-1010.10~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-ibm-6.8' to 'verification-done-jammy-linux-ibm-6.8'. If the problem still exists, change the tag 'verification-needed-jammy-linux-ibm-6.8' to 'verification-failed-jammy-linux-ibm-6.8'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-ibm-6.8-v2 verification-needed-jammy-linux-ibm-6.8
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-lowlatency-hwe-6.8/6.8.0-40.40.1~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-lowlatency-hwe-6.8' to 'verification-done-jammy-linux-lowlatency-hwe-6.8'. If the problem still exists, change the tag 'verification-needed-jammy-linux-lowlatency-hwe-6.8' to 'verification-failed-jammy-linux-lowlatency-hwe-6.8'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-lowlatency-hwe-6.8-v2 verification-needed-jammy-linux-lowlatency-hwe-6.8