amdgpu hangs on DCN 3.5 at bootup: RIP: 0010:dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]

Bug #2066233 reported by You-Sheng Yang
This bug affects 1 person
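| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| HWE Next | New | Undecided | Unassigned | |
| linux (Ubuntu) | Status tracked in Oracular | | | |
| linux (Ubuntu), Noble | Fix Committed | High | You-Sheng Yang | |
| linux (Ubuntu), Oracular | Fix Released | Undecided | Unassigned | |
| linux-oem-6.8 (Ubuntu) | Status tracked in Oracular | | | |
| linux-oem-6.8 (Ubuntu), Noble | Fix Released | High | You-Sheng Yang | |
| linux-oem-6.8 (Ubuntu), Oracular | Invalid | Undecided | Unassigned | |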

Bug Description

[SRU Justification]

BugLink: https://bugs.launchpad.net/bugs/2066233

[Impact]

Newer VBIOS releases for DCN 3.5 bump the version of the IntegratedInfo table from 2.2 to 2.3. The new version uses the same structure as 2.2. Because version 2.3 is missing from the construct_integrated_info() parser, the table is not parsed, which leads to a NULL pointer dereference.

```
Call Trace:
<TASK>
? show_regs+0x72/0x90
? __die+0x25/0x80
? page_fault_oops+0x154/0x4c0
? ttm_bo_kmap+0x11d/0x310 [ttm]
? dma_resv_wait_timeout+0x48/0xe0
? do_user_addr_fault+0x30e/0x6e0
? exc_page_fault+0x84/0x1b0
? asm_exc_page_fault+0x27/0x30
? dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]
? dcn35_clk_mgr_construct+0x15a/0x2210 [amdgpu]
? dcn35_hwseq_create+0x23/0x470 [amdgpu]
```

[Fix]

The fix landed upstream in v6.9-rc7: 9a35d205f466 ("drm/amd/display: Atom Integrated System Info v2_2 for DCN35")
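For readers unfamiliar with this class of bug, the following is a minimal, standalone sketch of a revision-dispatching parser, not the actual amdgpu code. All names (`toy_integrated_info`, `toy_parse_integrated_info`, `toy_create_integrated_info`) and the `accept_v2_3` switch are hypothetical and exist only for illustration. It also assumes that a failed parse is reported to the caller as a NULL pointer, which would match the symptom reported here but is not spelled out in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parsed table; the field is illustrative. */
struct toy_integrated_info {
    int table_minor;
};

/*
 * Hypothetical model of a parser that dispatches on the table revision.
 * With accept_v2_3 == false it models the behaviour described in the
 * report (minor revision 3 is not recognised); with accept_v2_3 == true
 * it models the fixed behaviour, where 2.3 is handled like 2.2.
 */
static bool toy_parse_integrated_info(int major, int minor, bool accept_v2_3,
                                      struct toy_integrated_info *out)
{
    assert(out != NULL);

    if (major != 2)
        return false;

    switch (minor) {
    case 2:
        break;
    case 3:
        if (!accept_v2_3)
            return false; /* unknown revision: nothing is parsed */
        break;            /* same structure as 2.2, reuse that path */
    default:
        return false;
    }

    out->table_minor = minor;
    return true;
}

/* Assumption: a failed parse surfaces to the caller as NULL. */
static struct toy_integrated_info *
toy_create_integrated_info(int major, int minor, bool accept_v2_3)
{
    struct toy_integrated_info *info = calloc(1, sizeof(*info));

    if (info == NULL)
        return NULL;

    if (!toy_parse_integrated_info(major, minor, accept_v2_3, info)) {
        free(info);
        return NULL;
    }
    return info;
}
```

In this sketch, `toy_create_integrated_info(2, 3, false)` returns NULL, so any caller that dereferences the result without checking it faults. With `accept_v2_3` set to true, the same call returns a populated structure.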

[Test Case]

With the fix applied, amdgpu should initialize successfully at boot without a NULL pointer dereference dump.

[Where problems could occur]

None expected. This only covers a new hardware revision that carries the same data.

[Other Info]

Although the fix has landed in v6.9-rc7, every older kernel version that is planned to support the new VBIOS update should also be fixed. So far, the chip vendor has nominated linux/noble and linux-oem-6.8/noble.

========== original bug report ==========

Newer VBIOS releases for DCN 3.5 bump the version of the IntegratedInfo table from 2.2 to 2.3. The new version uses the same structure as 2.2. Because version 2.3 is missing from the construct_integrated_info() parser, the table is not parsed, which leads to a NULL pointer dereference.

[Thu May 9 18:02:38 2024] Call Trace:
[Thu May 9 18:02:38 2024] <TASK>
[Thu May 9 18:02:38 2024] ? show_regs+0x72/0x90
[Thu May 9 18:02:38 2024] ? __die+0x25/0x80
[Thu May 9 18:02:38 2024] ? page_fault_oops+0x154/0x4c0
[Thu May 9 18:02:38 2024] ? ttm_bo_kmap+0x11d/0x310 [ttm]
[Thu May 9 18:02:38 2024] ? dma_resv_wait_timeout+0x48/0xe0
[Thu May 9 18:02:38 2024] ? do_user_addr_fault+0x30e/0x6e0
[Thu May 9 18:02:38 2024] ? exc_page_fault+0x84/0x1b0
[Thu May 9 18:02:38 2024] ? asm_exc_page_fault+0x27/0x30
[Thu May 9 18:02:38 2024] ? dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]
[Thu May 9 18:02:38 2024] ? dcn35_clk_mgr_construct+0x15a/0x2210 [amdgpu]
[Thu May 9 18:02:38 2024] ? dcn35_hwseq_create+0x23/0x470 [amdgpu]
...

The fix landed upstream in v6.9-rc7: 9a35d205f466 ("drm/amd/display: Atom Integrated System Info v2_2 for DCN35")

You-Sheng Yang (vicamo)
tags: added: amd oem-priority originate-from-2065426
Changed in linux (Ubuntu Oracular):
status: New → Fix Released
Changed in linux (Ubuntu Noble):
status: New → In Progress
Changed in linux-oem-6.8 (Ubuntu Noble):
status: New → In Progress
importance: Undecided → High
assignee: nobody → You-Sheng Yang (vicamo)
Changed in linux-oem-6.8 (Ubuntu Oracular):
status: New → Invalid
You-Sheng Yang (vicamo) wrote :
description: updated
You-Sheng Yang (vicamo) wrote :
description: updated
Timo Aaltonen (tjaalton) wrote :

Oracular still has 6.8; how is this fixed there?

You-Sheng Yang (vicamo) wrote :
Changed in linux (Ubuntu Noble):
importance: Undecided → High
assignee: nobody → You-Sheng Yang (vicamo)
Stefan Bader (smb)
Changed in linux (Ubuntu Noble):
status: In Progress → Fix Committed
LEE KUAN-YING (kyyc0426)
Changed in linux-oem-6.8 (Ubuntu Noble):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oem-6.8/6.8.0-1007.7 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-oem-6.8' to 'verification-done-noble-linux-oem-6.8'. If the problem still exists, change the tag 'verification-needed-noble-linux-oem-6.8' to 'verification-failed-noble-linux-oem-6.8'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-oem-6.8-v2 verification-needed-noble-linux-oem-6.8
You-Sheng Yang (vicamo)
tags: added: verification-done-noble-linux-oem-6.8
removed: verification-needed-noble-linux-oem-6.8
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-6.8 - 6.8.0-1007.7

---------------
linux-oem-6.8 (6.8.0-1007.7) noble; urgency=medium

  * noble/linux-oem-6.8: 6.8.0-1007.7 -proposed tracker (LP: #2068142)

  * Packaging resync (LP: #1786013)
    - [Packaging] Replace fs/cifs with fs/smb in inclusion list

  * Panels show garbage or flickering when i915.psr2 enabled (LP: #2069993)
    - SAUCE: drm/i915/display/psr: add a psr2 disable quirk table
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x4d_0x10_0x93_0x15
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x30_0xe4_0x8b_0x07
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x30_0xe4_0x78_0x07
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x30_0xe4_0x8c_0x07
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x06_0xaf_0x9a_0xf9
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x4d_0x10_0x8f_0x15
    - SAUCE: drm/i915/display/psr: disable psr2 for panel_0x06_0xaf_0xa3_0xc3

  * FPS of glxgear with fullscreen is too low on MTL platform (LP: #2069380)
    - drm/i915: Bypass LMEMBAR/GTTMMADR for MTL stolen memory access

  * amdgpu hangs on DCN 3.5 at bootup: RIP:
    0010:dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu] (LP: #2066233)
    - drm/amd/display: Atom Integrated System Info v2_2 for DCN35

  [ Ubuntu: 6.8.0-36.36 ]

  * noble/linux: 6.8.0-36.36 -proposed tracker (LP: #2068150)
  * CVE-2024-26924
    - netfilter: nft_set_pipapo: do not free live element

  [ Ubuntu: 6.8.0-35.35 ]

  * noble/linux: 6.8.0-35.35 -proposed tracker (LP: #2065886)
  * CVE-2024-21823
    - VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
    - dmaengine: idxd: add a new security check to deal with a hardware erratum
    - dmaengine: idxd: add a write() method for applications to submit work

 -- Kuan-Ying Lee <email address hidden> Wed, 26 Jun 2024 14:17:06 +0800

Changed in linux-oem-6.8 (Ubuntu Noble):
status: Fix Committed → Fix Released