amdgpu hangs on DCN 3.5 at bootup: RIP: 0010:dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]

Bug #2066233 reported by You-Sheng Yang
This bug affects 1 person
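| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| HWE Next | New | Undecided | Unassigned | |
| linux (Ubuntu) | Status tracked in Oracular | | | |
| linux (Ubuntu) Noble | In Progress | Undecided | Unassigned | |
| linux (Ubuntu) Oracular | Fix Released | Undecided | Unassigned | |
| linux-oem-6.8 (Ubuntu) | Status tracked in Oracular | | | |
| linux-oem-6.8 (Ubuntu) Noble | In Progress | High | You-Sheng Yang | |
| linux-oem-6.8 (Ubuntu) Oracular | Invalid | Undecided | Unassigned | |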

Bug Description

[SRU Justification]

BugLink: https://bugs.launchpad.net/bugs/2066233

[Impact]

A newer VBIOS for DCN 3.5 bumps the version of the IntegratedInfo table from 2.2 to 2.3. Version 2.3 uses the same structure as 2.2, but it is not handled by the construct_integrated_info() parser, which leads to a NULL pointer dereference at boot.

```
Call Trace:
<TASK>
? show_regs+0x72/0x90
? __die+0x25/0x80
? page_fault_oops+0x154/0x4c0
? ttm_bo_kmap+0x11d/0x310 [ttm]
? dma_resv_wait_timeout+0x48/0xe0
? do_user_addr_fault+0x30e/0x6e0
? exc_page_fault+0x84/0x1b0
? asm_exc_page_fault+0x27/0x30
? dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]
? dcn35_clk_mgr_construct+0x15a/0x2210 [amdgpu]
? dcn35_hwseq_create+0x23/0x470 [amdgpu]
```
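
The failure mode described in this section can be sketched in a few lines of C. This is not the amdgpu code: every type, function name, and value below is a hypothetical stand-in, and the `accept_v2_3` flag exists only so the sketch can show both the unhandled-revision behaviour and the behaviour once revision 2.3 is routed to the existing 2.2 parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu types. */
struct sketch_table_header {
    uint8_t major;
    uint8_t minor;
};

struct sketch_integrated_info {
    uint32_t vco_freq_khz;
};

/* Parser for the 2.2 layout. Revision 2.3 is assumed to share it. */
static void sketch_parse_v2_2(struct sketch_integrated_info *info)
{
    info->vco_freq_khz = 3600000; /* made-up value for illustration */
}

/*
 * Dispatch on the table revision. Returns the parsed info, or NULL when
 * the revision is not recognised. With accept_v2_3 == 0, revision 2.3
 * falls into the "unknown" path, which mirrors the reported problem.
 */
static struct sketch_integrated_info *
sketch_construct_info(const struct sketch_table_header *hdr,
                      struct sketch_integrated_info *storage,
                      int accept_v2_3)
{
    if (hdr->major != 2)
        return NULL;

    switch (hdr->minor) {
    case 2:
        break;
    case 3:
        if (!accept_v2_3)
            return NULL; /* revision unknown: nothing gets parsed */
        break;
    default:
        return NULL;
    }

    sketch_parse_v2_2(storage);
    return storage;
}

/*
 * A defensive consumer. A caller that skips the NULL check and reads
 * info->vco_freq_khz directly would fault when the revision is unknown.
 */
static int sketch_read_vco_freq(const struct sketch_integrated_info *info,
                                uint32_t *out)
{
    if (info == NULL)
        return -1;
    *out = info->vco_freq_khz;
    return 0;
}
```

Calling `sketch_construct_info()` with a 2.3 header and `accept_v2_3 == 0` returns NULL, so any consumer that dereferences the result unconditionally would crash. With `accept_v2_3 == 1`, the same header is parsed by the shared 2.2 routine.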

[Fix]

The fix landed upstream in v6.9-rc7: 9a35d205f466 ("drm/amd/display: Atom Integrated System Info v2_2 for DCN35")

[Test Case]

With the fix applied, amdgpu should initialize successfully at boot, with no NULL pointer dereference dump in the kernel log.
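
One way to automate this check is to search captured kernel log text for the crash signature from the call trace. The helper below is a minimal sketch under that assumption: the function name is hypothetical, the caller is assumed to have already read the log into a string, and matching on the symbol name followed by `+` is only a heuristic for spotting the symbol in a trace line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if the given log text contains a trace entry for
 * dcn35_clk_mgr_construct (symbol name followed by '+' and an offset),
 * which is the signature seen in this report. Hypothetical helper.
 */
static bool sketch_log_has_dcn35_crash(const char *log_text)
{
    if (log_text == NULL)
        return false;
    return strstr(log_text, "dcn35_clk_mgr_construct+") != NULL;
}
```

A boot is treated as passing when the helper returns false for the full kernel log of that boot.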

[Where problems could occur]

No problems are expected. The change only adds support for a new hardware revision that carries the same data.

[Other Info]

Because the fix landed in v6.9-rc7, every older kernel version that is planned to support the new VBIOS update needs it backported. So far the chip vendor has nominated linux/noble and linux-oem-6.8/noble.

========== original bug report ==========

A newer VBIOS for DCN 3.5 bumps the version of the IntegratedInfo table from 2.2 to 2.3. Version 2.3 uses the same structure as 2.2, but it is not handled by the construct_integrated_info() parser, which leads to a NULL pointer dereference at boot.

[Thu May 9 18:02:38 2024] Call Trace:
[Thu May 9 18:02:38 2024] <TASK>
[Thu May 9 18:02:38 2024] ? show_regs+0x72/0x90
[Thu May 9 18:02:38 2024] ? __die+0x25/0x80
[Thu May 9 18:02:38 2024] ? page_fault_oops+0x154/0x4c0
[Thu May 9 18:02:38 2024] ? ttm_bo_kmap+0x11d/0x310 [ttm]
[Thu May 9 18:02:38 2024] ? dma_resv_wait_timeout+0x48/0xe0
[Thu May 9 18:02:38 2024] ? do_user_addr_fault+0x30e/0x6e0
[Thu May 9 18:02:38 2024] ? exc_page_fault+0x84/0x1b0
[Thu May 9 18:02:38 2024] ? asm_exc_page_fault+0x27/0x30
[Thu May 9 18:02:38 2024] ? dcn35_clk_mgr_construct+0x183/0x2210 [amdgpu]
[Thu May 9 18:02:38 2024] ? dcn35_clk_mgr_construct+0x15a/0x2210 [amdgpu]
[Thu May 9 18:02:38 2024] ? dcn35_hwseq_create+0x23/0x470 [amdgpu]
...

The fix landed upstream in v6.9-rc7: 9a35d205f466 ("drm/amd/display: Atom Integrated System Info v2_2 for DCN35")

You-Sheng Yang (vicamo)
tags: added: amd oem-priority originate-from-2065426
Changed in linux (Ubuntu Oracular):
status: New → Fix Released
Changed in linux (Ubuntu Noble):
status: New → In Progress
Changed in linux-oem-6.8 (Ubuntu Noble):
status: New → In Progress
importance: Undecided → High
assignee: nobody → You-Sheng Yang (vicamo)
Changed in linux-oem-6.8 (Ubuntu Oracular):
status: New → Invalid
You-Sheng Yang (vicamo) wrote :
description: updated
You-Sheng Yang (vicamo) wrote :
description: updated
Timo Aaltonen (tjaalton) wrote :

oracular has 6.8 still, how is this fixed there?
