Comment 0 for bug 2036742

Paolo Gentili (pgentili) wrote: amdgpu crash on Mantic

[Impact]

When booting the latest Mantic Desktop canary image (2023-09-19) from USB, nothing is displayed on screen right after the initial boot logs. The system is still alive, since _autoinstall_ works as intended. The problem persists after the system is provisioned.

```
[ 4.189404] kernel: amdgpu 0000:01:00.0: amdgpu: last message was failed ret is 65535
[ 4.189430] kernel: ------------[ cut here ]------------
[ 4.189432] kernel: WARNING: CPU: 6 PID: 241 at drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:1107 uvd_v6_0_ring_insert_nop+0xf8/0x130 [amdgpu]
[ 4.189768] kernel: Modules linked in: hid_generic uas usbhid hid usb_storage amdgpu(+) i915 iommu_v2 gpu_sched drm_buddy drm_ttm_helper i2c_algo_bit ttm drm_display_helper cec rc_core crct10dif_pclmul crc32_pclmul drm_kms_helper polyval_clmulni syscopyarea polyval_generic ghash_clmulni_intel sysfillrect sha512_ssse3 nvme aesni_intel sysimgblt ucsi_acpi crypto_simd nvme_core ahci xhci_pci video typec_ucsi drm cryptd e1000e nvme_common libahci xhci_pci_renesas typec wmi pinctrl_tigerlake
[ 4.189800] kernel: CPU: 6 PID: 241 Comm: (udev-worker) Not tainted 6.3.0-7-generic #7-Ubuntu
[ 4.189804] kernel: Hardware name: Dell Inc. OptiPlex 5090/, BIOS 0.12.80 02/23/2021
[ 4.189806] kernel: RIP: 0010:uvd_v6_0_ring_insert_nop+0xf8/0x130 [amdgpu]
[ 4.190057] kernel: Code: 02 00 00 83 e8 01 89 83 90 02 00 00 45 39 ec 74 24 85 c0 0f 8f 5b ff ff ff 48 c7 c7 d0 97 4d c1 e8 3d d0 78 ff e9 4a ff ff ff <0f> 0b 41 d1 ec 0f 85 31 ff ff ff 5b 41 5c 41 5d 5d 31 c0 31 d2 31
[ 4.190060] kernel: RSP: 0018:ffffbab4c0e5b860 EFLAGS: 00010202
[ 4.190063] kernel: RAX: ffffffffc0db0f90 RBX: ffff92b21f990480 RCX: 0000000000000010
[ 4.190065] kernel: RDX: 000000000000000f RSI: 000000000000000f RDI: ffff92b21f990480
[ 4.190067] kernel: RBP: ffffbab4c0e5b878 R08: 0000000000000000 R09: 0000000000000000
[ 4.190068] kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000f
[ 4.190070] kernel: R13: 0000000000000000 R14: ffff92b21f980070 R15: 0000000000000001
[ 4.190071] kernel: FS: 00007f28c53758c0(0000) GS:ffff92b581380000(0000) knlGS:0000000000000000
[ 4.190074] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.190075] kernel: CR2: 000055c668720080 CR3: 0000000100f48001 CR4: 0000000000770ee0
[ 4.190077] kernel: PKRU: 55555554
[ 4.190079] kernel: Call Trace:
[ 4.190080] kernel: <TASK>
[ 4.190082] kernel: ? show_regs+0x6d/0x80
[ 4.190087] kernel: ? __warn+0x89/0x160
[ 4.190091] kernel: ? uvd_v6_0_ring_insert_nop+0xf8/0x130 [amdgpu]
[ 4.190343] kernel: ? report_bug+0x17e/0x1b0
[ 4.190347] kernel: ? handle_bug+0x46/0x90
[ 4.190350] kernel: ? exc_invalid_op+0x18/0x80
[ 4.190352] kernel: ? asm_exc_invalid_op+0x1b/0x20
[ 4.190356] kernel: ? __pfx_uvd_v6_0_ring_insert_nop+0x10/0x10 [amdgpu]
[ 4.190594] kernel: ? uvd_v6_0_ring_insert_nop+0xf8/0x130 [amdgpu]
[ 4.190761] kernel: amdgpu_ring_commit+0x36/0x80 [amdgpu]
[ 4.190901] kernel: uvd_v6_0_ring_test_ring+0xf6/0x180 [amdgpu]
[ 4.191068] kernel: amdgpu_ring_test_helper+0x1e/0x90 [amdgpu]
[ 4.191207] kernel: uvd_v6_0_hw_init+0x99/0x650 [amdgpu]
[ 4.191380] kernel: amdgpu_device_ip_init+0x48e/0x950 [amdgpu]
[ 4.191513] kernel: amdgpu_device_init+0x8eb/0x1130 [amdgpu]
[ 4.191647] kernel: amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
[ 4.191779] kernel: amdgpu_pci_probe+0x180/0x440 [amdgpu]
[ 4.191907] kernel: local_pci_probe+0x44/0xb0
[ 4.191911] kernel: pci_call_probe+0x55/0x190
[ 4.191914] kernel: pci_device_probe+0x84/0x120
[ 4.191916] kernel: really_probe+0x1c9/0x430
[ 4.191919] kernel: __driver_probe_device+0x8c/0x190
[ 4.191921] kernel: driver_probe_device+0x24/0xd0
[ 4.191923] kernel: __driver_attach+0x10b/0x210
[ 4.191925] kernel: ? __pfx___driver_attach+0x10/0x10
[ 4.191929] kernel: bus_for_each_dev+0x8a/0xf0
[ 4.191931] kernel: driver_attach+0x1e/0x30
[ 4.191932] kernel: bus_add_driver+0x127/0x240
[ 4.191934] kernel: driver_register+0x5e/0x130
[ 4.191936] kernel: ? __pfx_init_module+0x10/0x10 [amdgpu]
[ 4.192076] kernel: __pci_register_driver+0x62/0x70
[ 4.192078] kernel: amdgpu_init+0x69/0xff0 [amdgpu]
[ 4.192213] kernel: do_one_initcall+0x5b/0x250
[ 4.192217] kernel: do_init_module+0x7b/0x260
[ 4.192219] kernel: load_module+0xbc9/0xc90
[ 4.192222] kernel: __do_sys_finit_module+0xc4/0x140
[ 4.192224] kernel: ? __do_sys_finit_module+0xc4/0x140
[ 4.192226] kernel: __x64_sys_finit_module+0x18/0x30
[ 4.192228] kernel: do_syscall_64+0x58/0x90
[ 4.192231] kernel: ? exit_to_user_mode_prepare+0x30/0xb0
[ 4.192233] kernel: ? syscall_exit_to_user_mode+0x29/0x50
[ 4.192235] kernel: ? do_syscall_64+0x67/0x90
[ 4.192237] kernel: ? do_syscall_64+0x67/0x90
[ 4.192239] kernel: ? do_syscall_64+0x67/0x90
[ 4.192241] kernel: ? sysvec_call_function+0x4b/0xd0
[ 4.192243] kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 4.192245] kernel: RIP: 0033:0x7f28c5b23c5d
[ 4.192253] kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8b 71 13 00 f7 d8 64 89 01 48
[ 4.192256] kernel: RSP: 002b:00007fff1cede9f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 4.192258] kernel: RAX: ffffffffffffffda RBX: 000055c668709060 RCX: 00007f28c5b23c5d
[ 4.192259] kernel: RDX: 0000000000000000 RSI: 00007f28c5cfe44a RDI: 0000000000000016
[ 4.192260] kernel: RBP: 00007f28c5cfe44a R08: 0000000000000040 R09: fffffffffffffde0
[ 4.192262] kernel: R10: fffffffffffffe18 R11: 0000000000000246 R12: 0000000000020000
[ 4.192263] kernel: R13: 000055c668706350 R14: 0000000000000000 R15: 000055c668703870
[ 4.192265] kernel: </TASK>
[ 4.192266] kernel: ---[ end trace 0000000000000000 ]---
```

This appears to be related to https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/2029396.

[Test Case]

Live-boot the Ubuntu Mantic Desktop canary image (2023-09-19) from USB on the affected hardware.

[Where Problems Could Occur]

Dell OptiPlex 5090
- Intel Core(TM) i7-11700
- Advanced Micro Devices, Inc. [AMD/ATI] - 1002:699f