Comment 17 for bug 2029934

Simon Fels (morphis) wrote:

I gave this another spin today on our Altra system with 2x L4 GPUs, using kernel 6.5.0-17-generic #17~22.04.1 and the LRM modules of the 535 driver (6.5.0-17.17~22.04.1+1 of linux-modules-nvidia-535-server-generic-hwe-22.04). The same problem occurs as with the DKMS modules:

[ 39.437849] watchdog: BUG: soft lockup - CPU#62 stuck for 26s! [systemd-udevd:850]
[ 39.445411] Modules linked in: nvidia(POE+) crct10dif_ce polyval_ce polyval_generic ghash_ce ast mlx5_core video drm_shmem_helper sm4 mlxfw sha2_ce drm_kms_helper nvme psample sha256_arm64 sha1_ce nvme_core igb drm tls xhci_pci nvme_common pci_hyperv_intf xhci_pci_renesas i2c_algo_bit aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[ 39.474949] CPU: 62 PID: 850 Comm: systemd-udevd Tainted: P OE 6.5.0-17-generic #17~22.04.1-Ubuntu
[ 39.485196] Hardware name: GIGABYTE G242-P30-JG/MP32-AR0-JG, BIOS F07 03/22/2021
[ 39.492578] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 39.499526] pc : smp_call_function_many_cond+0x19c/0x720
[ 39.504830] lr : smp_call_function_many_cond+0x1b8/0x720
[ 39.510130] sp : ffff80008934b920
[ 39.513431] x29: ffff80008934b920 x28: ffffaef99146dd10 x27: 0000000000000000
[ 39.520554] x26: 000000000000004f x25: ffff085dcfffbb80 x24: 0000000000000026
[ 39.527677] x23: 0000000000000001 x22: ffff085dcfdd6708 x21: ffffaef9914726e0
[ 39.534799] x20: ffff085dcfadbb80 x19: ffff085dcfdd6700 x18: ffff800089341060
[ 39.541921] x17: 0000000000000000 x16: 0000000000000000 x15: 43535f5f00656c75
[ 39.549044] x14: 0c030b111b111303 x13: 0000000000000006 x12: 3931413337353339
[ 39.556166] x11: 0101010101010101 x10: 000000000000004f x9 : ffffaef98ee015b8
[ 39.563289] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003e
[ 39.570411] x5 : ffffaef99146d000 x4 : 0000000000000000 x3 : ffff085dcfadbb88
[ 39.577533] x2 : 0000000000000026 x1 : 0000000000000011 x0 : 0000000000000000
[ 39.584656] Call trace:
[ 39.587090] smp_call_function_many_cond+0x19c/0x720
[ 39.592043] kick_all_cpus_sync+0x50/0xa8
[ 39.596040] flush_module_icache+0x94/0xf8
[ 39.600125] load_module+0x448/0x8e0
[ 39.603688] init_module_from_file+0x94/0x110
[ 39.608033] idempotent_init_module+0x194/0x2b0
[ 39.612551] __arm64_sys_finit_module+0x74/0x100
[ 39.617155] invoke_syscall+0x7c/0x130
[ 39.620892] el0_svc_common.constprop.0+0x5c/0x170
[ 39.625670] do_el0_svc+0x38/0x68
[ 39.628972] el0_svc+0x30/0xe0
[ 39.632016] el0t_64_sync_handler+0x128/0x158
[ 39.636360] el0t_64_sync+0x1b0/0x1b8