Comment 4 for bug 2034447

Francis Ginther (fginther) wrote :

Another panic with the same kernel on hidon, but a different trace:

[ 53.908045] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 53.910849] ipmi_si IPI0001:00: IPMI kcs interface initialized
[ 53.916485] BUG: unable to handle page fault for address: ff3bbb1e67db8300
[ 53.916488] #PF: supervisor instruction fetch in kernel mode
[ 53.916489] #PF: error_code(0x0011) - permissions violation
[ 53.916490] PGD bc51202067 P4D bc51203067 PUD 114ee8063 PMD 127c6e063 PTE 8000000127db8163
[ 53.916495] Oops: 0011 [#1] SMP NOPTI
[ 53.916498] CPU: 192 PID: 0 Comm: swapper/192 Not tainted 5.15.0-85-generic #95-Ubuntu
[ 53.916501] Hardware name: NVIDIA DGXH100/DGXH100, BIOS 1.0.7 05/08/2023
[ 53.916502] RIP: 0010:0xff3bbb1e67db8300
[ 53.916505] Code: 13 00 00 00 00 00 00 00 00 00 00 50 0c 99 3c 18 48 ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 d0 64 1e bb 3b ff 28 50 30 6b 1e bb 3b ff 00 00 00 00 00 00
[ 53.916507] RSP: 0018:ff48183c9bc04eb0 EFLAGS: 00010202
[ 53.916509] RAX: ff3bbb1e67db8300 RBX: 0000000000000207 RCX: ffffffffa5a0a898
[ 53.916511] RDX: ff3bbc1ddb48bfa0 RSI: ffffffffa479751d RDI: ff3bbc1ddb48c000
[ 53.916512] RBP: ff48183c9bc04f20 R08: 0000000000000001 R09: 0000000000000000
[ 53.916513] R10: 0000000000000001 R11: 0000000000000000 R12: ff48183c9bc04ed8
[ 53.916514] R13: 0000000000000206 R14: ff3bbd19bf6322c0 R15: ff3bbc1dce49c080
[ 53.916516] FS: 0000000000000000(0000) GS:ff3bbd19bf600000(0000) knlGS:0000000000000000
[ 53.916517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 53.916519] CR2: ff3bbb1e67db8300 CR3: 000000bc4f810003 CR4: 0000000000771ee0
[ 54.036823] ens6f0 speed is unknown, defaulting to 1000
[ 54.036851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 54.045100] ens6f0 speed is unknown, defaulting to 1000
[ 54.053987] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 54.053989] PKRU: 55555554
[ 54.053990] Call Trace:
[ 54.053992] <IRQ>
[ 54.055846] 4xxx 0000:e8:00.0: qat_dev1 started 9 acceleration engines
[ 54.060696] ens6f0 speed is unknown, defaulting to 1000
[ 54.068491] ? show_trace_log_lvl+0x1d6/0x2ea
[ 54.078858] ens6f0 speed is unknown, defaulting to 1000
[ 54.082407] ? show_trace_log_lvl+0x1d6/0x2ea
[ 54.133342] ? rcu_core+0x122/0x2a0
[ 54.133349] ? show_regs.part.0+0x23/0x29
[ 54.133352] ? __die_body.cold+0x8/0xd
[ 54.133355] ? __die+0x2b/0x37
[ 54.133358] ? page_fault_oops+0x13b/0x170
[ 54.133363] ? search_exception_tables+0x61/0x70
[ 54.133367] ? kernelmode_fixup_or_oops+0xa2/0x120
[ 54.133370] ? __bad_area_nosemaphore+0x15d/0x1a0
[ 54.133372] ? bad_area_nosemaphore+0x16/0x20
[ 54.133374] ? do_kern_addr_fault+0x62/0x80
[ 54.133377] ? exc_page_fault+0xe7/0x170
[ 54.133382] ? asm_exc_page_fault+0x27/0x30
[ 54.133386] ? file_free_rcu+0x2d/0x60
[ 54.133392] ? rcu_do_batch+0x14e/0x430
[ 54.133395] rcu_core+0x122/0x2a0
[ 54.133398] rcu_core_si+0xe/0x20
[ 54.133401] __do_softirq+0xd6/0x2e7
[ 54.133404] irq_exit_rcu+0x94/0xc0
[ 54.133408] sysvec_apic_timer_interrupt+0x80/0x90
[ 54.133411] </IRQ>
[ 54.133412] <TASK>
[ 54.133412] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 54.133414] RIP: 0010:cpuidle_enter_state+0xd9/0x620
[ 54.133419] Code: 3d dc 69 18 5b e8 a7 66 67 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 e8 73 67 ff 80 7d d0 00 0f 85 61 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6d 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e7 03 00 00
[ 54.133421] RSP: 0018:ff48183c993f3e28 EFLAGS: 00000246
[ 54.133422] RAX: ff3bbd19bf6314c0 RBX: ff7a183c7f613c60 RCX: 0000000000000000
[ 54.133423] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 54.133424] RBP: ff48183c993f3e78 R08: 0000000c8d282382 R09: 00000000000c3500
[ 54.133425] R10: 0000000000000004 R11: 071c71c71c71c71c R12: ffffffffa64d6a20
[ 54.133426] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000c8d282382
[ 54.133429] ? cpuidle_enter_state+0xc8/0x620
[ 54.133431] cpuidle_enter+0x2e/0x50
[ 54.133433] cpuidle_idle_call+0x142/0x1e0
[ 54.133437] do_idle+0x83/0xf0
[ 54.133439] cpu_startup_entry+0x20/0x30
[ 54.133440] start_secondary+0x12a/0x180
[ 54.133444] secondary_startup_64_no_verify+0xc2/0xcb
[ 54.133450] </TASK>
[ 54.133451] Modules linked in: intel_rapl_msr nls_iso8859_1 intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp irdma(+) rapl qat_4xxx i40e isst_if_mbox_pci intel_qat idxd isst_if_mmio pmt_crashlog pmt_telemetry isst_if_common idxd_bus authenc pmt_class intel_th_gth mei_me intel_th_pci mei intel_th switchtec acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 multipath linear mlx5_ib ib_uverbs ib_core raid0 ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt crct10dif_pclmul crc32_pclmul mlx5_core fb_sys_fops ghash_clmulni_intel mlxfw ixgbe cec aesni_intel psample crypto_simd rc_core xfrm_algo nvme cryptd i2c_i801 xhci_pci dca
[ 54.335555] tls ice intel_pmt drm pci_hyperv_intf i2c_ismt i2c_smbus xhci_pci_renesas mdio nvme_core wmi pinctrl_emmitsburg
[ 54.446210] CR2: ff3bbb1e67db8300
[ 54.449953] ---[ end trace f63e8c008d6a0ea1 ]---
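
For anyone reading the trace above, the error_code(0x0011) and PTE values can be decoded by hand. The following is a minimal Python sketch, not a kernel tool. It assumes the standard x86-64 page-fault error-code bits (the ones Linux names X86_PF_*) and that bit 63 of a PTE is the NX (no-execute) bit. The helper names decode_pf_error and pte_is_nx are made up for illustration.

```python
# Low bits of the x86 page-fault error code, named after Linux's X86_PF_* flags.
PF_BITS = [
    (0x01, "PROT"),   # fault on a present page (permissions), not a missing page
    (0x02, "WRITE"),  # access was a write
    (0x04, "USER"),   # access came from user mode
    (0x08, "RSVD"),   # reserved bit set in a paging-structure entry
    (0x10, "INSTR"),  # access was an instruction fetch
]


def decode_pf_error(code):
    """Return the names of the flags set in a page-fault error code."""
    return [name for bit, name in PF_BITS if code & bit]


def pte_is_nx(pte):
    """Return True if bit 63 (NX) is set in a 64-bit page-table entry."""
    return bool((pte >> 63) & 1)


if __name__ == "__main__":
    # Values copied from the oops above.
    print(decode_pf_error(0x0011))        # ['PROT', 'INSTR']
    print(pte_is_nx(0x8000000127DB8163))  # True
```

Both results are consistent with the first line of the oops: the CPU attempted an instruction fetch from a page that is present but mapped no-execute.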