Another panic with the same kernel on hidon, but a different trace:
[ 53.908045] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 53.910849] ipmi_si IPI0001:00: IPMI kcs interface initialized
[ 53.916485] BUG: unable to handle page fault for address: ff3bbb1e67db8300
[ 53.916488] #PF: supervisor instruction fetch in kernel mode
[ 53.916489] #PF: error_code(0x0011) - permissions violation
[ 53.916490] PGD bc51202067 P4D bc51203067 PUD 114ee8063 PMD 127c6e063 PTE 8000000127db8163
[ 53.916495] Oops: 0011 [#1] SMP NOPTI
[ 53.916498] CPU: 192 PID: 0 Comm: swapper/192 Not tainted 5.15.0-85-generic #95-Ubuntu
[ 53.916501] Hardware name: NVIDIA DGXH100/DGXH100, BIOS 1.0.7 05/08/2023
[ 53.916502] RIP: 0010:0xff3bbb1e67db8300
[ 53.916505] Code: 13 00 00 00 00 00 00 00 00 00 00 50 0c 99 3c 18 48 ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 d0 64 1e bb 3b ff 28 50 30 6b 1e bb 3b ff 00 00 00 00 00 00
[ 53.916507] RSP: 0018:ff48183c9bc04eb0 EFLAGS: 00010202
[ 53.916509] RAX: ff3bbb1e67db8300 RBX: 0000000000000207 RCX: ffffffffa5a0a898
[ 53.916511] RDX: ff3bbc1ddb48bfa0 RSI: ffffffffa479751d RDI: ff3bbc1ddb48c000
[ 53.916512] RBP: ff48183c9bc04f20 R08: 0000000000000001 R09: 0000000000000000
[ 53.916513] R10: 0000000000000001 R11: 0000000000000000 R12: ff48183c9bc04ed8
[ 53.916514] R13: 0000000000000206 R14: ff3bbd19bf6322c0 R15: ff3bbc1dce49c080
[ 53.916516] FS: 0000000000000000(0000) GS:ff3bbd19bf600000(0000) knlGS:0000000000000000
[ 53.916517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 53.916519] CR2: ff3bbb1e67db8300 CR3: 000000bc4f810003 CR4: 0000000000771ee0
[ 54.036823] ens6f0 speed is unknown, defaulting to 1000
[ 54.036851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 54.045100] ens6f0 speed is unknown, defaulting to 1000
[ 54.053987] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 54.053989] PKRU: 55555554
[ 54.053990] Call Trace:
[ 54.053992] <IRQ>
[ 54.055846] 4xxx 0000:e8:00.0: qat_dev1 started 9 acceleration engines
[ 54.060696] ens6f0 speed is unknown, defaulting to 1000
[ 54.068491] ? show_trace_log_lvl+0x1d6/0x2ea
[ 54.078858] ens6f0 speed is unknown, defaulting to 1000
[ 54.082407] ? show_trace_log_lvl+0x1d6/0x2ea
[ 54.133342] ? rcu_core+0x122/0x2a0
[ 54.133349] ? show_regs.part.0+0x23/0x29
[ 54.133352] ? __die_body.cold+0x8/0xd
[ 54.133355] ? __die+0x2b/0x37
[ 54.133358] ? page_fault_oops+0x13b/0x170
[ 54.133363] ? search_exception_tables+0x61/0x70
[ 54.133367] ? kernelmode_fixup_or_oops+0xa2/0x120
[ 54.133370] ? __bad_area_nosemaphore+0x15d/0x1a0
[ 54.133372] ? bad_area_nosemaphore+0x16/0x20
[ 54.133374] ? do_kern_addr_fault+0x62/0x80
[ 54.133377] ? exc_page_fault+0xe7/0x170
[ 54.133382] ? asm_exc_page_fault+0x27/0x30
[ 54.133386] ? file_free_rcu+0x2d/0x60
[ 54.133392] ? rcu_do_batch+0x14e/0x430
[ 54.133395] rcu_core+0x122/0x2a0
[ 54.133398] rcu_core_si+0xe/0x20
[ 54.133401] __do_softirq+0xd6/0x2e7
[ 54.133404] irq_exit_rcu+0x94/0xc0
[ 54.133408] sysvec_apic_timer_interrupt+0x80/0x90
[ 54.133411] </IRQ>
[ 54.133412] <TASK>
[ 54.133412] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 54.133414] RIP: 0010:cpuidle_enter_state+0xd9/0x620
[ 54.133419] Code: 3d dc 69 18 5b e8 a7 66 67 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 e8 73 67 ff 80 7d d0 00 0f 85 61 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6d 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e7 03 00 00
[ 54.133421] RSP: 0018:ff48183c993f3e28 EFLAGS: 00000246
[ 54.133422] RAX: ff3bbd19bf6314c0 RBX: ff7a183c7f613c60 RCX: 0000000000000000
[ 54.133423] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 54.133424] RBP: ff48183c993f3e78 R08: 0000000c8d282382 R09: 00000000000c3500
[ 54.133425] R10: 0000000000000004 R11: 071c71c71c71c71c R12: ffffffffa64d6a20
[ 54.133426] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000c8d282382
[ 54.133429] ? cpuidle_enter_state+0xc8/0x620
[ 54.133431] cpuidle_enter+0x2e/0x50
[ 54.133433] cpuidle_idle_call+0x142/0x1e0
[ 54.133437] do_idle+0x83/0xf0
[ 54.133439] cpu_startup_entry+0x20/0x30
[ 54.133440] start_secondary+0x12a/0x180
[ 54.133444] secondary_startup_64_no_verify+0xc2/0xcb
[ 54.133450] </TASK>
[ 54.133451] Modules linked in: intel_rapl_msr nls_iso8859_1 intel_rapl_common i10nm_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp irdma(+) rapl qat_4xxx i40e isst_if_mbox_pci intel_qat idxd isst_if_mmio pmt_crashlog pmt_telemetry isst_if_common idxd_bus authenc pmt_class intel_th_gth mei_me intel_th_pci mei intel_th switchtec acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 multipath linear mlx5_ib ib_uverbs ib_core raid0 ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt crct10dif_pclmul crc32_pclmul mlx5_core fb_sys_fops ghash_clmulni_intel mlxfw ixgbe cec aesni_intel psample crypto_simd rc_core xfrm_algo nvme cryptd i2c_i801 xhci_pci dca
[ 54.335555] tls ice intel_pmt drm pci_hyperv_intf i2c_ismt i2c_smbus xhci_pci_renesas mdio nvme_core wmi pinctrl_emmitsburg
[ 54.446210] CR2: ff3bbb1e67db8300
[ 54.449953] ---[ end trace f63e8c008d6a0ea1 ]---