Random kernel panics related to page fault in Ubuntu 23.04/23.10 server with QEMU
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Here is the general information about the server:
01:06:49 labs@selfmadeninja ~ → lsb_release -a; uname -a; dpkg -l | grep ' linux-i'
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.10
Release: 23.10
Codename: mantic
Linux selfmadeninja 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 14:59:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
rc linux-image-
ii linux-image-
ii linux-image-
ii linux-image-generic 6.5.0.14.16 amd64 Generic Linux kernel image
I recently switched to an Intel i9 14900K, and it worked well for some time. My system has 128G of RAM, and I use QEMU/KVM to run many servers. I also have a Windows VM for gaming with GPU passthrough. I have two GPUs installed but currently use only one of them, and only while the Windows VM is running. The Ubuntu Server host uses the integrated graphics.
I am getting random panics. I had no idea what was causing them, so I installed kdump to capture them. All of them point to something like this:
[34006.012227] BUG: unable to handle page fault for address: ffff936a8abdd3c8
[34006.012234] #PF: supervisor read access in kernel mode
[34006.012235] #PF: error_code(0x0000) - not-present page
but they all originate from different areas. I have zipped all the crashes I have collected from the host so far, including the kernel dump, and the archive is available here (crash.tar.gz, 4.5GB): https:/
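For anyone reading the traces, the `error_code(...)` value in the `#PF:` lines is a bit field. The following is a minimal sketch of how it can be decoded, assuming the standard x86 page-fault error-code bit layout; the function name `decode_pf_error_code` and the dictionary keys are made up for this illustration and are not part of any kernel tool.

```python
# Sketch: decode the x86 page-fault error code printed in "#PF: error_code(...)".
# Bit positions follow the standard x86 layout; the names below are illustrative.
PF_PROT = 0x01   # 0 = page not present, 1 = protection violation
PF_WRITE = 0x02  # 0 = read access, 1 = write access
PF_USER = 0x04   # 0 = supervisor access, 1 = user access
PF_RSVD = 0x08   # reserved bit was set in a page-table entry
PF_INSTR = 0x10  # fault happened on an instruction fetch
PF_PK = 0x20     # protection-key violation


def decode_pf_error_code(code: int) -> dict:
    """Return a human-readable breakdown of a page-fault error code."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit": bool(code & PF_RSVD),
        "protection_key": bool(code & PF_PK),
    }


if __name__ == "__main__":
    # The two error codes that appear in the panics in this report.
    for code in (0x0000, 0x0010):
        print(f"error_code({code:#06x}) ->", decode_pf_error_code(code))
```

With this reading, `0x0000` is a supervisor read of a not-present page, and `0x0010` is a supervisor instruction fetch from a not-present page, which matches the `#PF:` text lines the kernel prints next to each code.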
So I turned off panic_on_oops on the host, and then my VM started to panic (I also have kdump configured in one of my VMs). Those crash logs are here: https:/
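To confirm which mode a machine is in, the current setting can be read from `/proc/sys/kernel/panic_on_oops` (0 means the kernel tries to continue after an oops, 1 means it panics). A small sketch, assuming a Linux host with procfs mounted; the helper names `parse_panic_on_oops` and `read_panic_on_oops` are hypothetical.

```python
# Sketch: report whether the kernel is configured to panic on an oops.
# Assumes procfs is mounted at /proc; helper names are illustrative only.
from pathlib import Path

PANIC_ON_OOPS_PATH = "/proc/sys/kernel/panic_on_oops"


def parse_panic_on_oops(text: str) -> bool:
    """Interpret the file contents: '0' means continue, anything else means panic."""
    return text.strip() != "0"


def read_panic_on_oops(path: str = PANIC_ON_OOPS_PATH) -> bool:
    """Read the sysctl file and return True if an oops triggers a panic."""
    return parse_panic_on_oops(Path(path).read_text())


if __name__ == "__main__":
    try:
        enabled = read_panic_on_oops()
    except OSError as exc:
        print(f"could not read {PANIC_ON_OOPS_PATH}: {exc}")
    else:
        print("panic_on_oops is", "enabled" if enabled else "disabled")
```

Running the same check inside the guest and on the host shows which of the two will halt on the first oops and which will keep running.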
This is my current /proc/cmdline:
BOOT_IMAGE=
-------
Panic Examples:
Panic 1 on host:
[43084.904967] BUG: kernel NULL pointer dereference, address: 0000000000000009
[43084.904972] #PF: supervisor instruction fetch in kernel mode
[43084.904974] #PF: error_code(0x0010) - not-present page
[43084.904975] PGD 0 P4D 0
[43084.904977] Oops: 0010 [#1] PREEMPT SMP NOPTI
[43084.904980] CPU: 9 PID: 0 Comm: swapper/9 Kdump: loaded Tainted: P O 6.2.0-37-generic #38-Ubuntu
[43084.904982] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 1402 09/08/2023
[43084.904983] RIP: 0010:0x9
[43084.904987] Code: Unable to access opcode bytes at 0xffffffffffffffdf.
[43084.904988] RSP: 0018:ffffb9f000
[43084.904990] RAX: 0000000000000000 RBX: ffff9fd9419ccd40 RCX: 0000000000000000
[43084.904991] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[43084.904992] RBP: ffffffff8957bd15 R08: 0000000000000000 R09: 0000000000000000
[43084.904993] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[43084.904993] R13: 0000000000000004 R14: 0000000000000009 R15: ffffb9f0001cbdd8
[43084.904995] FS: 000000000000000
[43084.904996] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43084.904997] CR2: ffffffffffffffdf CR3: 0000000150364000 CR4: 0000000000752ee0
[43084.904998] PKRU: 55555554
[43084.904999] Call Trace:
[43084.905000] <TASK>
[43084.905002] ? show_regs+0x6d/0x80
[43084.905006] ? __die+0x24/0x80
[43084.905008] ? page_fault_
[43084.905010] ? do_user_
[43084.905012] ? exc_page_
[43084.905015] ? asm_exc_
[43084.905018] ? psi_task_
[43084.905022] ? enqueue_
[43084.905025] ? ttwu_do_
[43084.905027] ? sched_ttwu_
[43084.905029] ? __flush_
[43084.905032] ? flush_smp_
[43084.905034] ? do_idle+0xb2/0x100
[43084.905036] ? cpu_startup_
[43084.905038] ? start_secondary
[43084.905040] ? secondary_
[43084.905044] </TASK>
[43084.905044] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd snd_hda_
[43084.905086] snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_
[43084.905133] multipath linear hid_generic usbhid hid i915 drm_buddy i2c_algo_bit ttm drm_display_helper cec rc_core drm_kms_helper syscopyarea mfd_aaeon sysfillrect asus_wmi sysimgblt ledtrig_audio crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel sparse_keymap crypto_simd r8169 nvme cryptd platform_profile i2c_i801 spi_intel_pci ahci i2c_smbus drm nvme_core spi_intel intel_lpss_pci realtek xhci_pci libahci intel_lpss nvme_common vmd xhci_pci_renesas idma64 video wmi pinctrl_alderlake
[43084.905160] CR2: 0000000000000009
Panic 2 on host:
[102172.140109] BUG: unable to handle page fault for address: fffffffffffbcca3
[102172.140114] #PF: supervisor read access in kernel mode
[102172.140116] #PF: error_code(0x0000) - not-present page
[102172.140117] PGD 1370e15067 P4D 1370e15067 PUD 1370e17067 PMD 0
[102172.140120] Oops: 0000 [#1] PREEMPT SMP NOPTI
[102172.140123] CPU: 9 PID: 117178 Comm: worker Kdump: loaded Tainted: P O 6.2.0-39-generic #40-Ubuntu
[102172.140125] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 1402 09/08/2023
[102172.140126] RIP: 0010:mempool_
[102172.140131] Code: 00 48 c7 45 b0 00 00 00 00 48 c7 45 b8 00 00 00 00 48 c7 45 c0 00 00 00 00 48 c7 45 c8 00 00 00 00 41 81 e5 00 04 00 00 75 5e <41> 89 dc 81 e3 bf fb ff ff 41 81 cc 00 20 09 00 81 cb 00 20 09 00
[102172.140133] RSP: 0018:ffffa8244f
[102172.140134] RAX: 0000000000000000 RBX: 0000000000000cc0 RCX: 0000000000000cc0
[102172.140136] RDX: 0000000000000000 RSI: 0000000000000cc0 RDI: ffffffffb9d1ce58
[102172.140137] RBP: ffffa8244f7ab9c0 R08: ffffffffb9d1ce40 R09: ffff89ab82fb7080
[102172.140138] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000cc0
[102172.140139] R13: 0000000000000400 R14: ffffffffb9d1ce58 R15: ffffa8244f7abb08
[102172.140140] FS: 00007fa6ba7fc6c
[102172.140141] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[102172.140142] CR2: fffffffffffbcca3 CR3: 000000012fbc4000 CR4: 0000000000752ee0
[102172.140143] PKRU: 55555554
[102172.140144] Call Trace:
[102172.140146] <TASK>
[102172.140148] ? show_regs+0x6d/0x80
[102172.140151] ? __die+0x24/0x80
[102172.140153] ? page_fault_
[102172.140156] ? kernelmode_
[102172.140158] ? __bad_area_
[102172.140159] ? update_
[102172.140162] ? bad_area_
[102172.140163] ? do_kern_
[102172.140165] ? exc_page_
[102172.140172] ? asm_exc_
[102172.140178] ? mempool_
[102172.140182] bio_alloc_
[102172.140188] iomap_dio_
[102172.140192] iomap_dio_
[102172.140195] __iomap_
[102172.140200] iomap_dio_
[102172.140202] ext4_dio_
[102172.140205] ext4_file_
[102172.140207] do_iter_
[102172.140210] do_iter_
[102172.140212] vfs_writev+
[102172.140215] __x64_sys_
[102172.140217] do_syscall_
[102172.140220] ? exit_to_
[102172.140223] ? syscall_
[102172.140225] ? do_syscall_
[102172.140227] ? do_syscall_
[102172.140228] ? do_syscall_
[102172.140230] ? do_syscall_
[102172.140232] ? do_syscall_
[102172.140233] entry_SYSCALL_
[102172.140235] RIP: 0033:0x7fb73c1121d0
[102172.140237] Code: 3c 24 48 89 4c 24 18 e8 ee 92 f7 ff 4c 8b 54 24 18 8b 3c 24 45 31 c0 41 89 c1 8b 54 24 14 48 8b 74 24 08 b8 28 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 30 44 89 cf 48 89 04 24 e8 3c 93 f7 ff 48 8b
[102172.140238] RSP: 002b:00007fa6ba
[102172.140240] RAX: ffffffffffffffda RBX: 00007fa5a14d4b10 RCX: 00007fb73c1121d0
[102172.140241] RDX: 0000000000000018 RSI: 0000558de9da7360 RDI: 000000000000000f
[102172.140242] RBP: 0000558de9ad9440 R08: 0000000000000000 R09: 0000000000000000
[102172.140243] R10: 0000014be83c8000 R11: 0000000000000246 R12: 0000558de9ad9450
[102172.140244] R13: 0000558de92ed95b R14: 00007fa6baffc300 R15: 00007fa6b9ffc000
[102172.140246] </TASK>
[102172.140247] Modules linked in: tls snd_seq_dummy snd_hrtimer rfcomm zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_vsock vmw_vsock_
[102172.140287] snd_hda_codec snd_hda_core snd_hwdep soundwire_bus intel_rapl_msr snd_soc_core intel_rapl_common snd_compress ac97_bus snd_pcm_dmaengine intel_tcc_cooling snd_pcm x86_pkg_
[102172.140333] syscopyarea mfd_aaeon sysfillrect asus_wmi sysimgblt crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel ledtrig_audio crypto_simd sparse_keymap nvme i2c_i801 spi_intel_pci r8169 cryptd platform_profile drm intel_lpss_pci ahci i2c_smbus spi_intel nvme_core realtek intel_lpss libahci xhci_pci nvme_common idma64 vmd xhci_pci_renesas video wmi pinctrl_alderlake
[102172.140352] CR2: fffffffffffbcca3
Panic 3 on host:
[34006.012227] BUG: unable to handle page fault for address: ffff936a8abdd3c8
[34006.012234] #PF: supervisor read access in kernel mode
[34006.012235] #PF: error_code(0x0000) - not-present page
[34006.012236] PGD 0 P4D 0
[34006.012239] Oops: 0000 [#1] PREEMPT SMP NOPTI
[34006.012241] CPU: 7 PID: 11561 Comm: worker Kdump: loaded Tainted: P O 6.5.0-14-generic #14-Ubuntu
[34006.012243] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 1601 11/23/2023
[34006.012244] RIP: 0010:__
[34006.012249] Code: c0 0f 85 5e 02 00 00 4d 85 e4 0f 84 40 02 00 00 0f 1f 44 00 00 48 c7 44 24 10 00 00 00 00 49 8b 04 24 65 48 03 05 80 70 7b 58 <48> 8b 50 08 4c 8b 10 48 8b 40 10 4d 85 d2 74 1f 48 85 c0 74 1a 41
[34006.012250] RSP: 0018:ffffaab07e
[34006.012252] RAX: ffff936a8abdd3c0 RBX: 0000000000092820 RCX: 0000000000001000
[34006.012253] RDX: 00000000ffffffff RSI: 0000000000092820 RDI: ffff9ce5c0042d00
[34006.012254] RBP: ffffaab07ef7f900 R08: ffffffffa77931f5 R09: 0000000000000000
[34006.012255] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ce5c0042d00
[34006.012256] R13: 0000000000092820 R14: 00000000ffffffff R15: 0000000000001000
[34006.012257] FS: 00007fe5bbfff6c
[34006.012258] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[34006.012259] CR2: ffff936a8abdd3c8 CR3: 00000002065be000 CR4: 0000000000752ee0
[34006.012260] PKRU: 55555554
[34006.012261] Call Trace:
[34006.012262] <TASK>
[34006.012264] ? show_regs+0x6d/0x80
[34006.012267] ? __die+0x24/0x80
[34006.012268] ? page_fault_
[34006.012271] ? kernelmode_
[34006.012273] ? __bad_area_
[34006.012274] ? bad_area_
[34006.012275] ? do_kern_
[34006.012276] ? exc_page_
[34006.012278] ? asm_exc_
[34006.012280] ? mempool_
[34006.012283] ? __kmem_
[34006.012284] ? mempool_
[34006.012285] ? mempool_
[34006.012286] __kmalloc+
[34006.012289] mempool_
[34006.012290] mempool_
[34006.012292] nvme_map_
[34006.012296] ? __submit_
[34006.012298] nvme_prep_
[34006.012301] nvme_queue_
[34006.012304] blk_mq_
[34006.012307] blk_mq_
[34006.012308] __blk_flush_
[34006.012309] blk_finish_
[34006.012310] __iomap_
[34006.012313] iomap_dio_
[34006.012314] ext4_dio_
[34006.012316] ext4_file_
[34006.012318] do_iter_
[34006.012320] do_iter_
[34006.012321] vfs_writev+
[34006.012323] __x64_sys_
[34006.012325] do_syscall_
[34006.012326] ? sysvec_
[34006.012327] entry_SYSCALL_
[34006.012328] RIP: 0033:0x7ff69eb24c30
[34006.012359] Code: 3c 24 48 89 4c 24 18 e8 5e ec f6 ff 4c 8b 54 24 18 8b 3c 24 45 31 c0 41 89 c1 8b 54 24 14 48 8b 74 24 08 b8 28 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 30 44 89 cf 48 89 04 24 e8 ac ec f6 ff 48 8b
[34006.012360] RSP: 002b:00007fe5bb
[34006.012361] RAX: ffffffffffffffda RBX: 000055b93ff9a100 RCX: 00007ff69eb24c30
[34006.012361] RDX: 0000000000000002 RSI: 000055b93feb0400 RDI: 000000000000000f
[34006.012362] RBP: 00007fe4aee6fb00 R08: 0000000000000000 R09: 0000000000000000
[34006.012363] R10: 00000155ccfa0000 R11: 0000000000000246 R12: 000055b93fbdca20
[34006.012363] R13: 000055b93e6a41f7 R14: 00007fe5d0ff82c0 R15: 00007fe5bb7ff000
[34006.012364] </TASK>
[34006.012365] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd snd_hda_
[34006.012395] snd_soc_
[34006.012430] linear hid_generic usbhid hid i915 drm_buddy i2c_algo_bit ttm drm_display_helper cec rc_core crct10dif_pclmul drm_kms_helper crc32_pclmul polyval_clmulni polyval_generic nvme ghash_clmulni_intel aesni_intel r8169 nvme_core ahci crypto_simd cryptd realtek nvme_common libahci drm intel_lpss_pci video intel_lpss xhci_pci idma64 xhci_pci_renesas vmd wmi pinctrl_alderlake
[34006.012445] CR2: ffff936a8abdd3c8
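Since the panics come from different code paths, it can help to tabulate the CPU, task, and kernel version from each oops header line. The following is a rough sketch that parses the three `CPU: ... PID: ... Comm: ...` lines quoted above; the regular expressions and the function names `parse_oops_header` and `cpu_histogram` are assumptions made for this example, not an existing tool.

```python
# Sketch: summarise the "CPU: N PID: M Comm: X ... <kernel> #" oops header lines.
# The regexes are written against the three lines quoted in this report only.
import re
from collections import Counter

HEADER_RE = re.compile(r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
KERNEL_RE = re.compile(r"(?P<kernel>\d+\.\d+\.\d+-\d+-\w+) #")

# Header lines copied from Panic 1, Panic 2, and Panic 3 on the host.
SAMPLE_LINES = [
    "[43084.904980] CPU: 9 PID: 0 Comm: swapper/9 Kdump: loaded Tainted: P O 6.2.0-37-generic #38-Ubuntu",
    "[102172.140123] CPU: 9 PID: 117178 Comm: worker Kdump: loaded Tainted: P O 6.2.0-39-generic #40-Ubuntu",
    "[34006.012241] CPU: 7 PID: 11561 Comm: worker Kdump: loaded Tainted: P O 6.5.0-14-generic #14-Ubuntu",
]


def parse_oops_header(line: str):
    """Extract CPU, PID, task name, and kernel release; return None if no match."""
    header = HEADER_RE.search(line)
    if header is None:
        return None
    kernel = KERNEL_RE.search(line)
    return {
        "cpu": int(header["cpu"]),
        "pid": int(header["pid"]),
        "comm": header["comm"],
        "kernel": kernel["kernel"] if kernel else None,
    }


def cpu_histogram(lines) -> dict:
    """Count how many oopses were reported on each CPU."""
    parsed = (parse_oops_header(line) for line in lines)
    return dict(Counter(p["cpu"] for p in parsed if p is not None))


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(parse_oops_header(line))
    print("oopses per CPU:", cpu_histogram(SAMPLE_LINES))
```

The same functions can be pointed at the full set of kdump dmesg files to see whether the faults cluster on particular CPUs or kernel versions; three samples are too few to conclude anything on their own.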
summary:
- Random kernel panics related to page fault in Ubuntu 23.4.23.10 server
+ Random kernel panics related to page fault in Ubuntu 23.4/23.10 server with QEMU
description: updated
affects: ubuntu → linux (Ubuntu)
summary:
- Random kernel panics related to page fault in Ubuntu 23.4/23.10 server
+ Random kernel panics related to page fault in Ubuntu 23.04/23.10 server with QEMU
tags: added: lunar mantic
Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs IRC channel on Libera.chat.
To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/2046329/+editstatus and add the package name in the text box next to the word Package.
[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]