NULL pointer dereference in split_swap_cluster
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-azure (Ubuntu) | New | Undecided | Unassigned |
Bug Description
We have encountered the following oops on one of our VMs:
Apr 7 14:02:19 rancher1 kernel: [2089793.273674] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
Apr 7 14:02:19 rancher1 kernel: [2089793.282782] IP: split_swap_
Apr 7 14:02:19 rancher1 kernel: [2089793.330631] PGD 0 P4D 0
Apr 7 14:02:19 rancher1 kernel: [2089793.338279] Oops: 0002 [#1] SMP PTI
Apr 7 14:02:19 rancher1 kernel: [2089793.350774] Modules linked in: ufs msdos xfs cmac arc4 md4 nls_utf8 cifs ccm fscache xt_tcpudp xt_set ip_set_hash_net ip_set iptable_raw vxlan ip6_udp_tunnel udp_tunnel xt_nat xt_mark xfrm6_mode_tunnel xfrm4_mode_tunnel esp4 ansi_cprng veth ipt_MASQUERADE nf_nat_
Apr 7 14:02:19 rancher1 kernel: [2089793.618910] crc32_pclmul ghash_clmulni_intel pcbc hid_generic aesni_intel aes_x86_64 crypto_simd glue_helper cryptd hid_hyperv pata_acpi hyperv_fb cfbfillrect hyperv_keyboard cfbimgblt hid cfbcopyarea hv_netvsc hv_utils
Apr 7 14:02:19 rancher1 kernel: [2089793.692250] CPU: 0 PID: 47 Comm: kswapd0 Not tainted 4.15.0-1040-azure #44-Ubuntu
Apr 7 14:02:19 rancher1 kernel: [2089793.725316] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
Apr 7 14:02:19 rancher1 kernel: [2089793.762206] RIP: 0010:split_
Apr 7 14:02:19 rancher1 kernel: [2089793.781768] RSP: 0018:ffffaaf900
Apr 7 14:02:19 rancher1 kernel: [2089793.800432] RAX: 0000000000000000 RBX: 00000000007290de RCX: 00000000007290de
Apr 7 14:02:19 rancher1 kernel: [2089793.824572] RDX: ffffaaf905001000 RSI: 0000000000118df9 RDI: 00000000007290de
Apr 7 14:02:19 rancher1 kernel: [2089793.854139] RBP: ffffaaf900fbfbe8 R08: 0000000000000001 R09: ffff9c647ffd4d00
Apr 7 14:02:19 rancher1 kernel: [2089793.882588] R10: ffff9c647ffd4000 R11: 0000000000000001 R12: fffff61ac4630000
Apr 7 14:02:19 rancher1 kernel: [2089793.909530] R13: fffff61ac4630080 R14: fffff61ac4638000 R15: fffff61ac4630040
Apr 7 14:02:19 rancher1 kernel: [2089793.935871] FS: 000000000000000
Apr 7 14:02:19 rancher1 kernel: [2089793.966483] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 7 14:02:19 rancher1 kernel: [2089793.987904] CR2: 0000000000000007 CR3: 000000003240a005 CR4: 00000000001606f0
Apr 7 14:02:19 rancher1 kernel: [2089794.017641] Call Trace:
Apr 7 14:02:19 rancher1 kernel: [2089794.028683] split_huge_
Apr 7 14:02:19 rancher1 kernel: [2089794.051250] deferred_
Apr 7 14:02:19 rancher1 kernel: [2089794.065213] shrink_
Apr 7 14:02:19 rancher1 kernel: [2089794.083856] shrink_
Apr 7 14:02:19 rancher1 kernel: [2089794.097963] kswapd+0x32a/0x770
Apr 7 14:02:19 rancher1 kernel: [2089794.110523] kthread+0x105/0x140
Apr 7 14:02:19 rancher1 kernel: [2089794.122680] ? mem_cgroup_
Apr 7 14:02:19 rancher1 kernel: [2089794.139139] ? kthread_
Apr 7 14:02:19 rancher1 kernel: [2089794.155543] ret_from_
Apr 7 14:02:19 rancher1 kernel: [2089794.167841] Code: c1 e3 07 48 c1 eb 10 48 8d 1c d8 48 89 df e8 49 9f 79 00 80 63 07 fb 48 85 db 74 17 48 89 df c6 07 00 0f 1f 40 00 31 c0 5b 5d c3 <80> 24 25 07 00 00 00 fb 31 c0 5b 5d c3 b8 f0 ff ff ff eb e9 0f
Apr 7 14:02:19 rancher1 kernel: [2089794.237196] RIP: split_swap_
Apr 7 14:02:19 rancher1 kernel: [2089794.259910] CR2: 0000000000000007
Apr 7 14:02:19 rancher1 kernel: [2089794.270891] ---[ end trace 5b797d89aee7fc1b ]---
The machine became unstable after this and stayed that way until it was rebooted. For example, reading the command-line arguments of some namespaced processes hung, so some kernel data structure may have been corrupted. The machine was under heavy memory pressure when this happened.
This bug is also present in current upstream (v5.8).
Red Hat is working on the fix (ref: 1739593, private).
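For anyone triaging this, the fault can be decoded from the log itself. Below is a minimal sketch in Python; the helper names are made up for illustration and are not part of any kernel tooling. It interprets the `Oops: 0002` value using the x86 page-fault error-code bits, and decodes the eight bytes that start at the `<80>` marker in the Code line. The instruction decoder handles only that one encoding (AND of an 8-bit immediate into a byte at an absolute 32-bit address) and is not a general disassembler.

```python
# Bit layout of the x86 page-fault error code shown on the "Oops:" line.
# Each entry: (bit number, meaning if set, meaning if clear or None).
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
    (3, "reserved bit set in page table", None),
    (4, "instruction fetch", None),
]


def decode_oops_error_code(code_hex):
    """Return a list of human-readable properties for a hex error code."""
    code = int(code_hex, 16)
    out = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


def decode_and_abs8(insn_hex):
    """Decode only 'AND byte [disp32], imm8' (bytes: 80 24 25 disp32 imm8).

    Returns (absolute address, immediate mask).
    """
    b = bytes.fromhex(insn_hex)
    if len(b) != 8 or b[:3] != bytes([0x80, 0x24, 0x25]):
        raise ValueError("not the single encoding this sketch understands")
    addr = int.from_bytes(b[3:7], "little")
    mask = b[7]
    return addr, mask


if __name__ == "__main__":
    print(decode_oops_error_code("0002"))
    addr, mask = decode_and_abs8("80 24 25 07 00 00 00 fb")
    print(hex(addr), hex(mask))
```

For the values in this report, the sketch gives a kernel-mode write to a not-present page, and an instruction that ANDs the mask 0xfb into the byte at absolute address 0x7, which matches CR2. That would be consistent with a flag bit being cleared through a NULL structure pointer at offset 7, but this is an inference from the trace and has not been checked against the kernel source here.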