kernel BUG at mm/zswap.c:1275

Bug #1939996 reported by Nicholas Guriev
This bug affects 5 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

I have just found the following lines in dmesg. I do not know what caused the bug, but the PIDs mentioned are now marked as zombies, and the applications run by these processes have hung.

[37300.978014] ------------[ cut here ]------------
[37300.978020] kernel BUG at mm/zswap.c:1275!
[37300.978027] invalid opcode: 0000 [#2] SMP PTI
[37300.978032] CPU: 2 PID: 2735 Comm: marco Tainted: P D OE 5.11.0-25-generic #27-Ubuntu
[37300.978035] Hardware name: Acer Extensa 2520G/BA50_SL , BIOS V1.15 05/13/2016
[37300.978037] RIP: 0010:zswap_frontswap_load+0x273/0x280
[37300.978044] Code: 9c 3d 01 f3 48 ab 83 aa 68 13 00 00 01 eb 84 48 8d 7b 10 e8 df 55 97 00 c7 43 10 00 00 00 00 44 8b 6b 30 e9 47 ff ff ff 0f 0b <0f> 0b e8 e6 ba 96 00 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5
[37300.978047] RSP: 0000:ffffba3a42f2bbc0 EFLAGS: 00010282
[37300.978051] RAX: 0000000000000200 RBX: ffffda3a3fd03390 RCX: 0017ffffc0000000
[37300.978054] RDX: ffffe151ced99400 RSI: ffff8d52b25299c0 RDI: 0000000000000db0
[37300.978056] RBP: ffffba3a42f2bc48 R08: 0000000000000490 R09: ffffe151ceda5f40
[37300.978058] R10: 0000000000000000 R11: ffffba3a4057c000 R12: ffff8d50c18e42a0
[37300.978060] R13: 00000000ffffffea R14: ffff8d4f52597628 R15: ffff8d4f52597620
[37300.978063] FS: 00007f2d60ec2a80(0000) GS:ffff8d52b2500000(0000) knlGS:0000000000000000
[37300.978066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37300.978068] CR2: 00007f9b24d275a0 CR3: 000000010fbc2005 CR4: 00000000003706e0
[37300.978071] Call Trace:
[37300.978076] __frontswap_load+0x80/0xd0
[37300.978079] swap_readpage+0x148/0x270
[37300.978084] swap_cluster_readahead+0x1c2/0x310
[37300.978089] swapin_readahead+0x2a/0x30
[37300.978093] do_swap_page+0x401/0x760
[37300.978098] handle_pte_fault+0x1ff/0x260
[37300.978102] __handle_mm_fault+0x599/0x7c0
[37300.978106] ? timerqueue_add+0x68/0xa0
[37300.978111] handle_mm_fault+0xd7/0x2b0
[37300.978115] do_user_addr_fault+0x1a3/0x450
[37300.978120] exc_page_fault+0x6c/0x150
[37300.978123] ? asm_exc_page_fault+0x8/0x30
[37300.978128] asm_exc_page_fault+0x1e/0x30
[37300.978132] RIP: 0033:0x7f2d617dc1b4
[37300.978135] Code: 05 c3 0f 1f 40 00 31 c0 48 83 7e 08 00 0f 95 c0 c3 0f 1f 44 00 00 f3 0f 1e fa 31 c0 48 39 3e 75 0d 48 8b 46 08 48 85 c0 74 04 <48> 8b 40 08 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 39 3e 74 07 31
[37300.978138] RSP: 002b:00007ffd61f7bd68 EFLAGS: 00010202
[37300.978141] RAX: 000055e753ce7f70 RBX: 0000000000000000 RCX: 0000000000000000
[37300.978143] RDX: 0000000000000000 RSI: 00007ffd61f7be30 RDI: 000055e753bf0e70
[37300.978145] RBP: 00007ffd61f7be00 R08: 000055e753f51730 R09: 000055e753f51730
[37300.978147] R10: 000000000000011c R11: 000000000000ff21 R12: 0000000000000000
[37300.978149] R13: 000055e753c89f90 R14: 000055e753c89f70 R15: 0000000000000000
[37300.978152] Modules linked in: uas usb_storage usbhid rfcomm cmac algif_hash algif_skcipher af_alg bnep vboxnetadp(OE) vboxnetflt(OE) xfrm_user xfrm_algo vboxdrv(OE) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ccm joydev hid_multitouch hid_generic uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 ath3k videobuf2_common btusb rtsx_usb_ms btrtl btbcm btintel memstick bluetooth rtsx_usb_sdmmc videodev ecdh_generic mc ecc rtsx_usb snd_hda_codec_hdmi snd_soc_skl snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi_intel_match snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_soc_acpi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep soundwire_bus snd_soc_core intel_rapl_msr snd_compress intel_rapl_common ac97_bus x86_pkg_temp_thermal snd_pcm_dmaengine intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel binfmt_misc snd_pcm
[37300.978220] rapl intel_cstate snd_seq_midi snd_seq_midi_event intel_xhci_usb_role_switch mei_hdcp ath9k snd_rawmidi acer_wmi intel_wmi_thunderbolt at24 wmi_bmof ath9k_common sparse_keymap ath9k_hw ath nf_log_ipv4 snd_seq nf_log_common ipt_REJECT mac80211 snd_seq_device nf_reject_ipv4 xt_LOG snd_timer nls_iso8859_1 xt_multiport cfg80211 snd intel_lpss_pci i2c_i801 intel_lpss nft_limit mei_me xhci_pci idma64 i2c_smbus soundcore libarc4 intel_pch_thermal xhci_pci_renesas efi_pstore virt_dma mei i2c_hid hid xt_limit xt_addrtype xt_tcpudp wmi xt_conntrack acpi_pad nft_compat nft_counter nvidia_uvm(POE) nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack sch_fq_codel nf_defrag_ipv6 nf_defrag_ipv4 nf_tables msr nfnetlink parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic libcrc32c xor raid6_pq dm_mirror dm_region_hash dm_log dm_crypt nvidia_drm(POE) nvidia_modeset(POE) i915 nvidia(POE) i2c_algo_bit drm_kms_helper syscopyarea
[37300.978320] sysfillrect aesni_intel sysimgblt fb_sys_fops cec r8169 glue_helper rc_core crypto_simd cryptd realtek ahci input_leds drm serio_raw libahci video mac_hid zstd
[37300.978372] ---[ end trace b37c75c81a28f77d ]---
[37301.370437] RIP: 0010:zswap_frontswap_load+0x273/0x280
[37301.370454] Code: 9c 3d 01 f3 48 ab 83 aa 68 13 00 00 01 eb 84 48 8d 7b 10 e8 df 55 97 00 c7 43 10 00 00 00 00 44 8b 6b 30 e9 47 ff ff ff 0f 0b <0f> 0b e8 e6 ba 96 00 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5
[37301.370460] RSP: 0018:ffffba3a4427fbc0 EFLAGS: 00010282
[37301.370466] RAX: 0000000000000200 RBX: ffffda3a3fc03390 RCX: 0017ffffc0000000
[37301.370469] RDX: ffffe151c6717080 RSI: ffff8d52b24299c0 RDI: 0000000000000e20
[37301.370472] RBP: ffffba3a4427fc48 R08: 0000000000000710 R09: ffffe151c6717040
[37301.370474] R10: 0000000000000018 R11: ffffba3a402a3000 R12: ffff8d506eb7e150
[37301.370477] R13: 00000000ffffffea R14: ffff8d4f52597628 R15: ffff8d4f52597620
[37301.370481] FS: 00007f2d60ec2a80(0000) GS:ffff8d52b2500000(0000) knlGS:0000000000000000
[37301.370485] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37301.370488] CR2: 00007f9b24d275a0 CR3: 000000010fbc2005 CR4: 00000000003706e0
[37306.194009] [UFW ALLOW] IN=eth0 OUT= MAC=[REDACTED] SRC=[REDACTED] DST=[REDACTED] LEN=126 TOS=0x00 PREC=0x00 TTL=107 ID=42121 PROTO=UDP SPT=9798 DPT=6971 LEN=106
[37408.407227] ------------[ cut here ]------------
[37408.407232] kernel BUG at mm/zswap.c:1275!
[37408.407240] invalid opcode: 0000 [#3] SMP PTI
[37408.409615] CPU: 0 PID: 36995 Comm: nvidia-settings Tainted: P D OE 5.11.0-25-generic #27-Ubuntu
[37408.411945] Hardware name: Acer Extensa 2520G/BA50_SL , BIOS V1.15 05/13/2016
[37408.414135] RIP: 0010:zswap_frontswap_load+0x273/0x280
[37408.416300] Code: 9c 3d 01 f3 48 ab 83 aa 68 13 00 00 01 eb 84 48 8d 7b 10 e8 df 55 97 00 c7 43 10 00 00 00 00 44 8b 6b 30 e9 47 ff ff ff 0f 0b <0f> 0b e8 e6 ba 96 00 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5
[37408.421005] RSP: 0018:ffffba3a466cfbc0 EFLAGS: 00010282
[37408.423319] RAX: 0000000000000200 RBX: ffffda3a3fc03390 RCX: 0017ffffc0000000
[37408.425586] RDX: ffffe151c7b64680 RSI: ffff8d52b24299c0 RDI: 0000000000000ea0
[37408.427846] RBP: ffffba3a466cfc48 R08: 00000000000004e0 R09: ffffe151cd9fbbc0
[37408.430145] R10: 000000000000003c R11: ffffba3a402a3000 R12: ffff8d4f4253f0e0
[37408.432524] R13: 00000000ffffffea R14: ffff8d4f52597628 R15: ffff8d4f52597620
[37408.434920] FS: 00007fd154bb8b80(0000) GS:ffff8d52b2400000(0000) knlGS:0000000000000000
[37408.437294] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37408.439693] CR2: 00007f560e5b2008 CR3: 0000000291124006 CR4: 00000000003706f0
[37408.442071] Call Trace:
[37408.444544] __frontswap_load+0x80/0xd0
[37408.446945] swap_readpage+0x148/0x270
[37408.449344] swap_cluster_readahead+0x1c2/0x310
[37408.451705] swapin_readahead+0x2a/0x30
[37408.453938] do_swap_page+0x401/0x760
[37408.456268] ? ___sys_recvmsg+0xa3/0x130
[37408.458485] handle_pte_fault+0x1ff/0x260
[37408.460740] __handle_mm_fault+0x599/0x7c0
[37408.462978] ? timerqueue_add+0x68/0xa0
[37408.465238] handle_mm_fault+0xd7/0x2b0
[37408.467444] do_user_addr_fault+0x1a3/0x450
[37408.469652] exc_page_fault+0x6c/0x150
[37408.471753] ? asm_exc_page_fault+0x8/0x30
[37408.474055] asm_exc_page_fault+0x1e/0x30
[37408.476206] RIP: 0033:0x7fd1544e77e4
[37408.478161] Code: 48 89 c2 48 8b 45 10 42 8b 34 a8 e8 b6 c3 e5 ff 39 5d 18 7f c1 48 83 c4 08 5b 5d 41 5c 41 5d c3 66 0f 1f 44 00 00 f3 0f 1e fa <48> 8b 7e 08 48 85 ff 74 2b 53 31 d2 48 89 f3 48 8d 35 46 ff ff ff
[37408.482487] RSP: 002b:00007ffe7c121d18 EFLAGS: 00010246
[37408.484595] RAX: 00007fd1544e77e0 RBX: 0000000000000000 RCX: 00007ffe7c121eb0
[37408.486763] RDX: 0000000000000001 RSI: 0000562e96ac85f0 RDI: 00007fd14400a610
[37408.488942] RBP: 0000000000000000 R08: 00007ffe7c121e30 R09: 0000000000000000
[37408.491101] R10: 0000562e96ff4070 R11: 0000562e9658bff0 R12: 0000000000000001
[37408.493102] R13: 00007ffe7c121eb0 R14: 00007ffe7c121e30 R15: 0000562e97024dc0
[37408.495047] Modules linked in: uas usb_storage usbhid rfcomm cmac algif_hash algif_skcipher af_alg bnep vboxnetadp(OE) vboxnetflt(OE) xfrm_user xfrm_algo vboxdrv(OE) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ccm joydev hid_multitouch hid_generic uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 ath3k videobuf2_common btusb rtsx_usb_ms btrtl btbcm btintel memstick bluetooth rtsx_usb_sdmmc videodev ecdh_generic mc ecc rtsx_usb snd_hda_codec_hdmi snd_soc_skl snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi_intel_match snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_soc_acpi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep soundwire_bus snd_soc_core intel_rapl_msr snd_compress intel_rapl_common ac97_bus x86_pkg_temp_thermal snd_pcm_dmaengine intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel binfmt_misc snd_pcm
[37408.495152] rapl intel_cstate snd_seq_midi snd_seq_midi_event intel_xhci_usb_role_switch mei_hdcp ath9k snd_rawmidi acer_wmi intel_wmi_thunderbolt at24 wmi_bmof ath9k_common sparse_keymap ath9k_hw ath nf_log_ipv4 snd_seq nf_log_common ipt_REJECT mac80211 snd_seq_device nf_reject_ipv4 xt_LOG snd_timer nls_iso8859_1 xt_multiport cfg80211 snd intel_lpss_pci i2c_i801 intel_lpss nft_limit mei_me xhci_pci idma64 i2c_smbus soundcore libarc4 intel_pch_thermal xhci_pci_renesas efi_pstore virt_dma mei i2c_hid hid xt_limit xt_addrtype xt_tcpudp wmi xt_conntrack acpi_pad nft_compat nft_counter nvidia_uvm(POE) nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack sch_fq_codel nf_defrag_ipv6 nf_defrag_ipv4 nf_tables msr nfnetlink parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic libcrc32c xor raid6_pq dm_mirror dm_region_hash dm_log dm_crypt nvidia_drm(POE) nvidia_modeset(POE) i915 nvidia(POE) i2c_algo_bit drm_kms_helper syscopyarea
[37408.507629] sysfillrect aesni_intel sysimgblt fb_sys_fops cec r8169 glue_helper rc_core crypto_simd cryptd realtek ahci input_leds drm serio_raw libahci video mac_hid zstd
[37408.524756] ---[ end trace b37c75c81a28f77e ]---
[37408.826179] RIP: 0010:zswap_frontswap_load+0x273/0x280
[37408.827721] Code: 9c 3d 01 f3 48 ab 83 aa 68 13 00 00 01 eb 84 48 8d 7b 10 e8 df 55 97 00 c7 43 10 00 00 00 00 44 8b 6b 30 e9 47 ff ff ff 0f 0b <0f> 0b e8 e6 ba 96 00 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5
[37408.831107] RSP: 0018:ffffba3a4427fbc0 EFLAGS: 00010282
[37408.832932] RAX: 0000000000000200 RBX: ffffda3a3fc03390 RCX: 0017ffffc0000000
[37408.834572] RDX: ffffe151c6717080 RSI: ffff8d52b24299c0 RDI: 0000000000000e20
[37408.836235] RBP: ffffba3a4427fc48 R08: 0000000000000710 R09: ffffe151c6717040
[37408.837953] R10: 0000000000000018 R11: ffffba3a402a3000 R12: ffff8d506eb7e150
[37408.839576] R13: 00000000ffffffea R14: ffff8d4f52597628 R15: ffff8d4f52597620
[37408.841189] FS: 00007fd154bb8b80(0000) GS:ffff8d52b2400000(0000) knlGS:0000000000000000
[37408.842944] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37408.844588] CR2: 00007f560e5b2008 CR3: 0000000291124006 CR4: 00000000003706f0
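
A note on reading the dumps above: R13 holds 00000000ffffffea in every one of them. Treating that value as a signed 32-bit integer gives a negative number that can be compared against errno codes. The sketch below only does that arithmetic. Whether R13 actually carries the return value checked at mm/zswap.c:1275 is an assumption; the log does not say so. The helper names `as_signed32` and `describe` are made up for illustration.

```python
import errno


def as_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def describe(value):
    """Return (signed value, errno name or None) for a raw register value."""
    signed = as_signed32(value)
    # Kernel functions conventionally return -errno on failure, so only
    # negative values are looked up in the errno table.
    name = errno.errorcode.get(-signed) if signed < 0 else None
    return signed, name


if __name__ == "__main__":
    # R13 from the register dumps; assumed (not confirmed) to be an error code.
    print(describe(0x00000000FFFFFFEA))
```

If that assumption holds, the value would correspond to -EINVAL (errno 22 on Linux).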

ProblemType: Bug
DistroRelease: Ubuntu 21.04
Package: linux-image-5.11.0-25-generic 5.11.0-25.27
ProcVersionSignature: Ubuntu 5.11.0-25.27-generic 5.11.22
Uname: Linux 5.11.0-25-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu65.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mymedia 2207 F.... pulseaudio
 /dev/snd/pcmC0D0p: mymedia 2207 F...m pulseaudio
CasperMD5CheckResult: unknown
CurrentDesktop: MATE
Date: Sun Aug 15 19:59:17 2021
MachineType: Acer Extensa 2520G
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-5.11.0-25-generic root=/dev/mapper/ubuntu-data ro rootflags=subvol=@ zswap.enabled=1 zswap.compressor=zstd zswap.zpool=zsmalloc quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.11.0-25-generic N/A
 linux-backports-modules-5.11.0-25-generic N/A
 linux-firmware 1.197.2
SourcePackage: linux
UpgradeStatus: Upgraded to hirsute on 2021-05-29 (78 days ago)
dmi.bios.date: 05/13/2016
dmi.bios.release: 0.0
dmi.bios.vendor: Insyde Corp.
dmi.bios.version: V1.15
dmi.board.asset.tag: Type2 - Board Asset Tag
dmi.board.name: BA50_SL
dmi.board.vendor: Acer
dmi.board.version: V1.15
dmi.chassis.type: 10
dmi.chassis.vendor: Chassis Manufacturer
dmi.chassis.version: Chassis Version
dmi.ec.firmware.release: 2.70
dmi.modalias: dmi:bvnInsydeCorp.:bvrV1.15:bd05/13/2016:br0.0:efr2.70:svnAcer:pnExtensa2520G:pvrV1.15:rvnAcer:rnBA50_SL:rvrV1.15:cvnChassisManufacturer:ct10:cvrChassisVersion:
dmi.product.family: SKL
dmi.product.name: Extensa 2520G
dmi.product.sku: Extensa 2520G_100C_1.15
dmi.product.version: V1.15
dmi.sys.vendor: Acer
modified.conffile..etc.apport.crashdb.conf: [modified]
mtime.conffile..etc.apport.crashdb.conf: 2021-07-30T21:15:00.732098

Kai-Heng Feng (kaihengfeng) wrote :

Did this issue happen on previous kernels?

Nicholas Guriev (mymedia) wrote :

I can reproduce the bug on both the -22 and -25 kernels. Here are nearly complete dmesg logs. They show errors on the SSD that holds the swap, so this issue may be related to them somehow. To rule out a hardware fault, I will try another, more recent laptop or VirtualBox.

Steps:

1. sudo mount -t tmpfs -o mode=775 tmpfs /mnt

2. sudo dd if=/dev/urandom of=/mnt/garbage
Wait until dd fails with ENOSPC. In my case, the garbage file grows to 8 GB.

3. ./eatmem 5000000
This hand-made program just allocates a lot of memory in a loop and then sleeps.
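
The source of eatmem is not included in this report. As a rough stand-in for step 3, here is a minimal Python sketch of a program that allocates memory in a loop and then sleeps. It is not the original program: the function name `eat_memory`, the chunk size, and the reading of the command-line argument as KiB are all assumptions.

```python
import sys
import time

PAGE_SIZE = 4096  # assumed page size; one byte per page is written below


def eat_memory(total_kib, chunk_kib=1024):
    """Allocate about total_kib KiB in chunks and touch every page of each chunk."""
    chunks = []
    remaining = total_kib
    while remaining > 0:
        this_kib = min(chunk_kib, remaining)
        buf = bytearray(this_kib * 1024)
        # Write to each page so the kernel has to back it with real memory.
        for offset in range(0, len(buf), PAGE_SIZE):
            buf[offset] = 0xA5
        chunks.append(buf)
        remaining -= this_kib
    return chunks


# Only run when an amount is given on the command line (assumed unit: KiB).
if __name__ == "__main__" and len(sys.argv) > 1:
    held = eat_memory(int(sys.argv[1]))
    print(f"holding {sum(len(c) for c in held)} bytes, sleeping", flush=True)
    while True:
        time.sleep(60)  # keep the memory allocated until the process is killed
```

Saved under a hypothetical name such as eatmem.py, it would be started with `python3 eatmem.py 5000000`.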

So I speculate that the kernel hits the bug when it tries to move tmpfs content out to swap. If I swap the second and third steps, the bug does not appear.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Ko-Zu (causeless) wrote :

This happens for me when zswap is enabled with the zsmalloc pool.
ref: https://www.spinics.net/lists/linux-mm/msg245604.html

For the 20.04 HWE kernel, this also affects 5.11.0.25, but the previous HWE 5.8.0.x kernels were not affected.
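
To check whether a machine runs the same combination (zswap enabled with the zsmalloc pool, as in the reporter's kernel command line), the current zswap settings can be read from sysfs. This is only a sketch: the function names are invented here, and it assumes the parameters are exposed under /sys/module/zswap/parameters with a boolean `enabled` that reads back as `Y` or `N`.

```python
from pathlib import Path


def read_zswap_params(base="/sys/module/zswap/parameters"):
    """Return a dict of zswap parameter name -> value, or {} if the directory is missing."""
    params = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return params
    for entry in sorted(base_path.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable by an unprivileged user.
            continue
    return params


def matches_reported_setup(params):
    """True if zswap is enabled and uses the zsmalloc pool."""
    return params.get("enabled") in ("Y", "1") and params.get("zpool") == "zsmalloc"


if __name__ == "__main__":
    current = read_zswap_params()
    print(current)
    print("zswap + zsmalloc in use:", matches_reported_setup(current))
```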

Kai-Heng Feng (kaihengfeng) wrote :

The fix is not in the upstream 5.11 stable tree...

So maybe submit an SRU (Stable Release Update) patch to the Ubuntu kernel to include the fix:
https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat
