Kernel BUG with btrfs on linux 5.13

Bug #1959685 reported by Maxim Kuvyrkov
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am hitting the below kernel oops every 2-5 days on my home server box. RIP is always "0010:btrfs_evict_inode+0xa1/0x480 [btrfs]" and address is always "0000000000000068". I'm using btrfs in raid1 configuration for general storage.

===
Feb 1 01:40:38 server kernel: [172867.493851] BUG: kernel NULL pointer dereference, address: 0000000000000068
Feb 1 01:40:38 server kernel: [172867.493885] #PF: supervisor write access in kernel mode
Feb 1 01:40:38 server kernel: [172867.493903] #PF: error_code(0x0002) - not-present page
Feb 1 01:40:38 server kernel: [172867.493920] PGD 0 P4D 0
Feb 1 01:40:38 server kernel: [172867.493931] Oops: 0002 [#1] SMP PTI
Feb 1 01:40:38 server kernel: [172867.493946] CPU: 2 PID: 104 Comm: kswapd0 Not tainted 5.13.0-27-generic #29~20.04.1-Ubuntu
Feb 1 01:40:38 server kernel: [172867.493972] Hardware name: Gigabyte Technology Co., Ltd. H67MA-D2H-B3/H67MA-D2H-B3, BIOS F1 02/11/2011
Feb 1 01:40:38 server kernel: [172867.493999] RIP: 0010:btrfs_evict_inode+0xa1/0x480 [btrfs]
Feb 1 01:40:38 server kernel: [172867.494070] Code: fd ff ff e8 d1 b1 9f c8 65 48 8b 04 25 c0 7b 01 00 48 89 45 b8 49 8b 84 24 f0 fd ff ff 48 85 c0 74 55 4d 8b b4 24 f8 fd ff ff <f0> 41 80 66 68 fe f0 41 80 66 68 f7 4c 89 f6 4c 89 ff e8 98 00 01
Feb 1 01:40:38 server kernel: [172867.494123] RSP: 0018:ffffab8dc042fad0 EFLAGS: 00010202
Feb 1 01:40:38 server kernel: [172867.494141] RAX: 0000000000000004 RBX: ffff8f01001251d8 RCX: ffff8f0100125040
Feb 1 01:40:38 server kernel: [172867.494163] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8f01001251d0
Feb 1 01:40:38 server kernel: [172867.494185] RBP: ffffab8dc042fb28 R08: 0000000000000000 R09: 0000000000000000
Feb 1 01:40:38 server kernel: [172867.494206] R10: ffff8f00c2ea6280 R11: 0000000000000001 R12: ffff8f01001253c0
Feb 1 01:40:38 server kernel: [172867.494228] R13: ffff8f01001251d0 R14: 0000000000000000 R15: ffff8f01001251b0
Feb 1 01:40:38 server kernel: [172867.494250] FS: 0000000000000000(0000) GS:ffff8f03cfa80000(0000) knlGS:0000000000000000
Feb 1 01:40:38 server kernel: [172867.494274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 1 01:40:38 server kernel: [172867.494293] CR2: 0000000000000068 CR3: 0000000202810001 CR4: 00000000000606e0
Feb 1 01:40:38 server kernel: [172867.494315] Call Trace:
Feb 1 01:40:38 server kernel: [172867.494328] evict+0xd2/0x180
Feb 1 01:40:38 server kernel: [172867.494342] dispose_list+0x39/0x50
Feb 1 01:40:38 server kernel: [172867.494356] prune_icache_sb+0x5c/0x80
Feb 1 01:40:38 server kernel: [172867.494370] super_cache_scan+0x132/0x1b0
Feb 1 01:40:38 server kernel: [172867.494386] do_shrink_slab+0x144/0x2c0
Feb 1 01:40:38 server kernel: [172867.494402] shrink_slab+0x215/0x2b0
Feb 1 01:40:38 server kernel: [172867.494416] shrink_node+0x2dd/0x6f0
Feb 1 01:40:38 server kernel: [172867.494430] balance_pgdat+0x322/0x5f0
Feb 1 01:40:38 server kernel: [172867.494445] kswapd+0x1f8/0x380
Feb 1 01:40:38 server kernel: [172867.494458] ? wait_woken+0x80/0x80
Feb 1 01:40:38 server kernel: [172867.494473] ? balance_pgdat+0x5f0/0x5f0
Feb 1 01:40:38 server kernel: [172867.494487] kthread+0x12b/0x150
Feb 1 01:40:38 server kernel: [172867.494501] ? set_kthread_struct+0x40/0x40
Feb 1 01:40:38 server kernel: [172867.494516] ret_from_fork+0x22/0x30
Feb 1 01:40:38 server kernel: [172867.494532] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter br_netfilter bridge stp llc aufs overlay snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm intel_rapl_msr snd_seq_midi snd_seq_midi_event mei_hdcp intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_rawmidi kvm_intel kvm snd_seq crct10dif_pclmul ghash_clmulni_intel snd_seq_device cryptd i915 rapl snd_timer intel_cstate serio_raw joydev drm_kms_helper input_leds cec snd rc_core i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt video soundcore mei_me mei mac_hid sch_fq_codel msr parport_pc ppdev drm lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c uas usb_storage hid_generic usbhid hid
Feb 1 01:40:38 server kernel: [172867.494581] gpio_ich crc32_pclmul lpc_ich r8169 i2c_i801 ahci i2c_smbus xhci_pci realtek libahci xhci_pci_renesas
Feb 1 01:40:38 server kernel: [172867.497076] CR2: 0000000000000068
Feb 1 01:40:38 server kernel: [172867.497876] ---[ end trace 4ab76c62441acb52 ]---
Feb 1 01:40:38 server kernel: [172867.498695] RIP: 0010:btrfs_evict_inode+0xa1/0x480 [btrfs]
Feb 1 01:40:38 server kernel: [172867.499556] Code: fd ff ff e8 d1 b1 9f c8 65 48 8b 04 25 c0 7b 01 00 48 89 45 b8 49 8b 84 24 f0 fd ff ff 48 85 c0 74 55 4d 8b b4 24 f8 fd ff ff <f0> 41 80 66 68 fe f0 41 80 66 68 f7 4c 89 f6 4c 89 ff e8 98 00 01
Feb 1 01:40:38 server kernel: [172867.501356] RSP: 0018:ffffab8dc042fad0 EFLAGS: 00010202
Feb 1 01:40:38 server kernel: [172867.502297] RAX: 0000000000000004 RBX: ffff8f01001251d8 RCX: ffff8f0100125040
Feb 1 01:40:38 server kernel: [172867.503243] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8f01001251d0
Feb 1 01:40:38 server kernel: [172867.504204] RBP: ffffab8dc042fb28 R08: 0000000000000000 R09: 0000000000000000
Feb 1 01:40:38 server kernel: [172867.505177] R10: ffff8f00c2ea6280 R11: 0000000000000001 R12: ffff8f01001253c0
Feb 1 01:40:38 server kernel: [172867.506172] R13: ffff8f01001251d0 R14: 0000000000000000 R15: ffff8f01001251b0
Feb 1 01:40:38 server kernel: [172867.507165] FS: 0000000000000000(0000) GS:ffff8f03cfa80000(0000) knlGS:0000000000000000
Feb 1 01:40:38 server kernel: [172867.508183] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 1 01:40:38 server kernel: [172867.509210] CR2: 0000000000000068 CR3: 0000000202810001 CR4: 00000000000606e0
===
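For anyone comparing several of these oopses, here is a minimal Python sketch that pulls the register values out of a dump and checks whether a given base register plus displacement lands on the fault address. My reading of the marked bytes in the Code: line (`f0 41 80 66 68 fe`) is `lock andb $0xfe,0x68(%r14)`, which would fit R14 being 0 and CR2 being 0x68, but treat that decoding as an assumption and confirm it with a disassembler. The function names below are made up for illustration and are not part of any kernel tooling.

```python
import re

# Matches general-purpose registers (RAX..R15) and CR2 followed by a 16-digit hex value.
# RIP and RSP are skipped on purpose: the oops prints them with a segment prefix.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}|CR2):\s*([0-9a-f]{16})\b")


def parse_registers(oops_text):
    """Return {register name: value} using the first occurrence of each register."""
    regs = {}
    for name, value in REG_RE.findall(oops_text):
        regs.setdefault(name, int(value, 16))
    return regs


def matches_fault(regs, base, displacement):
    """True if regs[base] + displacement equals the fault address in CR2."""
    return base in regs and "CR2" in regs and regs[base] + displacement == regs["CR2"]


def null_registers(regs):
    """Sorted names of the general-purpose registers that hold 0."""
    return sorted(name for name, value in regs.items() if name != "CR2" and value == 0)
```

With the registers from the dump above, `matches_fault(regs, "R14", 0x68)` holds, so the write looks like a bit-clear on a field at offset 0x68 of a NULL pointer. Several other registers are also 0, so the zero check alone does not identify the base register.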

More instances of the same report are in the attached file.

I /think/ I didn't see these crashes while I was running 5.11 kernels. Will switch back to linux-image-5.11.0-46-generic and see if the crashes go away.

Attached is the output of
$ cd /var/log
$ (cat kern.log kern.log.1; zcat kern.log.{2,3,4}.gz) | grep -B10 -A50 "BUG:" > ~/kern.log
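To check that the RIP and fault address really are the same in every instance, the collected file can be tallied with a short script. This is only a sketch: it assumes each "BUG:" line is followed by its own "RIP:" line, as in the excerpt above, and `tally_oopses` is a name I made up.

```python
import re
import sys
from collections import Counter

BUG_RE = re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
RIP_RE = re.compile(r"RIP: 0010:(\S+)")


def tally_oopses(lines):
    """Count (fault address, RIP symbol) pairs.

    Each BUG line is paired with the first RIP line after it; the repeated
    RIP line printed after the end of the trace is ignored.
    """
    counts = Counter()
    pending_address = None
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            pending_address = bug.group(1)
            continue
        rip = RIP_RE.search(line)
        if rip and pending_address is not None:
            counts[(pending_address, rip.group(1))] += 1
            pending_address = None
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tally.py ~/kern.log
    with open(sys.argv[1], errors="replace") as log:
        for (address, rip), n in tally_oopses(log).most_common():
            print(n, address, rip)
```

A single line of output would mean every crash hit the same instruction with the same address.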
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D3', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
CurrentDesktop: MATE
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2020-06-11 (608 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
IwConfig:
 lo no wireless extensions.

 docker0 no wireless extensions.

 enp3s0 no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. H67MA-D2H-B3
Package: linux (not installed)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-99-generic root=UUID=1a61d5dd-775e-43cd-9cd5-0056498f56fb ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.4.0-99.112-generic 5.4.162
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-99-generic N/A
 linux-backports-modules-5.4.0-99-generic N/A
 linux-firmware 1.187.25
RfKill:

Tags: focal
Uname: Linux 5.4.0-99-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 02/11/2011
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F1
dmi.board.name: H67MA-D2H-B3
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF1:bd02/11/2011:svnGigabyteTechnologyCo.,Ltd.:pnH67MA-D2H-B3:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnH67MA-D2H-B3:rvr:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: H67MA-D2H-B3
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Maxim Kuvyrkov (maxim-kuvyrkov) wrote :

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1959685

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Maxim Kuvyrkov (maxim-kuvyrkov) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected focal
description: updated
Maxim Kuvyrkov (maxim-kuvyrkov) wrote : CRDA.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : CurrentDmesg.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : Lspci.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : Lspci-vt.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : Lsusb.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : Lsusb-t.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : Lsusb-v.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : ProcCpuinfo.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : ProcCpuinfoMinimal.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : ProcEnviron.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : ProcInterrupts.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : ProcModules.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : PulseList.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : UdevDb.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : WifiSyslog.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote : acpidump.txt

apport information

Maxim Kuvyrkov (maxim-kuvyrkov) wrote :

The kernel BUG also happens with the 5.11.0-46 and 5.13.0-28 kernels. In 5.11 the location of the fault is different, but the symptoms are very similar to the crash in 5.13.

Attached are kern.log excerpts for 5.11.0-46 and 5.13.0-28 crashes. I'm now testing 5.4.0-99-generic.

Maxim Kuvyrkov (maxim-kuvyrkov) wrote :

Kernel 5.4.0-99-generic has now been running without a fault for 11 days. Kernels 5.11 and 5.13 crashed within 4-5 days of uptime.

Maxim Kuvyrkov (maxim-kuvyrkov) wrote :

Kernel 5.4.0-99-generic has now been running without a fault for 40 days.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Maxim Kuvyrkov (maxim-kuvyrkov) wrote :

I'm now trying the v5.15 kernel after months of running v5.4 on btrfs without crashes.

On the first boot with v5.15, dmesg showed ...
[ 63.021417] BTRFS warning (device sdb): block group 12481442611200 has wrong amount of free space
[ 63.021424] BTRFS warning (device sdb): failed to load free space cache for block group 12481442611200, rebuilding it now
..., which I didn't get with either v5.11 or v5.13. Maybe this corruption was the problem that v5.4 ignored and that v5.11/v5.13 crashed on.

Uptime with v5.15 is 2 days so far.
