btrfs not writable when mounted without "skip_balance"

Bug #1890951 reported by Johannes Rohr
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Yesterday I upgraded to Ubuntu 20.04, and since then there has been a problem with the btrfs volume it is installed on. The system hung when launching a btrfs balance, and on the next boot it hung indefinitely while trying to create the volatile files and directories.

Booting from a Debian buster rescue system, I noticed that I was unable to create or delete any files, and I found the following in dmesg:

[Sun Aug 9 12:21:35 2020] ------------[ cut here ]------------
[Sun Aug 9 12:21:35 2020] kernel BUG at fs/btrfs/relocation.c:2626!
[Sun Aug 9 12:21:35 2020] invalid opcode: 0000 [#1] SMP PTI
[Sun Aug 9 12:21:35 2020] CPU: 1 PID: 4537 Comm: btrfs-balance Tainted: G O 5.4.47 #1
[Sun Aug 9 12:21:35 2020] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.14.0 for D3401-H1x 06/09/2016
[Sun Aug 9 12:21:35 2020] RIP: 0010:select_reloc_root+0x5b/0x19f [btrfs]
[Sun Aug 9 12:21:35 2020] Code: c0 c7 44 24 04 00 00 00 00 e8 8b 9d 17 e1 48 89 df 4c 89 f6 48 8d 54 24 04 e8 9c e6 ff ff 48 8b 58 60 48 89 c5 48 85 db 75 02 <0f> 0b 48 8b 43 20 a8 02 75 02 0f 0b 48 83 bb df 01 00 00 f8 75 45
[Sun Aug 9 12:21:35 2020] RSP: 0018:ffff8887e0b0bb20 EFLAGS: 00010246
[Sun Aug 9 12:21:35 2020] RAX: ffff8887dfab5280 RBX: 0000000000000000 RCX: 0000000000000000
[Sun Aug 9 12:21:35 2020] RDX: ffff8887e0b0bb24 RSI: ffff8887e0b0bc10 RDI: ffff8887dfab52c0
[Sun Aug 9 12:21:35 2020] RBP: ffff8887dfab5280 R08: ffff8887dfab52c0 R09: ffffffffa0491e7e
[Sun Aug 9 12:21:35 2020] R10: ffff8887f4ba7e70 R11: ffff8888090ed158 R12: ffff8887dfab5280
[Sun Aug 9 12:21:35 2020] R13: ffff8887fd330800 R14: ffff8887e0b0bc10 R15: ffff8887e7fa66e8
[Sun Aug 9 12:21:35 2020] FS: 0000000000000000(0000) GS:ffff88880e240000(0000) knlGS:0000000000000000
[Sun Aug 9 12:21:35 2020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Aug 9 12:21:35 2020] CR2: 000055b4d5b7cfe0 CR3: 000000000200a004 CR4: 00000000003606e0
[Sun Aug 9 12:21:35 2020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Sun Aug 9 12:21:35 2020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Sun Aug 9 12:21:35 2020] Call Trace:
[Sun Aug 9 12:21:35 2020] do_relocation+0xb6/0x4c2 [btrfs]
[Sun Aug 9 12:21:35 2020] ? calcu_metadata_size.isra.36.constprop.42+0x9e/0xc4 [btrfs]
[Sun Aug 9 12:21:35 2020] ? do_raw_spin_lock+0x2f/0x5a
[Sun Aug 9 12:21:35 2020] ? btrfs_block_rsv_refill+0x4b/0x8b [btrfs]
[Sun Aug 9 12:21:35 2020] relocate_tree_blocks+0x301/0x427 [btrfs]
[Sun Aug 9 12:21:35 2020] ? tree_insert+0x49/0x4e [btrfs]
[Sun Aug 9 12:21:35 2020] ? add_tree_block.isra.38+0x11e/0x144 [btrfs]
[Sun Aug 9 12:21:35 2020] relocate_block_group+0x279/0x49e [btrfs]
[Sun Aug 9 12:21:35 2020] btrfs_relocate_block_group+0x15e/0x23d [btrfs]
[Sun Aug 9 12:21:35 2020] btrfs_relocate_chunk+0x25/0x8c [btrfs]
[Sun Aug 9 12:21:35 2020] btrfs_balance+0xaf0/0xd45 [btrfs]
[Sun Aug 9 12:21:35 2020] ? btrfs_balance+0xd45/0xd45 [btrfs]
[Sun Aug 9 12:21:35 2020] balance_kthread+0x32/0x46 [btrfs]
[Sun Aug 9 12:21:35 2020] kthread+0xf5/0xfa
[Sun Aug 9 12:21:35 2020] ? kthread_associate_blkcg+0x86/0x86
[Sun Aug 9 12:21:35 2020] ret_from_fork+0x3a/0x50
[Sun Aug 9 12:21:35 2020] Modules linked in: btrfs xor zstd_decompress zstd_compress lzo_compress lzo_decompress zlib_deflate raid6_pq libcrc32c sd_mod ipmi_devintf ipmi_msghandler sg x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel aesni_intel crypto_simd psmouse ahci cryptd libahci i2c_i801 serio_raw glue_helper intel_pch_thermal evdev video thermal acpi_pad button fan jc42 ftsteutates nct6775 hwmon_vid coretemp ip_tables x_tables autofs4 e1000e
[Sun Aug 9 12:21:36 2020] ---[ end trace 442b443de6cecc6e ]---
[Sun Aug 9 12:21:36 2020] RIP: 0010:select_reloc_root+0x5b/0x19f [btrfs]
[Sun Aug 9 12:21:36 2020] Code: c0 c7 44 24 04 00 00 00 00 e8 8b 9d 17 e1 48 89 df 4c 89 f6 48 8d 54 24 04 e8 9c e6 ff ff 48 8b 58 60 48 89 c5 48 85 db 75 02 <0f> 0b 48 8b 43 20 a8 02 75 02 0f 0b 48 83 bb df 01 00 00 f8 75 45
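
When comparing reports like this one, it can help to pull the location of the failed assertion out of a dmesg dump automatically. The following is a minimal sketch, not part of the original report; the helper name `find_bug_sites` is hypothetical, and the regular expression only assumes the "kernel BUG at file:line!" wording seen in the trace above.

```python
import re

# Matches lines such as "kernel BUG at fs/btrfs/relocation.c:2626!"
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")


def find_bug_sites(dmesg_text):
    """Return (source_path, line_number) for every 'kernel BUG at' line."""
    sites = []
    for line in dmesg_text.splitlines():
        match = BUG_RE.search(line)
        if match:
            sites.append((match.group("path"), int(match.group("line"))))
    return sites


# Sample lines copied from the trace above.
sample = """\
[Sun Aug 9 12:21:35 2020] ------------[ cut here ]------------
[Sun Aug 9 12:21:35 2020] kernel BUG at fs/btrfs/relocation.c:2626!
[Sun Aug 9 12:21:35 2020] invalid opcode: 0000 [#1] SMP PTI
"""
print(find_bug_sites(sample))
```

Running it over saved `dmesg` output from several machines gives a quick way to see whether they hit the same assertion or different ones.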

So this is apparently a btrfs bug introduced by the newer btrfs code in the newer kernel, although booting an older kernel didn't fix it. Something about the filesystem seems to be messed up now, and no balance can be run. The only temporary remedy I found was to add the skip_balance mount option in /etc/fstab, but of course this is a crutch: if you can't run a balance, the filesystem is bound to run out of space at some point.
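
For readers who want to apply the same workaround, here is a rough sketch of adding `skip_balance` to the options column of an fstab entry. The helper `add_mount_option` and the sample fstab line are hypothetical (the report does not show the actual /etc/fstab); only the UUID and the `subvol=ROOT` option are taken from the kernel command line further down.

```python
def add_mount_option(fstab_line, option="skip_balance"):
    """Return fstab_line with `option` added to its options field (4th column)."""
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line  # leave blank lines and comments alone
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line  # not a complete fstab entry
    options = fields[3].split(",")
    if option not in options:
        options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)


# Hypothetical entry; the real fstab from this system is not in the report.
entry = "UUID=b80344d6-ae49-4557-bd4d-641d0afcda3e / btrfs defaults,subvol=ROOT 0 1"
print(add_mount_option(entry))
```

The function is idempotent, so running it twice on the same line does not duplicate the option. Column alignment of the original line is not preserved in this sketch.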
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Aug 9 12:53 seq
 crw-rw---- 1 root audio 116, 33 Aug 9 12:53 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb'
Lsusb-t: Error: [Errno 2] No such file or directory: 'lsusb'
Lsusb-v: Error: [Errno 2] No such file or directory: 'lsusb'
MachineType: FUJITSU D3401-H1
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/ROOT/boot/vmlinuz-5.4.0-42-generic root=UUID=b80344d6-ae49-4557-bd4d-641d0afcda3e ro rootflags=subvol=ROOT
ProcVersionSignature: Ubuntu 5.4.0-42.46-generic 5.4.44
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-42-generic N/A
 linux-backports-modules-5.4.0-42-generic N/A
 linux-firmware 1.187.2
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal
Uname: Linux 5.4.0-42-generic x86_64
UpgradeStatus: Upgraded to focal on 2020-08-08 (1 days ago)
UserGroups: adm docker sudo
_MarkForUpload: True
dmi.bios.date: 06/09/2016
dmi.bios.vendor: FUJITSU // American Megatrends Inc.
dmi.bios.version: V5.0.0.11 R1.14.0 for D3401-H1x
dmi.board.name: D3401-H1
dmi.board.vendor: FUJITSU
dmi.board.version: S26361-D3401-H1
dmi.chassis.type: 3
dmi.chassis.vendor: FUJITSU
dmi.modalias: dmi:bvnFUJITSU//AmericanMegatrendsInc.:bvrV5.0.0.11R1.14.0forD3401-H1x:bd06/09/2016:svnFUJITSU:pnD3401-H1:pvr:rvnFUJITSU:rnD3401-H1:rvrS26361-D3401-H1:cvnFUJITSU:ct3:cvr:
dmi.product.family: ESPRIMO-FTS
dmi.product.name: D3401-H1
dmi.product.sku: S26361-Kxxx-Vyyy
dmi.sys.vendor: FUJITSU

Revision history for this message
In , furlongm (furlongm-linux-kernel-bugs) wrote :

If I mount with skip_balance, the following does not happen. But if I then try to start a balance, processes go into D state and the balance hangs.

[ 173.488407] kernel BUG at ../fs/btrfs/relocation.c:1449!
[ 173.488410] invalid opcode: 0000 [#1] SMP PTI
[ 173.488414] CPU: 7 PID: 2542 Comm: btrfs-balance Tainted: G O 4.12.14-lp150.12.58-default #1 openSUSE Leap 15.0
[ 173.488416] Hardware name: Dell Inc. OptiPlex 9020/00V62H, BIOS A24 10/24/2018
[ 173.488417] task: ffff8803c85f80c0 task.stack: ffffc90002234000
[ 173.488439] RIP: 0010:create_reloc_root+0x1dc/0x1f0 [btrfs]
[ 173.488440] RSP: 0018:ffffc90002237910 EFLAGS: 00010282
[ 173.488442] RAX: 00000000ffffffef RBX: ffff880408c99c00 RCX: 0000000000000001
[ 173.488444] RDX: 0000000000000005 RSI: ffff88040bdc2a80 RDI: 0000000000000286
[ 173.488445] RBP: ffff8803dd514d98 R08: ffff88040b8c5a80 R09: 0000000000000000
[ 173.488447] R10: 0000000000000002 R11: ffff88040bdc2a80 R12: fffffffffffffff7
[ 173.488448] R13: ffff88040679c000 R14: ffff8803efafe800 R15: 00000000000c4000
[ 173.488450] FS: 0000000000000000(0000) GS:ffff88041ebc0000(0000) knlGS:0000000000000000
[ 173.488451] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 173.488453] CR2: 00007f26c9915b00 CR3: 000000000200a006 CR4: 00000000001606e0
[ 173.488454] Call Trace:
[ 173.488468] btrfs_init_reloc_root+0x5b/0xa0 [btrfs]
[ 173.488478] record_root_in_trans+0xb7/0xf0 [btrfs]
[ 173.488488] btrfs_record_root_in_trans+0x4e/0x60 [btrfs]
[ 173.488497] start_transaction+0xa6/0x410 [btrfs]
[ 173.488506] __btrfs_prealloc_file_range+0xbb/0x460 [btrfs]
[ 173.488516] btrfs_prealloc_file_range+0x10/0x20 [btrfs]
[ 173.488527] prealloc_file_extent_cluster+0x113/0x200 [btrfs]
[ 173.488537] relocate_file_extent_cluster+0x8d/0x470 [btrfs]
[ 173.488546] ? __btrfs_end_transaction+0x1c1/0x2e0 [btrfs]
[ 173.488555] relocate_data_extent+0x5f/0xc0 [btrfs]
[ 173.488564] relocate_block_group+0x495/0x6f0 [btrfs]
[ 173.488573] btrfs_relocate_block_group+0x188/0x230 [btrfs]
[ 173.488583] btrfs_relocate_chunk+0x4a/0xf0 [btrfs]
[ 173.488592] btrfs_shrink_device+0x1c4/0x4c0 [btrfs]
[ 173.488602] __btrfs_balance+0xd4/0xbe0 [btrfs]
[ 173.488611] ? insert_balance_item.isra.33+0x9a/0x350 [btrfs]
[ 173.488614] ? printk+0x43/0x4b
[ 173.488624] ? btrfs_dev_replace_lock.part.6+0x15/0x20 [btrfs]
[ 173.488634] ? btrfs_dev_replace_lock+0x85/0x90 [btrfs]
[ 173.488643] btrfs_balance+0x2de/0x5c0 [btrfs]
[ 173.488651] ? btrfs_balance+0x5c0/0x5c0 [btrfs]
[ 173.488659] balance_kthread+0x56/0x80 [btrfs]
[ 173.488662] kthread+0x11a/0x130
[ 173.488664] ? kthread_create_on_node+0x40/0x40
[ 173.488667] ret_from_fork+0x35/0x40
[ 173.488669] Code: 48 c7 83 dc 00 00 00 00 00 00 00 48 c7 83 e4 00 00 00 00 00 00 00 c6 83 ec 00 00 00 00 c6 83 ed 00 00 00 00 e9 1b ff ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f
[ 173.488687] Modules linked in: usblp ccm af_packet ebtable_filter ebtables nf_log_ipv6 xt_comment nf_log_ipv4 nf_log_common xt_LOG xt_limit devlink nfnetlink_cthelper nfnetlink vboxpci(O) vboxnetadp(O) vboxnetflt(O) ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_pk...
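
The "D state" mentioned in this comment is the uninterruptible-sleep state that Linux reports in the third field of /proc/<pid>/stat. As an illustration only, the sketch below lists processes currently in that state; the function names are hypothetical, and the sample line reuses the PID and command name from the trace above with made-up remaining fields.

```python
import os


def proc_state(stat_text):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def is_uninterruptible(stat_text):
    """True if the process is in uninterruptible sleep (state 'D')."""
    return proc_state(stat_text) == "D"


def find_blocked(proc_root="/proc"):
    """Return PIDs of processes in D state; empty list if /proc is unavailable."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as fh:
                text = fh.read()
        except OSError:
            continue  # process exited while we were looking
        if is_uninterruptible(text):
            blocked.append(int(entry))
    return blocked


# PID and comm from the trace above; the other fields are invented.
sample = "2542 (btrfs-balance) D 2 0 0 0 -1 2129984 0 0 0 0"
print(proc_state(sample), find_blocked())
```

A process that stays in this list across several runs is likely stuck in the kernel, which matches the hang described here.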

In , furlongm (furlongm-linux-kernel-bugs) wrote :

And another:

[ 3088.350638] ------------[ cut here ]------------
[ 3088.350639] kernel BUG at ../fs/btrfs/relocation.c:1449!
[ 3088.350643] invalid opcode: 0000 [#1] SMP PTI
[ 3088.350647] CPU: 2 PID: 5222 Comm: btrfs Tainted: G O 4.12.14-lp150.12.58-default #1 openSUSE Leap 15.0
[ 3088.350648] Hardware name: Dell Inc. OptiPlex 9020/00V62H, BIOS A24 10/24/2018
[ 3088.350650] task: ffff880398564000 task.stack: ffffc9000b2f8000
[ 3088.350671] RIP: 0010:create_reloc_root+0x1dc/0x1f0 [btrfs]
[ 3088.350673] RSP: 0018:ffffc9000b2fb858 EFLAGS: 00010282
[ 3088.350675] RAX: 00000000ffffffef RBX: ffff880409763000 RCX: 0000000000000001
[ 3088.350677] RDX: 0000000000000027 RSI: ffff8803a90dc380 RDI: 0000000000000286
[ 3088.350678] RBP: ffff8803a07209d8 R08: ffff8803a13afa80 R09: 0000000000000000
[ 3088.350679] R10: 0000000000000002 R11: ffff8803a90dc380 R12: fffffffffffffff7
[ 3088.350681] R13: ffff880370b5c000 R14: ffff8803689f0800 R15: 00000000000c4000
[ 3088.350683] FS: 00007f1d8bfb98c0(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000
[ 3088.350684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3088.350685] CR2: 00007f33879f7000 CR3: 00000003ab59e005 CR4: 00000000001606e0
[ 3088.350687] Call Trace:
[ 3088.350700] btrfs_init_reloc_root+0x5b/0xa0 [btrfs]
[ 3088.350710] record_root_in_trans+0xb7/0xf0 [btrfs]
[ 3088.350720] btrfs_record_root_in_trans+0x4e/0x60 [btrfs]
[ 3088.350729] start_transaction+0xa6/0x410 [btrfs]
[ 3088.350738] __btrfs_prealloc_file_range+0xbb/0x460 [btrfs]
[ 3088.350748] btrfs_prealloc_file_range+0x10/0x20 [btrfs]
[ 3088.350759] prealloc_file_extent_cluster+0x113/0x200 [btrfs]
[ 3088.350769] relocate_file_extent_cluster+0x8d/0x470 [btrfs]
[ 3088.350778] ? __btrfs_end_transaction+0x1c1/0x2e0 [btrfs]
[ 3088.350787] relocate_data_extent+0x78/0xc0 [btrfs]
[ 3088.350796] relocate_block_group+0x495/0x6f0 [btrfs]
[ 3088.350805] btrfs_relocate_block_group+0x188/0x230 [btrfs]
[ 3088.350815] btrfs_relocate_chunk+0x4a/0xf0 [btrfs]
[ 3088.350825] __btrfs_balance+0x8d3/0xbe0 [btrfs]
[ 3088.350835] btrfs_balance+0x2de/0x5c0 [btrfs]
[ 3088.350844] btrfs_ioctl_balance+0x310/0x370 [btrfs]
[ 3088.350848] ? __switch_to_asm+0x40/0x70
[ 3088.350857] btrfs_ioctl+0xba5/0x1e60 [btrfs]
[ 3088.350860] ? __switch_to_asm+0x40/0x70
[ 3088.350861] ? __switch_to_asm+0x34/0x70
[ 3088.350863] ? __switch_to_asm+0x40/0x70
[ 3088.350865] ? __switch_to_asm+0x34/0x70
[ 3088.350867] ? __switch_to_asm+0x40/0x70
[ 3088.350875] ? __switch_to_asm+0x34/0x70
[ 3088.350882] ? __switch_to_asm+0x40/0x70
[ 3088.350890] ? __switch_to_asm+0x34/0x70
[ 3088.350897] ? __switch_to_asm+0x40/0x70
[ 3088.350904] ? __switch_to_asm+0x34/0x70
[ 3088.350912] ? __switch_to_asm+0x40/0x70
[ 3088.350919] ? __switch_to_asm+0x34/0x70
[ 3088.350926] ? __switch_to_asm+0x40/0x70
[ 3088.350934] ? __switch_to_asm+0x34/0x70
[ 3088.350942] ? do_vfs_ioctl+0x90/0x5f0
[ 3088.350956] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[ 3088.350963] do_vfs_ioctl+0x90/0x5f0
[ 3088.350970] ? __schedule+0x247/0x860
[ 3088.350978] SyS_ioctl+0x74/0x80
[ 3088.350986] do_syscall_64+0x7b/0x150
[ 3088.350994] entry_SYSCALL_64_after_hwframe+0x3d/0xa2...

In , office (office-linux-kernel-bugs) wrote :

I got exactly the same dump during the boot process, and then the VM stopped working.

Here is the guide describing how I managed to fix it after hours of trying:

https://forums.suse.com/showthread.php?13662-Server-crashes-with-a-long-BTRFS-error-list&p=57729#post57729

In , dantheelder (dantheelder-linux-kernel-bugs) wrote :

I just ran into this on a btrfs volume on openSUSE 15.1 (4.12.14-lp151.28.4). I was never able to balance it successfully, but mounting with skip_balance at least let me mount it and back everything up. Here are the gory details:

[ 352.337489] BTRFS info (device dm-4): use lzo compression
[ 352.338112] BTRFS info (device dm-4): disk space caching is enabled
[ 352.338649] BTRFS info (device dm-4): has skinny extents
[ 352.348289] BTRFS info (device dm-4): bdev /dev/mapper/slow-Slow errs: wr 0, rd 0, flush 0, corrupt 187, gen 0
[ 352.855165] BTRFS info (device dm-4): detected SSD devices, enabling SSD mode
[ 352.906205] BTRFS info (device dm-4): checking UUID tree
[ 352.906324] BTRFS info (device dm-4): continuing balance
[ 352.940505] BTRFS info (device dm-4): relocating block group 2355407880192 flags data
[ 352.967918] ------------[ cut here ]------------
[ 352.969550] kernel BUG at ../fs/btrfs/relocation.c:1449!
[ 352.971205] invalid opcode: 0000 [#1] SMP PTI
[ 352.972840] CPU: 2 PID: 4951 Comm: btrfs-balance Tainted: P O 4.12.14-lp151.28.4-default #1 openSUSE Leap 15.1
[ 352.974552] Hardware name: Dell Inc. Precision 7510/0M91XC, BIOS 1.16.3 09/12/2018
[ 352.976153] task: ffff880fcb094080 task.stack: ffffc9000a7bc000
[ 352.977429] RIP: 0010:create_reloc_root+0x1dc/0x1f0 [btrfs]
[ 352.978675] RSP: 0018:ffffc9000a7bf9d0 EFLAGS: 00010282
[ 352.979938] RAX: 00000000ffffffef RBX: ffff880ff06daa00 RCX: ffffea0040a6795f
[ 352.981133] RDX: 0000000000000013 RSI: ffff880dfa0d8f50 RDI: 0000000000000286
[ 352.982124] RBP: ffff880faac771e0 R08: ffff880e0415cea0 R09: 0000000000000000
[ 352.983115] R10: 0000000000000028 R11: ffff880dfa0d8f50 R12: fffffffffffffff7
[ 352.984098] R13: ffff880feb532000 R14: ffff880fefbdb800 R15: 00000000000c4000
[ 352.985072] FS: 0000000000000000(0000) GS:ffff881082500000(0000) knlGS:0000000000000000
[ 352.986006] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 352.986782] CR2: 000055f50cacc470 CR3: 000000000200a004 CR4: 00000000003606e0
[ 352.987559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 352.988311] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 352.989043] Call Trace:
[ 352.989763] btrfs_init_reloc_root+0x5b/0xa0 [btrfs]
[ 352.990473] record_root_in_trans+0xb4/0xe0 [btrfs]
[ 352.991132] btrfs_record_root_in_trans+0x4e/0x60 [btrfs]
[ 352.991689] start_transaction+0xa6/0x410 [btrfs]
[ 352.992259] __btrfs_prealloc_file_range+0xbb/0x460 [btrfs]
[ 352.992839] btrfs_prealloc_file_range+0x10/0x20 [btrfs]
[ 352.993430] prealloc_file_extent_cluster+0x113/0x200 [btrfs]
[ 352.994032] relocate_file_extent_cluster+0x8d/0x470 [btrfs]
[ 352.994646] ? __btrfs_end_transaction+0x1c1/0x2e0 [btrfs]
[ 352.995270] relocate_data_extent+0x78/0xc0 [btrfs]
[ 352.995895] relocate_block_group+0x495/0x6f0 [btrfs]
[ 352.996449] btrfs_relocate_block_group+0x18e/0x280 [btrfs]
[ 352.997014] btrfs_relocate_chunk+0x4a/0xf0 [btrfs]
[ 352.997591] __btrfs_balance+0x8b7/0xbd0 [btrfs]
[ 352.998174] btrfs_balance+0x2de/0x5c0 [btrfs]
[ 352.998758] ? btrfs_balance+0x5c0/0x5c0 [btrfs]
[ 352.999355] balance_kthread+0x56/0x80 [btrfs]
[ 3...

In , asbeer (asbeer-linux-kernel-bugs) wrote :

I've experienced this exact same error at relocation.c:1449 running "4.12.14-lp150.12.58-default #1 openSUSE Leap 15.0". The call to btrfs_insert_root() within create_reloc_root() returns an error code which causes the subsequent BUG_ON() assertion to fail.

I'm hopeful that the following commit will fix this issue.

https://lkml.org/lkml/2019/6/7/720
https://github.com/torvalds/linux/commit/30d40577e322b670551ad7e2faa9570b6e23eb2b
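
The register dumps in the create_reloc_root traces above all show RAX: 00000000ffffffef. Assuming RAX still holds the return value of btrfs_insert_root() at the point of the BUG (an assumption, since the thread does not confirm it), the value can be decoded as a negative errno. The helper name `to_signed` is hypothetical.

```python
import errno
import os


def to_signed(reg_value, bits=32):
    """Interpret the low `bits` of a register value as a signed integer."""
    value = reg_value & ((1 << bits) - 1)
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


rax = 0x00000000FFFFFFEF  # RAX as printed in the create_reloc_root traces
code = to_signed(rax)
# Kernel functions return negative errno values, so negate before lookup.
print(code, errno.errorcode.get(-code), os.strerror(-code))
```

Under that assumption the value decodes to -17, that is EEXIST, which would mean the insert failed because an item with that key was already present.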

In , erico.mendonca (erico.mendonca-linux-kernel-bugs) wrote :

Also experienced the same problem on kernel 4.12.14-lp151.28.4-default, openSUSE Leap 15.1.

These two requests should integrate the commit above and fix it:
https://build.opensuse.org/request/show/710395
https://build.opensuse.org/request/show/710403

For now, the workaround is to mount with the "skip_balance" option.

In , office (office-linux-kernel-bugs) wrote :

A few moments ago, on SLES 12.3 with all updates installed:

[528112.786104] BUG: unable to handle kernel paging request at ffff880273fa9b40
[528112.786401] IP: btrfs_init_reloc_root+0x2b/0xa0 [btrfs]
[528112.786584] PGD 200c067 P4D 200c067 PUD 2fed1c067 PMD 2feb7c067 PTE 8010000273fa9065
[528112.786842] Oops: 0003 [#1] SMP NOPTI
[528112.786948] CPU: 4 PID: 574 Comm: systemd-journal Not tainted 4.12.14-95.24-default #1 SLE12-SP4
[528112.787181] task: ffff8802ed710bc0 task.stack: ffffc90042bc8000
[528112.787362] RIP: e030:btrfs_init_reloc_root+0x2b/0xa0 [btrfs]
[528112.787575] RSP: e02b:ffffc90042bcbbe8 EFLAGS: 00010286
[528112.787722] RAX: ffff880273fa9800 RBX: ffff8802ea8eb800 RCX: 0000000000000000
[528112.787917] RDX: 0000000000118121 RSI: ffff8802ea8eb800 RDI: ffff88027f81e2d0
[528112.788114] RBP: ffff88027f81e2d0 R08: 0000000000000001 R09: ffff88027f81e2d0
[528112.788355] R10: 00000000000c0000 R11: ffff8802ec1b20c8 R12: ffff8802e9ba0000
[528112.788626] R13: ffff8802e9ba0078 R14: 0000000000000000 R15: ffff8802ed710bc0
[528112.788878] FS: 00007f4611f4c840(0000) GS:ffff8802f0900000(0000) knlGS:0000000000000000
[528112.789138] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[528112.789314] CR2: ffff880273fa9b40 CR3: 0000000185a42000 CR4: 0000000000040660
[528112.789604] Call Trace:
[528112.789698] record_root_in_trans+0xa9/0xf0 [btrfs]
[528112.789869] btrfs_record_root_in_trans+0x4a/0x70 [btrfs]
[528112.790073] start_transaction+0xab/0x440 [btrfs]
[528112.790217] btrfs_dirty_inode+0x49/0xe0 [btrfs]
[528112.790379] file_update_time+0xa6/0xf0
[528112.790558] btrfs_page_mkwrite+0x129/0x490 [btrfs]
[528112.790712] do_page_mkwrite+0x31/0x70
[528112.790857] do_wp_page+0x43f/0x570
[528112.790968] __handle_mm_fault+0x793/0xef0
[528112.791085] handle_mm_fault+0xc4/0x1d0
[528112.791197] __do_page_fault+0x1f3/0x4c0
[528112.791310] do_page_fault+0x2b/0x70
[528112.791445] ? do_syscall_64+0x9a/0x160
[528112.791560] ? page_fault+0x2f/0x50
[528112.791685] page_fault+0x45/0x50
[528112.791782] RIP: 2ee0:0x7fffd27e03d8
[528112.791883] RSP: cd92ee0:0000000000000001 EFLAGS: 55ce0cd92ee0
[528112.791886] Code: 0f 1f 44 00 00 41 55 41 54 55 48 89 fd 53 48 8b 86 f0 01 00 00 48 89 f3 48 8b 88 a0 cb 00 00 48 8b 46 18 48 85 c0 74 13 48 8b 17 <48> 89 90 40 03 00 00 5b 5d 41 5c 31 c0 41 5d c3 48 85 c9 74 f2
[528112.792632] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache af_packet iscsi_ibft iscsi_boot_sysfs xenfs xen_privcmd intel_rapl sb_edac x86_pkg_temp_thermal coretemp crc32_pclmul ghash_clmulni_intel pcbc xen_netfront aesni_intel aes_x86_64 crypto_simd glue_helper cryptd pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs xor raid6_pq xen_blkfront crc32c_intel sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
[528112.793813] Supported: Yes
[528112.793895] CR2: ffff880273fa9b40
[528112.793992] ---[ end trace d4d5746ac351a11a ]---
[528112.794139] RIP: e030:btrfs_init_reloc_root+0x2b/0xa0 [btrfs]
[528112.794296] RSP: e02b:ffffc90042bcbbe8 EFLAGS: 00010286
[528112.794508] RAX: ffff880273fa9800 RBX: ffff8802ea8eb800 RCX: 0000000000000000
[528112.794722] RDX: 0000000000118121 RSI: ffff8802ea8eb800 RDI: ffff8...

In , office (office-linux-kernel-bugs) wrote :

I consider this a very serious bug. My problem was indeed that the drive ran out of space.
The only solution (and only for a few weeks or months) would be to run the balance every day.

I added additional space to the unit and correctly extended the size, and for more than 4 weeks now there has been no problem.

Why it is a bug:
The system must not crash with the above error messages and stop working. It has to report that the drive is out of space, or that it needs a balance, or whatever.

I am now changing all btrfs drives to ext4. btrfs is apparently not stable enough; it is a toy for kindergarten environments, not for servers that are used for serious work. At least SUSE is not capable of fixing this.

They refer only to their paid support, but this is ridiculous: having very serious bugs and then asking for money? Good work, guys!

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1890951

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Johannes Rohr (jorohr) wrote :

Unfortunately, producing logs other than the backtrace above isn't really possible. As soon as I reproduce the bug, the filesystem becomes unwritable, so no logs can be obtained. It's a catch-22.

Changed in linux (Ubuntu):
status: Incomplete → New
Changed in linux (Ubuntu):
status: New → Incomplete
In , jorohr (jorohr-linux-kernel-bugs) wrote :

I see the same issue on Ubuntu 20.04 with kernel 5.4.0-42, and also with the Debian buster rescue system I booted for recovery. Unfortunately, I didn't note down the kernel version. The backtrace from dmesg is the same as the one in the bug description above (kernel BUG at fs/btrfs/relocation.c:2626 in select_reloc_root, reached from btrfs_balance).

Johannes Rohr (jorohr) wrote : CRDA.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → New
tags: added: apport-collected focal
description: updated
Johannes Rohr (jorohr) wrote : CurrentDmesg.txt, Lspci.txt, Lspci-vt.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt

apport information

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
In , jorohr (jorohr-linux-kernel-bugs) wrote :

Here is my bug report in the Ubuntu launchpad

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Johannes Rohr (jorohr) wrote :

Apparently this is (probably) fixed in kernels from 5.4.54 onwards: https://<email address hidden>/T/#r52f03985dd7982c8f92c8a65089583f01c62020b so I hope it becomes available in Ubuntu soon.

Johannes Rohr (jorohr) wrote :

Apparently fixed in kernel 5.4.56: https://<email address hidden>/T/#r52f03985dd7982c8f92c8a65089583f01c62020b
Will it become available for Ubuntu?
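
Ubuntu kernel package versions such as 5.4.0-42 do not directly show which upstream stable release they are based on; the ProcVersionSignature line in the bug description ("Ubuntu 5.4.0-42.46-generic 5.4.44") carries that as its last field. The sketch below compares that field against a candidate fix version. The function names are hypothetical, and the default threshold of 5.4.54 is simply the version named in the comments above as the probable fix, not a confirmed one.

```python
import re


def upstream_version(signature):
    """Parse the trailing upstream version from a ProcVersionSignature string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\s*$", signature)
    if not match:
        raise ValueError(f"no upstream version found in {signature!r}")
    return tuple(int(part) for part in match.groups())


def at_least(signature, fixed_in=(5, 4, 54)):
    """True if the kernel's upstream base is at or past `fixed_in`."""
    return upstream_version(signature) >= fixed_in


# Value taken from the ProcVersionSignature field in the bug description.
sig = "Ubuntu 5.4.0-42.46-generic 5.4.44"
print(upstream_version(sig), at_least(sig))
```

On a running Ubuntu system the same string can usually be read from /proc/version_signature. Distribution kernels may also carry backported fixes, so a lower upstream base does not by itself prove the fix is absent.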

Johannes Rohr (jorohr)
no longer affects: linux
Lukas Tribus (luky-37) wrote :
Johannes Rohr (jorohr) wrote :

Today's kernel update seems to have fixed it, as expected.
