btrfs: Attempting to balance a nearly full filesystem with relocated root nodes fails

Bug #1933172 reported by Matthew Ruffell
This bug affects 1 person
Affects          Status         Importance   Assigned to       Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Bionic           Fix Released   Medium       Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/1933172

[Impact]

If you attempt to balance a btrfs filesystem that is nearly full, and the filesystem has had many small, medium and large files created and deleted such that the b-tree needs to be rotated, the balance fails because there is not enough free space. When it fails, the kernel oopses and the btrfs filesystem hangs.

It doesn't appear to cause any filesystem corruption, and is reproducible every time on affected filesystems.

The following oops is generated:

general protection fault: 0000 [#1] SMP PTI
CPU: 0 PID: 18440 Comm: btrfs Not tainted 4.15.0-136-generic #140-Ubuntu
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:btrfs_set_root_node+0x5/0x60 [btrfs]
RSP: 0018:ffffb3db890a79e0 EFLAGS: 00010282
RAX: ffff8d7f73861ad0 RBX: ffff8d7f78455708 RCX: ffff8d7f6d9a5390
RDX: ffff8d7f73861ad0 RSI: a023775cfc0348a3 RDI: ffff8d7f6d9a5028
RBP: ffffb3db890a7a78 R08: 0000000000000044 R09: 0000000000000228
R10: ffff8d7f6d9a5000 R11: 0000000000000010 R12: ffffb3db890a7a08
R13: ffff8d7f6d9a5000 R14: ffff8d7f6d9a5028 R15: ffff8d7f74560000
FS: 00007f48d84498c0(0000) GS:ffff8d7f7fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe4fbc1f000 CR3: 00000001799fc001 CR4: 0000000000160ef0
Call Trace:
 ? commit_fs_roots+0x130/0x1b0 [btrfs]
 ? btrfs_run_delayed_refs.part.70+0x80/0x190 [btrfs]
 btrfs_commit_transaction+0x42c/0x910 [btrfs]
 ? start_transaction+0x191/0x430 [btrfs]
 relocate_block_group+0x1e7/0x640 [btrfs]
 btrfs_relocate_block_group+0x18f/0x280 [btrfs]
 btrfs_relocate_chunk+0x38/0xd0 [btrfs]
 __btrfs_balance+0x972/0xcd0 [btrfs]
 ? insert_balance_item.isra.35+0x391/0x3c0 [btrfs]
 btrfs_balance+0x32c/0x5a0 [btrfs]
 btrfs_ioctl_balance+0x320/0x390 [btrfs]
 btrfs_ioctl+0x5a6/0x2490 [btrfs]
 ? lru_cache_add_active_or_unevictable+0x36/0xb0
 ? __handle_mm_fault+0x9fd/0x1290
 do_vfs_ioctl+0xa8/0x630
 ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
 ? do_vfs_ioctl+0xa8/0x630
 ? __do_page_fault+0x2a1/0x4b0
 SyS_ioctl+0x79/0x90
 do_syscall_64+0x73/0x130
 entry_SYSCALL_64_after_hwframe+0x41/0xa6
RIP: 0033:0x7f48d7228317
RSP: 002b:00007ffd76d03e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f48d7228317
RDX: 00007ffd76d03ec8 RSI: 00000000c4009420 RDI: 0000000000000003
RBP: 00007ffd76d03ec8 R08: 0000000000000078 R09: 0000000000000000
R10: 0000562086e7f010 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffd76d057cb R14: 0000000000000002 R15: 0000000000000000
Code: 4d 85 e4 0f 84 56 fe ff ff 4d 89 04 24 41 c6 44 24 08 84 4d 89 4c 24 09 e9 42 fe ff ff 0f 0b e8 02 24 5e e0 66 90 0f 1f 44 00 00 <48> 8b 06 48 8b 0d c9 d4 99 e1 48 8b 15 d2 d4 99 e1 55 48 89 87
RIP: btrfs_set_root_node+0x5/0x60 [btrfs] RSP: ffffb3db890a79e0

I don't see this behaviour on any upstream kernel, and the first kernel to show this behaviour is 4.15.0-109-generic. The current 4.15.0-145-generic is still affected.
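The version boundaries in this report can be turned into a quick check. The following is only a sketch: the function name is hypothetical, it only understands -generic release strings, and the upper bound of 152 is an assumption taken from the verification comment in this report, where 4.15.0-151-generic still oopsed and 4.15.0-152-generic from -proposed did not.

```shell
#!/bin/sh
# Hypothetical helper: classify a Bionic 4.15.0 "-generic" release string
# against the ABI numbers mentioned in this report.
# Assumptions: 109 is the first affected ABI (from the report), and 152 is the
# first fixed ABI (from the -proposed verification). Other flavours and other
# kernel series are reported as "unknown" rather than guessed.
btrfs_balance_oops_status() {
    release=$1                          # e.g. "4.15.0-136-generic"
    case $release in
        4.15.0-*-generic) ;;
        *) echo "unknown"; return 0 ;;
    esac
    abi=${release#4.15.0-}              # "136-generic"
    abi=${abi%-generic}                 # "136"
    case $abi in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac
    if [ "$abi" -ge 109 ] && [ "$abi" -lt 152 ]; then
        echo "affected"
    else
        echo "not-affected"
    fi
}

# Classify the running kernel.
btrfs_balance_oops_status "$(uname -r)"
```

A result of "affected" only means the kernel falls in the range discussed here; the oops itself still requires a filesystem in the state described above.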

I believe that this is a regression introduced in the fixing of CVE-2019-19036.

[Testcase]

I haven't been able to write a script that reliably places a btrfs filesystem into the state necessary to reproduce this issue, so I have provided my qcow2 image containing a btrfs filesystem that reproduces the issue 100% of the time.

Download the image from here (warning: the image is 8.0 GB):

https://people.canonical.com/~mruffell/sf311164/ubuntu18.04-server-2.qcow2

Create an Ubuntu 18.04 VM and attach the ubuntu18.04-server-2.qcow2 image to it as a new virtio disk. Note that ubuntu18.04-server-2.qcow2 does not contain an operating system; it is a data-only volume.

Mount the volume:

$ sudo mount /dev/vdb /mnt

Attempt to balance:

$ sudo btrfs filesystem balance start --full-balance /mnt
Segmentation fault (core dumped)

Check dmesg for kernel oops:
https://paste.ubuntu.com/p/wjJNqKBCfh/
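For scripted checks, the oops can be recognised by the faulting function in its RIP line. This is a minimal sketch, assuming the signature always matches the RIP line quoted in the [Impact] section; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel log on stdin contains the
# RIP line of the oops quoted in this report
# ("RIP: 0010:btrfs_set_root_node+0x5/0x60 [btrfs]"). The offsets are matched
# loosely because they may differ between kernel builds.
has_btrfs_balance_oops() {
    grep -Eq 'RIP: .*btrfs_set_root_node\+0x[0-9a-f]+/0x[0-9a-f]+ \[btrfs\]'
}

# Usage on a live system (reading dmesg may need sudo, depending on
# kernel.dmesg_restrict):
#   sudo dmesg | has_btrfs_balance_oops && echo "oops signature found"
```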

If you install the test kernel from the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf311164-test

You should see this instead:

$ sudo btrfs filesystem balance start --full-balance /mnt
ERROR: error during balancing '/mnt': No space left on device
There may be more info in syslog - try dmesg | tail

Checking dmesg shows no kernel oops, and just info about the volume being too full to balance:

https://paste.ubuntu.com/p/4J8Gq2dtz4/

[Fix]

The problem first appears in 4.15.0-109-generic; 4.15.0-108-generic and earlier work fine, which means we introduced a regression between those two releases.

I bisected the problem down to the following commit:

ubuntu-bionic 6f536ce7a978531d38a21d092394616cefb54436
Author: Qu Wenruo <email address hidden>
Date: Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://paste.ubuntu.com/p/4qfWCM8ykh/

Unfortunately, I believe this is a bad backport. If you examine the original upstream commit:

commit 51415b6c1b117e223bc083e30af675cb5c5498f3
Author: Qu Wenruo <email address hidden>
Date: Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://github.com/torvalds/linux/commit/51415b6c1b117e223bc083e30af675cb5c5498f3

You will see that the 4.15 backport has calls to free_extent_buffer() and btrfs_put_fs_root(). btrfs_put_fs_root() was renamed to btrfs_put_root() in newer kernels, where it contains logic to free relocated roots, so I think we might not need the calls to free_extent_buffer() to free the extents first, since that might be handled later.

The core issue is that we hit a general protection fault when attempting to access a root node, which means we have freed a root node we shouldn't have.

If we look at the backport in 5.4.y, aka, the one in Focal:

ubuntu-focal ecaee3a76ea998bc2fe20f056eb27f9bc837d116
Author: Qu Wenruo <email address hidden>
Date: Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://paste.ubuntu.com/p/PZrMqVt8Yk/

It seems upstream -stable omitted the calls to btrfs_put_root() entirely, and we don't need the calls to free_extent_buffer() because of it.

If I revert 6f536ce7a978531d38a21d092394616cefb54436 from ubuntu-bionic, and cherry-pick ecaee3a76ea998bc2fe20f056eb27f9bc837d116 from ubuntu-focal, and build, the problem no longer reproduces.
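The revert and cherry-pick can be sketched as follows. This is an illustration only: it assumes the current directory is an ubuntu-bionic kernel tree in which the ubuntu-focal commit is already available (for example, fetched from an extra remote), the function name and the DRY_RUN switch are hypothetical, and the cherry-pick may need manual conflict resolution.

```shell
#!/bin/sh
# Commit IDs as quoted in this report.
BAD_BIONIC=6f536ce7a978531d38a21d092394616cefb54436
GOOD_FOCAL=ecaee3a76ea998bc2fe20f056eb27f9bc837d116

# Hypothetical helper: revert the Bionic backport, then apply the Focal one.
# With DRY_RUN set to a non-empty value, the git commands are printed
# instead of executed.
swap_reloc_backport() {
    run=${DRY_RUN:+echo}
    $run git revert --no-edit "$BAD_BIONIC" &&
    $run git cherry-pick -x "$GOOD_FOCAL"
}

# Print what would be run, without touching the tree:
#   DRY_RUN=1 swap_reloc_backport
```

Using `-x` records the original Focal commit ID in the new commit message, which keeps the provenance of the replacement backport visible in the Bionic history.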

[Where problems could occur]

If a regression were to occur, it would affect users of btrfs filesystems, and would likely show during a routine balance operation. Since the issue is triggered during the cancellation of a balance operation, problems might occur for users with nearly full filesystems or filesystems that have existing corruption.

We are replacing a patch that was backported during the fixing of CVE-2019-19036 with a backport provided by the upstream developers, cherry-picked from 5.4.y to Bionic. The patch in 5.4.y is well tested by the community and is currently in the Focal kernel.

As with all modifications to btrfs, there is a risk of data corruption and filesystem corruption for all btrfs users, since balances happen automatically and on a regular basis. If a regression does happen, users should remount their filesystems with the "skip_balance" mount option, back up their data, and attempt a repair if necessary.

[Other info]

A community member hit this issue before I did and reported it upstream to linux-btrfs here, although no one knew what was happening:

https://www.spinics.net/lists/linux-btrfs/msg103367.html

CVE References

Changed in linux (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
tags: added: bionic sts
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Matthew Ruffell (mruffell) wrote :

Performing verification for Bionic.

I installed 4.15.0-151-generic from -updates, and added the reproducer btrfs qcow2 image file to my VM.

From there, I mounted the filesystem and attempted to balance:

$ sudo mount /dev/vdb /mnt
$ sudo btrfs filesystem balance start --full-balance /mnt
Segmentation fault (core dumped)

Checking dmesg, we get the same oops as I reported:

https://paste.ubuntu.com/p/wWHCgzZxTZ/

I then enabled -proposed and installed 4.15.0-152-generic, and rebooted:

$ sudo mount /dev/vdb /mnt
$ sudo btrfs filesystem balance start --full-balance /mnt
ERROR: error during balancing '/mnt': No space left on device
$ dmesg | tail -n 7
[ 34.131066] BTRFS info (device vdb): disk space caching is enabled
[ 34.131070] BTRFS info (device vdb): has skinny extents
[ 34.149906] BTRFS info (device vdb): checking UUID tree
[ 34.149946] BTRFS info (device vdb): continuing balance
[ 34.227645] BTRFS info (device vdb): 2 enospc errors during balance
[ 40.009032] BTRFS info (device vdb): relocating block group 27995340800 flags data
[ 40.200573] BTRFS info (device vdb): 14 enospc errors during balance

We no longer suffer a kernel oops, and instead, we correctly report that the disk is too full and a balance cannot be completed.
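When automating this verification, the fixed behaviour can be recognised by the "enospc errors during balance" messages shown above. The following is a small sketch with a hypothetical function name; it assumes the message format is exactly as in the dmesg output in this comment.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print the count from the
# most recent "N enospc errors during balance" message, or nothing if no such
# message is present.
last_enospc_count() {
    sed -n 's/.*BTRFS info (device [^)]*): \([0-9][0-9]*\) enospc errors during balance.*/\1/p' |
        tail -n 1
}

# Usage on a live system:
#   dmesg | last_enospc_count
```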

After deleting some files and re-issuing balances, balancing completes successfully.

$ sudo btrfs filesystem df /mnt
Data, single: total=4.88GiB, used=4.51GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=256.00MiB, used=5.39MiB
GlobalReserve, single: total=16.00MiB, used=0.00B
$ sudo btrfs filesystem balance start --full-balance /mnt
Done, had to relocate 8 out of 8 chunks

The kernel in -proposed fixes the problem, happy to mark Bionic as verified.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-154.161

---------------
linux (4.15.0-154.161) bionic; urgency=medium

  * bionic/linux: 4.15.0-154.161 -proposed tracker (LP: #1938411)

  * Potential reverts of 4.19.y stable changes in 18.04 (LP: #1938537)
    - SAUCE: Revert "locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to
      signal"
    - SAUCE: Revert "drm/amd/amdgpu: fix refcount leak"

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts
    - update dkms package versions

  * btrfs: Automatic balance returns -EUCLEAN and leads to forced readonly
    filesystem (LP: #1934709) // CVE-2019-19036
    - btrfs: Validate child tree block's level and first key
    - btrfs: Detect unbalanced tree with empty leaf before crashing btree
      operations

  * btrfs: Automatic balance returns -EUCLEAN and leads to forced readonly
    filesystem (LP: #1934709)
    - Revert "btrfs: Detect unbalanced tree with empty leaf before crashing btree
      operations"
    - Revert "btrfs: Validate child tree block's level and first key"
    - btrfs: Only check first key for committed tree blocks
    - btrfs: Fix wrong first_key parameter in replace_path

  * Enable fib-onlink-tests.sh and msg_zerocopy.sh in kselftests/net on Bionic
    (LP: #1934759)
    - selftests: Add fib-onlink-tests.sh to TEST_PROGS
    - selftests: net: use TEST_PROGS_EXTENDED
    - selftests/net: enable msg_zerocopy test
    - SAUCE: selftests: Make fib-onlink-tests.sh executable

  * Kernel oops due to uninitialized list on kernfs (kernfs_kill_sb)
    (LP: #1934175)
    - kernfs: deal with kernfs_fill_super() failures
    - unfuck sysfs_mount()

  * large_dir in ext4 broken (LP: #1933074)
    - SAUCE: ext4: fix directory index node split corruption

  * btrfs: Attempting to balance a nearly full filesystem with relocated root
    nodes fails (LP: #1933172) // CVE-2019-19036
    - btrfs: reloc: fix reloc root leak and NULL pointer dereference

  * btrfs: Attempting to balance a nearly full filesystem with relocated root
    nodes fails (LP: #1933172)
    - Revert "btrfs: reloc: fix reloc root leak and NULL pointer dereference"

  * Pixel format change broken for Elgato Cam Link 4K (LP: #1932367)
    - (upstream) media: uvcvideo: Fix pixel format change for Elgato Cam Link 4K

  * Bionic update: upstream stable patchset 2021-06-23 (LP: #1933375)
    - net: usb: cdc_ncm: don't spew notifications
    - efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared
    - efi: cper: fix snprintf() use in cper_dimm_err_location()
    - vfio/pci: Fix error return code in vfio_ecap_init()
    - vfio/pci: zap_vma_ptes() needs MMU
    - vfio/platform: fix module_put call in error flow
    - ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
    - HID: pidff: fix error return code in hid_pidff_init()
    - HID: i2c-hid: fix format string mismatch
    - netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches
    - ieee802154: fix error return code in ieee802154_add_iface()
    - ieee802154: fix error return code in ieee802154_llsec_getparams()
    - Bluetooth: fix the erroneous flush_work() order
    - Blu...


Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released