FS access deadlock with btrfs quotas enabled

Bug #1765998 reported by Michael Sparmann
This bug affects 2 people
Affects          Status   Importance  Assigned to  Milestone
linux (Ubuntu)   Triaged  High        Unassigned
Bionic           Triaged  High        Unassigned

Bug Description

I'm running into an issue on Ubuntu Bionic (but not Xenial) where shortly after boot, under heavy load from many LXD containers starting at once, access to the btrfs filesystem that the containers are on deadlocks.

The issue is quite hard to reproduce on other systems, quite likely related to the size of the filesystem involved (4 devices with a total of 8TB, millions of files, ~20 subvolumes with tens of snapshots each) and the access pattern from many LXD containers at once. It definitely goes away when disabling btrfs quotas though. Another prerequisite to trigger this bug may be the container subvolumes sharing extents (from their parent image or due to deduplication).
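
For reference, quotas can be turned off with a single command against any mounted path of the filesystem (the mount point below is just a placeholder for the affected filesystem):

  # Turning quotas off on the affected filesystem is what makes the deadlock disappear here.
  sudo btrfs quota disable /mnt/pool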

I can only reliably reproduce this on a production system that I can do only very limited testing on; however, I have been able to gather the following information (a rough sketch of how this kind of data can be collected follows the list):
- Many threads are stuck trying to acquire locks on various tree roots, locks which are never released by their current holders.
- There always seem to be (at least) two threads executing rmdir syscalls which create the circular dependency: one of them is in btrfs_cow_block => ... => btrfs_qgroup_trace_extent_post => ... => find_parent_nodes and wants to acquire a lock that was already acquired by btrfs_search_slot in the other rmdir.
- Reverting this patch seems to prevent it from happening: https://patchwork.kernel.org/patch/9573267/
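
For anyone who wants to collect the same kind of information, something along these lines should work (assuming sysrq is enabled and the kernel exposes /proc/<pid>/stack; the kthread name is only an example of a task that shows up stuck):

  # Dump the stacks of all blocked (D-state) tasks into the kernel log.
  echo w | sudo tee /proc/sysrq-trigger
  sudo dmesg | tail -n 200
  # Or inspect a single stuck task directly:
  sudo cat /proc/$(pgrep btrfs-transacti | head -n1)/stack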

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1765998

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Michael Sparmann (theseven) wrote :

I cannot run the affected (production) system on a broken kernel; it locks up within seconds of booting.
If necessary, I can provide additional information or testing upon request.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: bionic kernel-da-key
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I tried to build a Bionic test kernel with commit fb235dc reverted, but it does not revert cleanly and requires a backport. Do you have the backport you performed, and could you post it to this bug?

Also, would it be possible for you to test the latest mainline kernel to see if the bug is already fixed upstream? It is available from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc2

Changed in linux (Ubuntu Bionic):
status: Confirmed → Incomplete
Revision history for this message
Michael Sparmann (theseven) wrote :

This patch seems to fix it for me (I've been running with it for several days now).

Revision history for this message
Michael Sparmann (theseven) wrote :

I can confirm that the issue still exists in the mainline kernel build linked above.

tags: added: patch
Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since it still exists in mainline without the revert. Would it be possible for you to open an upstream bug report[0] and perhaps ping the patch author? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu Bionic):
status: Confirmed → Triaged
Revision history for this message
Stéphane Lesimple (speed47) wrote :

I confirm I'm having this problem too since migrating to Bionic from Xenial.

My setup is somewhat similar to the original reporter's: I have 5 drives (22 TB) in RAID1, organized into around 10 subvolumes with up to 20 snapshots per subvolume.

After some hours of running normally, [btrfs-transaction] goes into D state and everything btrfs-related slowly grinds to a stall, with any program that touches the filesystem ending up in D state as well.
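
(For reference, a quick way to see the stall from userspace is to list the D-state tasks together with the kernel function they are sleeping in; the ps/awk combination below is just one way to do it:)

  # List uninterruptible (D-state) tasks and their wait channel.
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'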

The call trace I have also references btrfs_qgroup_trace_extent_post.

I'm currently testing 5.0-rc8 from the Ubuntu mainline PPA to see whether the problem is still there.

Michael, did you end up reporting the problem upstream? I would be keen to do it on the btrfs mailing list as soon as I know whether this is fixed in 5.0 or not.

Revision history for this message
Michael Sparmann (theseven) wrote :

Hm, it's been a while...
I think back then I made some btrfs developers aware of it on IRC, but never got around to sending it to the mailing list.
I'm running my own kernel builds for now (I had to do that to fix some other issues anyway) with the patch from comment #4 applied, which seems to reliably fix this issue.

I am very occasionally getting parent transid verify errors on the quota tree, though. I believe these originate from another bug introduced at some point after I posted my patch here, because I saw none of them for the first several months. They can be cleaned up by temporarily disabling and re-enabling quota, so they are no big deal to me right now, apart from the occasional annoying downtime.
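
(In case it helps anyone: the clean-up I mean is just cycling quotas on the filesystem. As far as I understand, "disable" drops the quota tree entirely and "enable" recreates it via a background rescan, which is presumably why the stale items go away. /mnt/pool is a placeholder for the actual mount point:)

  # Drop the (corrupted) quota tree, then let btrfs rebuild it.
  sudo btrfs quota disable /mnt/pool
  sudo btrfs quota enable /mnt/pool
  # "enable" starts a rescan in the background; check its progress with:
  sudo btrfs quota rescan -s /mnt/pool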

Revision history for this message
Stéphane Lesimple (speed47) wrote :

I see, thanks for the info.

I'll report my findings so far here, in case they turn out to be useful to someone landing on this bug later:

The call trace of the deadlocked btrfs-cleaner kthread is as follows.

      Tainted: P OE 4.15.0-45-generic #48-Ubuntu
btrfs-cleaner D 0 7969 2 0x80000000
Call Trace:
 __schedule+0x291/0x8a0
 schedule+0x2c/0x80
 btrfs_tree_read_lock+0xcc/0x120 [btrfs]
 ? wait_woken+0x80/0x80
 find_parent_nodes+0x295/0xe90 [btrfs]
 ? _cond_resched+0x19/0x40
 btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]
 ? btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]
 btrfs_find_all_roots+0x61/0x80 [btrfs]
 btrfs_qgroup_trace_extent_post+0x37/0x60 [btrfs]
[...]

I'm not including the bottom of the call trace because it varies; the common part starts at btrfs_qgroup_trace_extent_post and goes up from there. The caller of btrfs_qgroup_trace_extent_post can be any of:
- btrfs_qgroup_trace_extent+0xee/0x110 [btrfs], or
- btrfs_add_delayed_tree_ref+0x1c6/0x1f0 [btrfs], or
- btrfs_add_delayed_data_ref+0x30a/0x340 [btrfs]

This happens on 4.15 (Ubuntu), 4.18 (Ubuntu HWE), and 4.20.13 (vanilla).

On 4.20.0 (vanilla) and 5.0-rc8 (vanilla), there is also a deadlock under similar conditions, but the call trace of the deadlocked btrfs-transaction kthread looks different:

      Tainted: P OE 4.20.0-042000-generic #201812232030
btrfs-transacti D 0 8665 2 0x80000000
Call Trace:
 __schedule+0x29e/0x840
 ? btrfs_free_path+0x13/0x20 [btrfs]
 schedule+0x2c/0x80
 btrfs_commit_transaction+0x715/0x840 [btrfs]
 ? wait_woken+0x80/0x80
 transaction_kthread+0x15c/0x190 [btrfs]
 kthread+0x120/0x140
 ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
 ? __kthread_parkme+0x70/0x70
 ret_from_fork+0x35/0x40

Other userspace threads are locked at the same time.

So we seem to be dealing with at least two different deadlock cases, both of which happen with lots of subvolumes and/or snapshots and quota enabled. All of this disappears with quota disabled.

For the record, the main btrfs qgroups developer seems to have a lot of pending changes and fixes in this area, expected for 5.1 or 5.2. Stay tuned...

I have disabled quota for now. I only enable it for a short period of time when I need to get size information about my subvols and snapshots.
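
(Roughly like this, with /mnt/pool standing in for the actual mount point; the numbers reported by qgroup show are only meaningful once the rescan started by "enable" has finished:)

  sudo btrfs quota enable /mnt/pool
  # Check whether the automatic rescan is still running before reading sizes.
  sudo btrfs quota rescan -s /mnt/pool
  # Per-subvolume/snapshot referenced and exclusive sizes.
  sudo btrfs qgroup show /mnt/pool
  # Turn quotas back off to avoid the deadlocks described above.
  sudo btrfs quota disable /mnt/pool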

Brad Figg (brad-figg)
tags: added: cscc
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Recently hit this on Focal, because quotas were enabled by LXD.

This renders the system completely unresponsive if you use btrfs as the rootfs, with the btrfs-transaction or btrfs-cleaner kernel threads hogging 100% of one CPU core.

Booting from a live USB, disabling quotas, and temporarily moving the /etc/systemd/system/snap-lxd-* files out of the way helps (followed by `snap disable lxd` until LXD gets updated with https://github.com/lxc/lxd/pull/7032).
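
(Roughly what that recovery looks like from the live environment; /dev/sda2 is a placeholder for the actual device, and the paths assume the default Ubuntu layout with the root filesystem in the '@' subvolume:)

  # From the live USB: mount the installed root and turn quotas off
  # (quota operations work on any mounted path of the filesystem).
  sudo mount -o subvol=@ /dev/sda2 /mnt
  sudo btrfs quota disable /mnt
  # Move the LXD units aside so they are not started on the next boot.
  sudo mkdir -p /mnt/root/disabled-snap-lxd-units
  sudo mv /mnt/etc/systemd/system/snap-lxd-* /mnt/root/disabled-snap-lxd-units/
  sudo umount /mnt
  # After rebooting into the installed system, keep LXD off until the fix lands:
  sudo snap disable lxd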
