I see, thanks for the info.
I'll report my findings so far here, in case they prove useful to anyone who lands on this bug later:
The call trace of the deadlocked btrfs-cleaner kthread is as follows.
Tainted: P OE 4.15.0-45-generic #48-Ubuntu
btrfs-cleaner D 0 7969 2 0x80000000
Call Trace:
__schedule+0x291/0x8a0
schedule+0x2c/0x80
btrfs_tree_read_lock+0xcc/0x120 [btrfs]
? wait_woken+0x80/0x80
find_parent_nodes+0x295/0xe90 [btrfs]
? _cond_resched+0x19/0x40
btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]
? btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]
btrfs_find_all_roots+0x61/0x80 [btrfs]
btrfs_qgroup_trace_extent_post+0x37/0x60 [btrfs]
[...]
I'm not including the bottom of the call trace because it varies; however, the common part always starts from btrfs_qgroup_trace_extent_post and goes up. The caller of btrfs_qgroup_trace_extent_post can be any of:
- btrfs_qgroup_trace_extent+0xee/0x110 [btrfs]
- btrfs_add_delayed_tree_ref+0x1c6/0x1f0 [btrfs]
- btrfs_add_delayed_data_ref+0x30a/0x340 [btrfs]
This happens on 4.15 (Ubuntu flavor), 4.18 (Ubuntu "HWE" flavor) and 4.20.13 (vanilla).
On 4.20.0 (vanilla) and 5.0-rc8 (vanilla), there is also a deadlock under similar conditions, but the call trace of the deadlocked btrfs-transaction kthread looks different:
Tainted: P OE 4.20.0-042000-generic #201812232030
btrfs-transacti D 0 8665 2 0x80000000
Call Trace:
__schedule+0x29e/0x840
? btrfs_free_path+0x13/0x20 [btrfs]
schedule+0x2c/0x80
btrfs_commit_transaction+0x715/0x840 [btrfs]
? wait_woken+0x80/0x80
transaction_kthread+0x15c/0x190 [btrfs]
kthread+0x120/0x140
? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
? __kthread_parkme+0x70/0x70
ret_from_fork+0x35/0x40
Other userspace threads are blocked at the same time.
So we seem to be dealing with at least two different deadlocks, both of which seem to occur with many subvolumes and/or snapshots and quota enabled. All of this disappears with quota disabled.
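For anyone triaging a similar hang, the two traces can be told apart mechanically. The following is a hypothetical Python helper, not part of btrfs or any existing tool. It parses frames of the form function+offset/size [module] (the hex offset of the return address within the function, and the function's size), ignores frames prefixed with "?" because the kernel unwinder could not confirm they belong to the live call chain, and matches the remaining frames against two signatures taken only from the traces in this report. The names parse_trace, classify and SIGNATURES are my own assumptions; real hangs may show frames these signatures do not cover.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches one frame such as "? btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]".
FRAME_RE = re.compile(
    r"^\s*(?P<unreliable>\?\s+)?"
    r"(?P<func>[A-Za-z0-9_.]+)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)


@dataclass
class Frame:
    func: str
    offset: int            # offset of the return address inside func
    size: int              # size of func in bytes
    module: Optional[str]  # e.g. "btrfs"; None for code built into the kernel image
    reliable: bool         # False if the line was prefixed with "?"


def parse_trace(text: str) -> List[Frame]:
    """Extract frames from a pasted hung-task report; non-frame lines are skipped."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append(Frame(
                func=m.group("func"),
                offset=int(m.group("offset"), 16),
                size=int(m.group("size"), 16),
                module=m.group("module"),
                reliable=m.group("unreliable") is None,
            ))
    return frames


# Hypothetical signatures, taken only from the two traces in this report.
# Frames are listed innermost first, in the order the kernel prints them.
SIGNATURES = {
    "cleaner/qgroup-trace": [
        "btrfs_tree_read_lock",
        "find_parent_nodes",
        "btrfs_find_all_roots",
        "btrfs_qgroup_trace_extent_post",
    ],
    "transaction-commit": [
        "btrfs_commit_transaction",
        "transaction_kthread",
    ],
}


def classify(text: str) -> Optional[str]:
    """Return the first signature whose frames appear, in order, among the reliable frames."""
    funcs = [f.func for f in parse_trace(text) if f.reliable]
    for name, wanted in SIGNATURES.items():
        remaining = iter(funcs)
        # Ordered subsequence test: each wanted frame must occur after the previous one.
        if all(any(fn == w for fn in remaining) for w in wanted):
            return name
    return None
```

Matching an ordered subsequence rather than exact adjacency tolerates extra frames (such as btrfs_find_all_roots_safe) that may or may not show up between the interesting ones.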
For the record, the main btrfs qgroups developer seems to have many pending changes and fixes in this area, expected for 5.1 or 5.2. Stay tuned...
I have disabled quota for now. I only enable it for a short period of time when I need to get size information about my subvols and snapshots.
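The "enable quota only briefly" workaround can be scripted. This is a minimal sketch in Python, assuming btrfs-progs is installed and the script runs as root; the mount point /mnt/pool and the function names are hypothetical. It uses the standard btrfs quota enable, btrfs quota rescan -w, btrfs qgroup show and btrfs quota disable subcommands, and disables quota in a finally block so a failed query does not leave it switched on. Note that the rescan itself exercises the qgroup code, so this presumably narrows the window for the deadlocks rather than closing it.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cmd(argv: List[str]) -> str:
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def qgroup_sizes_once(mountpoint: str, run: Runner = run_cmd) -> str:
    """Enable quota, wait for the rescan, capture qgroup sizes, then disable quota again."""
    run(["btrfs", "quota", "enable", mountpoint])
    try:
        # -w waits for the rescan that enabling quota starts (or starts one if none is running).
        run(["btrfs", "quota", "rescan", "-w", mountpoint])
        return run(["btrfs", "qgroup", "show", mountpoint])
    finally:
        # Always turn quota back off, even if the rescan or the query failed.
        run(["btrfs", "quota", "disable", mountpoint])


if __name__ == "__main__":
    print(qgroup_sizes_once("/mnt/pool"))  # hypothetical mount point
```

The run parameter exists only so the command sequence can be checked without touching a real filesystem.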