Comment 9 for bug 1765998

Stéphane Lesimple (speed47) wrote :

I see, thanks for the info.

I'll report my findings so far here, in case they turn out to be useful to someone landing on this bug later:

The call trace of the deadlocked btrfs-cleaner kthread is as follows.

      Tainted: P OE 4.15.0-45-generic #48-Ubuntu
btrfs-cleaner D 0 7969 2 0x80000000
Call Trace:
 __schedule+0x291/0x8a0
 schedule+0x2c/0x80
 btrfs_tree_read_lock+0xcc/0x120 [btrfs]
 ? wait_woken+0x80/0x80
 find_parent_nodes+0x295/0xe90 [btrfs]
 ? _cond_resched+0x19/0x40
 btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]
 ? btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]
 btrfs_find_all_roots+0x61/0x80 [btrfs]
 btrfs_qgroup_trace_extent_post+0x37/0x60 [btrfs]
[...]

I'm not including the bottom of the call trace because it varies; the common part, however, runs from btrfs_qgroup_trace_extent_post upward. The caller of btrfs_qgroup_trace_extent_post can be either
- btrfs_qgroup_trace_extent+0xee/0x110 [btrfs], or
- btrfs_add_delayed_tree_ref+0x1c6/0x1f0 [btrfs], or
- btrfs_add_delayed_data_ref+0x30a/0x340 [btrfs]

This happens on 4.15 (Ubuntu flavor), 4.18 (Ubuntu "HWE" flavor), and 4.20.13 (vanilla).

On 4.20.0 (vanilla) and 5.0-rc8 (vanilla), there is also a deadlock under similar conditions, but the call trace of the deadlocked btrfs-transaction kthread looks different:

      Tainted: P OE 4.20.0-042000-generic #201812232030
btrfs-transacti D 0 8665 2 0x80000000
Call Trace:
 __schedule+0x29e/0x840
 ? btrfs_free_path+0x13/0x20 [btrfs]
 schedule+0x2c/0x80
 btrfs_commit_transaction+0x715/0x840 [btrfs]
 ? wait_woken+0x80/0x80
 transaction_kthread+0x15c/0x190 [btrfs]
 kthread+0x120/0x140
 ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
 ? __kthread_parkme+0x70/0x70
 ret_from_fork+0x35/0x40

Userspace threads are blocked at the same time as well.

So we seem to be dealing with at least 2 different deadlock cases, both of which seem to occur with lots of subvolumes and/or snapshots and quota enabled. Both disappear with quota disabled.
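
For anyone comparing their own hung-task output against the two traces above, here is a rough Python sketch that sorts a trace into one of the two cases. The frame names come straight from the traces quoted in this comment; the labels "qgroup-backref" and "commit-wait" and all the function names are made up for illustration, and the matching is only a heuristic, not a diagnosis.

```python
import re

# Frames (innermost first) taken from the traces quoted in this comment.
QGROUP_BACKREF_FRAMES = (
    "btrfs_tree_read_lock",
    "find_parent_nodes",
    "btrfs_find_all_roots",
    "btrfs_qgroup_trace_extent_post",
)
COMMIT_WAIT_FRAMES = (
    "btrfs_commit_transaction",
    "transaction_kthread",
)

# Matches lines such as " ? wait_woken+0x80/0x80" or
# " find_parent_nodes+0x295/0xe90 [btrfs]".
FRAME_RE = re.compile(r"^\s*(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_frames(trace_text):
    """Return the function names in a trace, skipping '?' (unreliable) frames."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


def contains_in_order(frames, wanted):
    """True if every name in `wanted` appears in `frames`, in that order."""
    remaining = iter(frames)
    return all(name in remaining for name in wanted)


def classify(trace_text):
    """Return a hypothetical label for the kind of hang the trace resembles."""
    frames = reliable_frames(trace_text)
    if contains_in_order(frames, QGROUP_BACKREF_FRAMES):
        return "qgroup-backref"
    if contains_in_order(frames, COMMIT_WAIT_FRAMES):
        return "commit-wait"
    return "unknown"
```

Feed classify() the text of one "Call Trace:" block from dmesg. A result of "unknown" only means the trace matches neither pattern seen here.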

For the record, the main btrfs qgroups dev seems to have a lot of pending changes / fixes coming around this, expected for 5.1 or 5.2. Stay tuned...

I have disabled quota for now. I only enable it for a short period of time when I need to get size information about my subvols and snapshots.
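
In case it helps, the enable / measure / disable routine can be sketched as a small Python wrapper around the btrfs CLI. This is an assumption about how one might script it, not exactly what I run: the function names and the mount point are made up, and it relies on the standard subcommands `btrfs quota enable`, `btrfs quota rescan -w` (wait for the rescan to finish), `btrfs qgroup show` and `btrfs quota disable`. It defaults to a dry run that only returns the commands.

```python
import subprocess


def quota_report_commands(mountpoint):
    """Build the btrfs commands for a short-lived quota session."""
    return [
        ["btrfs", "quota", "enable", mountpoint],
        ["btrfs", "quota", "rescan", "-w", mountpoint],  # -w: wait for completion
        ["btrfs", "qgroup", "show", mountpoint],
        ["btrfs", "quota", "disable", mountpoint],
    ]


def quota_report(mountpoint, dry_run=True):
    """Return qgroup sizes, keeping quota enabled only while measuring.

    With dry_run=True (the default) nothing is executed and the commands
    are returned as text, one per line.
    """
    enable, rescan, show, disable = quota_report_commands(mountpoint)
    if dry_run:
        return "\n".join(" ".join(cmd) for cmd in (enable, rescan, show, disable))
    subprocess.run(enable, check=True)
    try:
        subprocess.run(rescan, check=True)
        result = subprocess.run(show, check=True, capture_output=True, text=True)
        return result.stdout
    finally:
        # Always turn quota back off, even if the rescan or show step fails.
        subprocess.run(disable, check=True)
```

For example, print(quota_report("/mnt/data")) lists the four commands for a hypothetical /mnt/data mount point. Passing dry_run=False needs root, and the rescan can take a long time with many snapshots, which is exactly the window in which the deadlocks above can still hit.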