FS access deadlock with btrfs quotas enabled

Bug #1765998 reported by Michael Sparmann on 2018-04-21
This bug affects 1 person
Affects           Status   Importance   Assigned to   Milestone
linux (Ubuntu)             High         Unassigned
  Bionic                   High         Unassigned

Bug Description

I'm running into an issue on Ubuntu Bionic (but not Xenial) where, shortly after boot and under heavy load from many LXD containers starting at once, access to the btrfs filesystem that the containers live on deadlocks.

The issue is hard to reproduce on other systems, most likely because it depends on the size of the filesystem involved (4 devices with a total of 8TB, millions of files, ~20 subvolumes with tens of snapshots each) and on the access pattern of many LXD containers starting at once. It definitely goes away when btrfs quotas are disabled, though. Another prerequisite for triggering this bug may be that the container subvolumes share extents (inherited from their parent image or created by deduplication).

I can reliably reproduce it only on a production system on which I can do very limited testing. However, I have been able to gather the following information:
- Many threads are stuck trying to acquire locks on various tree roots, which are never released by their current holders.
- There always seem to be (at least) two threads executing rmdir syscalls that create the circular dependency: one of them is in btrfs_cow_block => ... => btrfs_qgroup_trace_extent_post => ... => find_parent_nodes and wants to acquire a lock that was already acquired by btrfs_search_slot in the other rmdir.
- Reverting this patch seems to prevent it from happening: https://patchwork.kernel.org/patch/9573267/
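The circular dependency described in the list above is, in its simplest form, a lock-order (ABBA) deadlock: each rmdir thread holds one tree-root lock and waits for a lock held by the other, and every other thread queued on those locks is stuck behind them. As a minimal sketch, assuming a simplified model in which each thread blocks on at most one lock and each lock has a single holder, the Python below builds a wait-for graph and reports the cycle. The function find_deadlock and the thread/lock names (rmdir-1, rmdir-2, kworker, root_A, root_B) are hypothetical illustrations, not btrfs code or kernel tooling.

```python
from typing import Dict, List, Optional


def find_deadlock(holds: Dict[str, str], waits: Dict[str, str]) -> Optional[List[str]]:
    """Return the threads forming a cycle in the wait-for graph, or None.

    Hypothetical model, not kernel code:
      holds: lock name -> thread currently holding that lock
      waits: thread name -> lock that thread is blocked on
    """
    # Edge T1 -> T2 means "T1 is blocked on a lock that T2 holds".
    # With one blocked-on lock per thread, each node has at most one edge.
    wait_for = {t: holds[lock] for t, lock in waits.items() if lock in holds}

    for start in wait_for:
        path: List[str] = []
        seen = set()
        t: Optional[str] = start
        # Follow the single outgoing edge until we leave the graph or revisit a node.
        while t is not None and t not in seen:
            seen.add(t)
            path.append(t)
            t = wait_for.get(t)
        if t is not None:
            # A node was revisited: the cycle starts at its first occurrence.
            return path[path.index(t):]
    return None


# Two rmdir threads each hold one root lock and want the other's;
# a third (hypothetical) thread is merely queued behind them.
holds = {"root_A": "rmdir-1", "root_B": "rmdir-2"}
waits = {"kworker": "root_A", "rmdir-1": "root_B", "rmdir-2": "root_A"}
print(find_deadlock(holds, waits))
```

In this model the kworker thread is stuck but is not part of the cycle; only rmdir-1 and rmdir-2 are reported, and breaking their cycle would release everything queued behind them.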

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1765998

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Michael Sparmann (theseven) wrote :

I cannot run the affected (production) system on a broken kernel; it locks up within seconds of booting.
If necessary, I can provide additional information or testing upon request.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: bionic kernel-da-key
Joseph Salisbury (jsalisbury) wrote :

I tried to build a Bionic test kernel with commit fb235dc reverted, but it does not revert cleanly and requires a backport. Do you have the backport you performed, and could you post it to the bug?

Also, would it be possible for you to test the latest mainline kernel to see if the bug is already fixed upstream? It is available from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc2

Changed in linux (Ubuntu Bionic):
status: Confirmed → Incomplete
Michael Sparmann (theseven) wrote :

This patch seems to fix it for me (I have been running it for several days now).

Michael Sparmann (theseven) wrote :

I can confirm that the issue still exists in the mainline kernel build linked above.

tags: added: patch
Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since the bug still exists in mainline without a revert. Would it be possible for you to open an upstream bug report[0] and maybe ping the patch author? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu Bionic):
status: Confirmed → Triaged