tasks doing write()/fsync() hit deadlock in write_cache_pages()

Bug #1824827 reported by Mauricio Faria de Oliveira on 2019-04-15
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Undecided
Unassigned
Cosmic
Undecided
Mauricio Faria de Oliveira
Disco
Undecided
Unassigned

Bug Description

[Impact]

 * Tasks of a multi-threaded workload doing write() and fsync()
   might deadlock in write_cache_pages(), preventing progress.

 * The fix addresses a corner case in write_cache_pages() on
   the range_cyclic implementation which allows the deadlock.

 * Patch:
   - commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7
     ("mm/page-writeback.c: fix range_cyclic writeback vs
     writepages deadlock"), present in v4.20-rc1~92^2~19.

[Test Case]

 * This issue originally has been hit by the 'perforce' (p4d)
   tool in a XFS filesystem, but it's difficult/rare to occur.

 * We've written an userspace + kernel module (kprobes-based)
   to reproduce this problem and verify the test kernel/patch.

 * The kprobes are strictly tied to particular kernel versions
   because of the assembly instruction offsets. We'll provide
   updated versions for -updates and -proposed for verification.

 * Steps
   (see output examples in comments):

   - Userspace part:
   $ gcc -o test test.c -pthread

   - Kernel part:
   $ touch Makefile
   $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o clean
   $ make -C /lib/modules/$(uname -r)/build M=$(pwd) obj-m=kprobe-test.o modules

   - Shorter hung task timeout and higher console logging level
     to notice the deadlocked tasks sooner, and watch progress:
   $ echo 10 | sudo tee /proc/sys/kernel/hung_task_timeout_secs
   $ echo 9 | sudo tee /proc/sys/kernel/printk

   - Load module / Run userspace part (logging to kernel log) in XFS:
   $ sudo insmod kprobe-test.ko
   $ cd /path/to/xfs-mountpoint && sudo sh -c 'stdbuf -oL /path/to/test >/dev/kmsg'
   $ sudo rmmod kprobe-test

   You may need to ctrl-z with the original kernel as 'test' doesn't finish.

   - Check kernel log or watch the system console:
   $ dmesg

   Check threads in D state.
   $ ps -eLo pid,tid,state,comm | grep D | grep -e test -e kworker

[Regression Potential]

 * The patch is small but changes core writeback infrastructure,
   so there's a chance this may _affect_ some or other behavior
   that has not been validated with our regression testing; not
   exactly _break_ it. Please note our regression testing.

 * This has been verified with 'xfstests' (not only for XFS fs,
   despite its original name), used by major Linux filesystems
   for regression testing during development. It's been tested
   on systems with 24 and 4 CPUs (to exercise differences in
   scalability, parallelism, and workload) and XFS and ext4
   (reporter's environment + Ubuntu's default).
   No regressions were observed (the set of failed tests is
   the same in each system and tests failed in the same way).

 * This has also been verified with 'iozone' for write intensive
   tests, to exercise the writeback mechanism and no errors were
   observed.

 * The reporter has been running the test kernel with the patch
   for weeks and has not observed any other issues/regressions.

[Other Info]

 * This is only required in Cosmic (for the Bionic HWE kernel),
   and is already applied in Disco.

Changed in linux (Ubuntu Disco):
status: New → Invalid
Changed in linux (Ubuntu Cosmic):
status: New → Confirmed
assignee: nobody → Mauricio Faria de Oliveira (mfo)

testcase, kernel part.

testcase, userspace part.

Download full text (9.5 KiB)

testcase
========

original kernel :: latest cosmic version:
----------------------------------------

$ uname -rv
4.18.0-18-generic #19-Ubuntu SMP Tue Apr 2 18:13:16 UTC 2019

[ 654.491029] kprobe_test: loading out-of-tree module taints kernel.
[ 654.493322] kprobe_test: module verification failed: signature and/or required key missing - tainting kernel

[ 654.497033] mod_init():158 :: hello
[ 654.497976] mod_init():183 :: kernel version: orig/-18/cosmic

[ 694.254271] Program running, TID = 3292
[ 694.256600] kp1_pre_handler():070 :: state 0 .... :: pid = 3292, mapping = 0xffff962333263730, comm = 'test'
[ 694.260870] kp1_pre_handler():079 :: state 0 -> 1 :: pid = 3292, mapping = 0xffff962333263730
[ 694.262710] kp2_pre_handler():119 :: state 1 .... :: pid = 3292, page index = 1
[ 694.264264] kp3_pre_handler():144 :: state 1 .... :: pid = 3292, page index = 1, calling writepage()
[ 694.266641] kp2_pre_handler():119 :: state 1 .... :: pid = 3292, page index = 2
[ 694.268456] kp3_pre_handler():144 :: state 1 .... :: pid = 3292, page index = 2, calling writepage()

[ 695.276320] Thread 0 running, TID = 3293!
[ 695.281210] kp1_pre_handler():070 :: state 1 .... :: pid = 1165, mapping = 0xffff962333263730, comm = 'kworker/u4:2'
[ 695.299026] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 1165, mapping = 0xffff962333263730, comm ('kworker/u4:2') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 695.314808] kp2_pre_handler():119 :: state 2 .... :: pid = 1165, page index = 2
[ 695.322822] kp3_pre_handler():144 :: state 2 .... :: pid = 1165, page index = 2, calling writepage()

[ 695.330308] kp2_pre_handler():119 :: state 2 .... :: pid = 1165, page index = 1
[ 695.334355] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 1165, page index = 1, spin 5 seconds before lock_page()...

[ 696.283747] Thread 1 running, TID = 3295!
[ 696.284623] kp1_pre_handler():070 :: state 3 .... :: pid = 3295, mapping = 0xffff962333263730, comm = 'test'
[ 696.286726] kp2_pre_handler():119 :: state 3 .... :: pid = 3295, page index = 1
[ 696.288392] kp3_pre_handler():144 :: state 3 .... :: pid = 3295, page index = 1, calling writepage()
[ 696.290018] kp2_pre_handler():119 :: state 3 .... :: pid = 3295, page index = 2

[ 697.283941] Thread 2 running, TID = 3296!
[ 697.284859] kp1_pre_handler():070 :: state 3 .... :: pid = 3296, mapping = 0xffff962333263730, comm = 'test'
[ 697.287246] kp2_pre_handler():119 :: state 3 .... :: pid = 3296, page index = 1
[ 700.302756] kp2_pre_handler():127 :: state 3 -> 4 :: pid = 1165, page index = 1, spun 5 seconds before lock_page().

[ 715.716717] INFO: task kworker/u4:2:1165 blocked for more than 10 seconds.
[ 715.725486] Tainted: G OE 4.18.0-18-generic #19-Ubuntu
[ 715.732832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 715.740500] kworker/u4:2 D 0 1165 2 0x80000000
[ 715.745615] Workqueue: writeback wb_workfn (flush-7:1)
[ 715.750736] Call Trace:
[ 715.753270] __schedule+0x29e/0x840
[ 715.756493] schedule+0x2c/0x80
[ 715.759369] io_schedule+0x16/0x40
[ 715.762...

Read more...

testcase
========

test kernel :: latest cosmic version + patch:
--------------------------------------------

$ uname -rv
4.18.0-18-generic #19+test20190415b1 SMP Mon Apr 15 15:43:20 UTC 2019

[ 169.145212] kprobe_test: loading out-of-tree module taints kernel.
[ 169.149144] kprobe_test: module verification failed: signature and/or required key missing - tainting kernel

[ 169.153539] mod_init():158 :: hello
[ 169.154744] mod_init():190 :: kernel version: test/-18/cosmic

[ 169.177027] Program running, TID = 2497
[ 169.177978] kp1_pre_handler():070 :: state 0 .... :: pid = 2497, mapping = 0xffff993df9136fb0, comm = 'test'
[ 169.181080] kp1_pre_handler():079 :: state 0 -> 1 :: pid = 2497, mapping = 0xffff993df9136fb0
[ 169.183355] kp2_pre_handler():119 :: state 1 .... :: pid = 2497, page index = 1
[ 169.185616] kp3_pre_handler():144 :: state 1 .... :: pid = 2497, page index = 1, calling writepage()
[ 169.187779] kp2_pre_handler():119 :: state 1 .... :: pid = 2497, page index = 2
[ 169.189186] kp3_pre_handler():144 :: state 1 .... :: pid = 2497, page index = 2, calling writepage()

[ 170.194880] Thread 0 running, TID = 2498!
[ 170.200011] kp1_pre_handler():070 :: state 1 .... :: pid = 7, mapping = 0xffff993df9136fb0, comm = 'kworker/u4:0'
[ 170.217616] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 7, mapping = 0xffff993df9136fb0, comm ('kworker/u4:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 170.238633] kp2_pre_handler():119 :: state 2 .... :: pid = 7, page index = 2
[ 170.248024] kp3_pre_handler():144 :: state 2 .... :: pid = 7, page index = 2, calling writepage()
[ 170.261141] kp1_pre_handler():070 :: state 2 .... :: pid = 7, mapping = 0xffff993df9136fb0, comm = 'kworker/u4:0'
[ 170.272150] kp2_pre_handler():119 :: state 2 .... :: pid = 7, page index = 1
[ 170.279860] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 7, page index = 1, spin 5 seconds before lock_page()...

[ 171.195090] Thread 1 running, TID = 2499!
[ 171.196182] kp1_pre_handler():070 :: state 3 .... :: pid = 2499, mapping = 0xffff993df9136fb0, comm = 'test'
[ 171.198609] kp2_pre_handler():119 :: state 3 .... :: pid = 2499, page index = 1
[ 171.200358] kp3_pre_handler():144 :: state 3 .... :: pid = 2499, page index = 1, calling writepage()
[ 171.203717] kp2_pre_handler():119 :: state 3 .... :: pid = 2499, page index = 2

[ 172.195297] Thread 2 running, TID = 2500!
[ 172.196387] kp1_pre_handler():070 :: state 3 .... :: pid = 2500, mapping = 0xffff993df9136fb0, comm = 'test'
[ 172.198673] kp2_pre_handler():119 :: state 3 .... :: pid = 2500, page index = 1
[ 175.252161] kp2_pre_handler():127 :: state 3 -> 4 :: pid = 7, page index = 1, spun 5 seconds before lock_page().

[ 175.254922] kp3_pre_handler():144 :: state 4 .... :: pid = 2499, page index = 2, calling writepage()
[ 175.256849] kp3_pre_handler():144 :: state 4 .... :: pid = 2500, page index = 1, calling writepage()
[ 175.259166] kp2_pre_handler():119 :: state 4 .... :: pid = 2500, page index = 2
[ 175.273178] mod_exit():213 :: bye

Patch posted for SRU:

[C][PATCH 0/1] Fix write()/fsync() deadlock in write_cache_pages()
https://lists.ubuntu.com/archives/kernel-team/2019-April/100084.html

The change introduced by the patch is evident in the kernel message log for Thread 0:
between page indexes 2 and 1 there's now another function call to write_cache_pages()
instead of just another iteration of the for-loop inside one call.

Original kernel:

[ 695.276320] Thread 0 running, TID = 3293!
[ 695.281210] kp1_pre_handler():070 :: state 1 .... :: pid = 1165, mapping = 0xffff962333263730, comm = 'kworker/u4:2'
[ 695.299026] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 1165, mapping = 0xffff962333263730, comm ('kworker/u4:2') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 695.314808] kp2_pre_handler():119 :: state 2 .... :: pid = 1165, page index = 2
[ 695.322822] kp3_pre_handler():144 :: state 2 .... :: pid = 1165, page index = 2, calling writepage()
  << ... HERE ... >>
[ 695.330308] kp2_pre_handler():119 :: state 2 .... :: pid = 1165, page index = 1
[ 695.334355] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 1165, page index = 1, spin 5 seconds before lock_page()...

Test kernel:

[ 170.194880] Thread 0 running, TID = 2498!
[ 170.200011] kp1_pre_handler():070 :: state 1 .... :: pid = 7, mapping = 0xffff993df9136fb0, comm = 'kworker/u4:0'
[ 170.217616] kp1_pre_handler():101 :: state 1 -> 2 :: pid = 7, mapping = 0xffff993df9136fb0, comm ('kworker/u4:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 170.238633] kp2_pre_handler():119 :: state 2 .... :: pid = 7, page index = 2
[ 170.248024] kp3_pre_handler():144 :: state 2 .... :: pid = 7, page index = 2, calling writepage()
[ 170.261141] kp1_pre_handler():070 :: state 2 .... :: pid = 7, mapping = 0xffff993df9136fb0, comm = 'kworker/u4:0'
[ 170.272150] kp2_pre_handler():119 :: state 2 .... :: pid = 7, page index = 1
[ 170.279860] kp2_pre_handler():123 :: state 2 -> 3 :: pid = 7, page index = 1, spin 5 seconds before lock_page()...

Changed in linux (Ubuntu Cosmic):
status: Confirmed → In Progress
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic

Verification successful on Cosmic.

Updated the test-case kernel part (attached), and repeated it 20+ times, without any process hanging.

In all cases, the new function call into write_cache_pages() is observed in thread 0, between page index 2 and page index 1.

[ 150.914872] mod_init():161 :: hello
[ 150.917828] mod_init():207 :: kernel version: prop/-19/cosmic

[ 150.950322] Program running, TID = 1429
[ 150.951566] kp1_pre_handler():073 :: state 0 .... :: pid = 1429, mapping = 0xffff8abcba385570, comm = 'test'
[ 150.954205] kp1_pre_handler():082 :: state 0 -> 1 :: pid = 1429, mapping = 0xffff8abcba385570
[ 150.956518] kp2_pre_handler():122 :: state 1 .... :: pid = 1429, page index = 1
[ 150.958410] kp3_pre_handler():147 :: state 1 .... :: pid = 1429, page index = 1, calling writepage()
[ 150.961047] kp2_pre_handler():122 :: state 1 .... :: pid = 1429, page index = 2
[ 150.964788] kp3_pre_handler():147 :: state 1 .... :: pid = 1429, page index = 2, calling writepage()

[ 151.973660] Thread 0 running, TID = 1430!
[ 151.977071] kp1_pre_handler():073 :: state 1 .... :: pid = 7, mapping = 0xffff8abcba385570, comm = 'kworker/u8:0'
[ 151.984836] kp1_pre_handler():104 :: state 1 -> 2 :: pid = 7, mapping = 0xffff8abcba385570, comm ('kworker/u8:0') is kworker AND wbc->range_cyclic (0x1) is true AND mapping->writeback_index (0x2) is 0x2.
[ 152.017726] kp2_pre_handler():122 :: state 2 .... :: pid = 7, page index = 2
[ 152.027193] kp3_pre_handler():147 :: state 2 .... :: pid = 7, page index = 2, calling writepage()
[ 152.038466] kp1_pre_handler():073 :: state 2 .... :: pid = 7, mapping = 0xffff8abcba385570, comm = 'kworker/u8:0'
[ 152.048736] kp2_pre_handler():122 :: state 2 .... :: pid = 7, page index = 1
[ 152.056642] kp2_pre_handler():126 :: state 2 -> 3 :: pid = 7, page index = 1, spin 5 seconds before lock_page()...

[ 152.973731] Thread 1 running, TID = 1431!
[ 152.974943] kp1_pre_handler():073 :: state 3 .... :: pid = 1431, mapping = 0xffff8abcba385570, comm = 'test'
[ 152.977489] kp2_pre_handler():122 :: state 3 .... :: pid = 1431, page index = 1
[ 152.979140] kp3_pre_handler():147 :: state 3 .... :: pid = 1431, page index = 1, calling writepage()
[ 152.981928] kp2_pre_handler():122 :: state 3 .... :: pid = 1431, page index = 2

[ 153.973895] Thread 2 running, TID = 1432!
[ 153.975160] kp1_pre_handler():073 :: state 3 .... :: pid = 1432, mapping = 0xffff8abcba385570, comm = 'test'
[ 153.978573] kp2_pre_handler():122 :: state 3 .... :: pid = 1432, page index = 1
[ 157.033588] kp2_pre_handler():130 :: state 3 -> 4 :: pid = 7, page index = 1, spun 5 seconds before lock_page().
[ 157.036151] kp3_pre_handler():147 :: state 4 .... :: pid = 1431, page index = 2, calling writepage()
[ 157.038804] kp3_pre_handler():147 :: state 4 .... :: pid = 1432, page index = 1, calling writepage()
[ 157.041212] kp2_pre_handler():122 :: state 4 .... :: pid = 1432, page index = 2
[ 157.058880] mod_exit():230 :: bye

tags: added: verification-done-cosmic
removed: verification-needed-cosmic
Launchpad Janitor (janitor) wrote :
Download full text (7.0 KiB)

This bug was fixed in the package linux - 4.18.0-20.21

---------------
linux (4.18.0-20.21) cosmic; urgency=medium

  * CVE-2018-12126 // CVE-2018-12127 // CVE-2018-12130
    - Documentation/l1tf: Fix small spelling typo
    - x86/cpu: Sanitize FAM6_ATOM naming
    - kvm: x86: Report STIBP on GET_SUPPORTED_CPUID
    - locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a
      new <linux/bits.h> file
    - tools include: Adopt linux/bits.h
    - x86/msr-index: Cleanup bit defines
    - x86/speculation: Consolidate CPU whitelists
    - x86/speculation/mds: Add basic bug infrastructure for MDS
    - x86/speculation/mds: Add BUG_MSBDS_ONLY
    - x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
    - x86/speculation/mds: Add mds_clear_cpu_buffers()
    - x86/speculation/mds: Clear CPU buffers on exit to user
    - x86/kvm/vmx: Add MDS protection when L1D Flush is not active
    - x86/speculation/mds: Conditionally clear CPU buffers on idle entry
    - x86/speculation/mds: Add mitigation control for MDS
    - x86/speculation/mds: Add sysfs reporting for MDS
    - x86/speculation/mds: Add mitigation mode VMWERV
    - Documentation: Move L1TF to separate directory
    - Documentation: Add MDS vulnerability documentation
    - x86/speculation/mds: Add mds=full,nosmt cmdline option
    - x86/speculation: Move arch_smt_update() call to after mitigation decisions
    - x86/speculation/mds: Add SMT warning message
    - x86/speculation/mds: Fix comment
    - x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
    - x86/speculation/mds: Add 'mitigations=' support for MDS

  * CVE-2017-5715 // CVE-2017-5753
    - s390/speculation: Support 'mitigations=' cmdline option

  * CVE-2017-5715 // CVE-2017-5753 // CVE-2017-5754 // CVE-2018-3639
    - powerpc/speculation: Support 'mitigations=' cmdline option

  * CVE-2017-5715 // CVE-2017-5754 // CVE-2018-3620 // CVE-2018-3639 //
    CVE-2018-3646
    - cpu/speculation: Add 'mitigations=' cmdline option
    - x86/speculation: Support 'mitigations=' cmdline option

  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log

linux (4.18.0-19.20) cosmic; urgency=medium

  * linux: 4.18.0-19.20 -proposed tracker (LP: #1826171)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log

  * autopkgtests run too often, too much and don't skip enough (LP: #1823056)
    - [Debian] Set +x on rebuild testcase.
    - [Debian] Skip rebuild test, for regression-suite deps.
    - [Debian] Make ubuntu-regression-suite skippable on unbootable kernels.
    - [Debian] make rebuild use skippable error codes when skipping.
    - [Debian] Only run regression-suite, if requested to.

  * CVE-2017-5753
    - s390/keyboard: sanitize array index in do_kdsk_ioctl
    - drm/bufs: Fix Spectre v1 vulnerability
    - drivers/misc/sgi-gru: fix Spectre v1 vulnerability
    - ipv4: Fix potential Spectre v1 vulnerability
    - aio: fix spectre gadget in lookup_ioctx
    - ALSA: emux: Fix potential Spectre v1 vulnerabilities
    - ALSA: pcm: Fix potential Spectre v1 vulnerability
    - ip6mr: Fix potential Spectre v1 vulnerability
    - ALSA: rme9652: Fix potential Spectre v1...

Read more...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for linux-azure has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers