commit 5ab365dd41cfec3987fda853fbcee18be5d7b081
Author: Zhixiong Chi <email address hidden>
Date: Thu Nov 10 05:38:05 2022 -0800
xfs: fix ioend batching log reservation deadlock
Problem:
We received a report of a workload that causes an xfs task to be blocked
for more than 120 seconds on log reservation via iomap_ioend completion
batching.
kernel: err [5636141.631454] INFO: task xfs-conv/dm-4:1788 blocked for more than 122 seconds.
kernel: info [267022.728862] Workqueue: xfs-conv/dm-4 xfs_end_io [xfs]
kernel: info [267022.728864] Call Trace:
kernel: info [267022.728870] __schedule+0x340/0x810
kernel: info [267022.728876] schedule+0x51/0xc0
kernel: info [267022.728913] xlog_grant_head_wait+0xc7/0x200 [xfs]
kernel: info [267022.728950] xlog_grant_head_check+0xd0/0x110 [xfs]
kernel: info [267022.728985] xfs_log_reserve+0xc3/0x1e0 [xfs]
kernel: info [267022.729023] xfs_trans_reserve+0x156/0x1b0 [xfs]
kernel: info [267022.729184] xfs_trans_alloc+0xc6/0x190 [xfs]
kernel: info [267022.729317] xfs_iomap_write_unwritten+0xaa/0x2c0 [xfs]
kernel: info [267022.729333] ? stop_one_cpu+0x71/0xa0
kernel: info [267022.729347] ? set_cpus_allowed_ptr+0x10/0x10
kernel: info [267022.729396] xfs_end_ioend+0xc4/0x100 [xfs]
kernel: info [267022.729444] ? xfs_setfilesize_ioend+0x60/0x60 [xfs]
kernel: info [267022.729491] xfs_end_io+0xb9/0xe0 [xfs]
kernel: info [267022.729505] process_one_work+0x1a1/0x370
kernel: info [267022.729516] rescuer_thread+0x207/0x350
kernel: info [267022.729528] ? worker_thread+0x370/0x370
kernel: info [267022.729537] kthread+0x12e/0x150
kernel: info [267022.729548] ? __kthread_cancel_work+0x40/0x40
kernel: info [267022.729559] ret_from_fork+0x1f/0x30
After that, the ssh connection to the controller hangs. After pressing
Ctrl+C, a shell is entered and the prompt shows '-sh-4.2$'.
Solution:
Remove the preallocated transaction from xfs append ioends to avoid
the ioend completion batching log reservation deadlock.
Append ioend completions are still processed via the workqueue, but the
wq task now allocates the transaction itself, as it does for the other
ioend types.
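The deadlock shape described above can be illustrated with a toy model.
This is only a sketch under stated assumptions, not the kernel code: the
names Log, Ioend, submit and complete_batch are hypothetical, the log is
reduced to a counter of reservation units, and "would block" is reported
as a return value instead of sleeping. It assumes a batch in which an
append ioend sits behind an ioend that needs a fresh reservation at
completion time.

```python
from dataclasses import dataclass


@dataclass
class Ioend:
    kind: str               # "append" or "unwritten" (illustrative labels)
    prealloc: bool = False  # True if a reservation was taken at submit time


class Log:
    """Toy log: a fixed number of reservation units (hypothetical model)."""

    def __init__(self, units):
        self.free = units

    def try_reserve(self):
        if self.free == 0:
            return False  # a real task would sleep here waiting for space
        self.free -= 1
        return True

    def release(self):
        self.free += 1


def submit(log, ioends, prealloc_append):
    """Submit side: optionally reserve log space up front for append ioends."""
    for io in ioends:
        if io.kind == "append" and prealloc_append:
            if not log.try_reserve():
                raise RuntimeError("no log space at submit time")
            io.prealloc = True


def complete_batch(log, ioends):
    """Completion worker: handle ioends in order.

    Returns the index of the ioend that would block on log reservation,
    or None if the whole batch completes.
    """
    for i, io in enumerate(ioends):
        if not io.prealloc and not log.try_reserve():
            return i  # stuck: the space is held by an ioend later in the batch
        # ... the completion work would happen here ...
        log.release()  # committing the transaction returns the reservation
        io.prealloc = False
    return None


def run(prealloc_append):
    log = Log(units=1)
    batch = [Ioend("unwritten"), Ioend("append")]
    submit(log, batch, prealloc_append)
    return complete_batch(log, batch)


if __name__ == "__main__":
    print("preallocated append transaction:", run(True))
    print("transaction allocated by the wq task:", run(False))
```

With the preallocation, the worker stalls on the first ioend because the
only reservation is held by the append ioend queued behind it in the same
batch. Without it, each ioend reserves and releases in turn.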
Backport the four patches from upstream
(git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
for debian-based StarlingX.
For centos-based StarlingX only,
0034-xfs-use-current-journal_info-for-detecting-transacti.patch is taken
from the stable tree
(git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git,
linux-5.10.y branch). The debian-based StarlingX kernel has already been
upgraded to v5.10.152, which includes this fix, so the patch is applied
only to the centos-based one.
TestPlan:
Pass: Execute the bonnie++ test on an xfs filesystem successfully, without
a kernel panic or any xfs anomalies in the kernel logs.
$mkfs.xfs /dev/sdc1
$mount /dev/sdc1 ~/xfstests
$sudo bonnie++ -u root:root -d ~/xfstests
Debian:
Pass: build-pkgs -c -a
Pass: build-image
Pass: boot successfully with std/rt.
CentOS:
Pass: build-pkgs
Pass: build-iso
Pass: boot successfully with std/rt.
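The "no xfs anomalies in the kernel logs" check could be scripted along
these lines. This is a minimal sketch, not part of the change: the helper
name find_hung_tasks is hypothetical, and it only looks for the hung-task
message format quoted in the Problem section, read from already captured
log lines.

```python
import re

# Matches the hung-task report quoted in the Problem section, e.g.
# "INFO: task xfs-conv/dm-4:1788 blocked for more than 122 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for each hung-task report found."""
    hits = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits


if __name__ == "__main__":
    sample = [
        "kernel: err [5636141.631454] INFO: task xfs-conv/dm-4:1788 "
        "blocked for more than 122 seconds.",
        "kernel: info [267022.728864] Call Trace:",
    ]
    for comm, pid, secs in find_hung_tasks(sample):
        print(f"hung task {comm} (pid {pid}), blocked > {secs}s")
```

An empty result after the bonnie++ run would be consistent with the Pass
entries above. It does not replace reading the log for other xfs messages.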
Closes-Bug: 1996269
Signed-off-by: Zhixiong Chi <email address hidden>
Change-Id: I1e5b85111b2b54cd249c116724b952042f9d781f
Reviewed: https://review.opendev.org/c/starlingx/kernel/+/864257
Committed: https://opendev.org/starlingx/kernel/commit/5ab365dd41cfec3987fda853fbcee18be5d7b081
Submitter: "Zuul (22348)"
Branch: master