Comment 2 for bug 1996269

OpenStack Infra (hudson-openstack) wrote: Fix merged to kernel (master)

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/864257
Committed: https://opendev.org/starlingx/kernel/commit/5ab365dd41cfec3987fda853fbcee18be5d7b081
Submitter: "Zuul (22348)"
Branch: master

commit 5ab365dd41cfec3987fda853fbcee18be5d7b081
Author: Zhixiong Chi <email address hidden>
Date: Thu Nov 10 05:38:05 2022 -0800

    xfs: fix ioend batching log reservation deadlock

    Problem:
    We received a report of a workload that causes an xfs task to be blocked
    for more than 120 seconds on log reservation via iomap_ioend completion
    batching.

     kernel: err [5636141.631454] INFO: task xfs-conv/dm-4:1788 blocked for
                                        more than 122 seconds.

     kernel: info [267022.728862] Workqueue: xfs-conv/dm-4 xfs_end_io [xfs]
     kernel: info [267022.728864] Call Trace:
     kernel: info [267022.728870] __schedule+0x340/0x810
     kernel: info [267022.728876] schedule+0x51/0xc0
     kernel: info [267022.728913] xlog_grant_head_wait+0xc7/0x200 [xfs]
     kernel: info [267022.728950] xlog_grant_head_check+0xd0/0x110 [xfs]
     kernel: info [267022.728985] xfs_log_reserve+0xc3/0x1e0 [xfs]
     kernel: info [267022.729023] xfs_trans_reserve+0x156/0x1b0 [xfs]
     kernel: info [267022.729184] xfs_trans_alloc+0xc6/0x190 [xfs]
     kernel: info [267022.729317] xfs_iomap_write_unwritten+0xaa/0x2c0 [xfs]
     kernel: info [267022.729333] ? stop_one_cpu+0x71/0xa0
     kernel: info [267022.729347] ? set_cpus_allowed_ptr+0x10/0x10
     kernel: info [267022.729396] xfs_end_ioend+0xc4/0x100 [xfs]
     kernel: info [267022.729444] ? xfs_setfilesize_ioend+0x60/0x60 [xfs]
     kernel: info [267022.729491] xfs_end_io+0xb9/0xe0 [xfs]
     kernel: info [267022.729505] process_one_work+0x1a1/0x370
     kernel: info [267022.729516] rescuer_thread+0x207/0x350
     kernel: info [267022.729528] ? worker_thread+0x370/0x370
     kernel: info [267022.729537] kthread+0x12e/0x150
     kernel: info [267022.729548] ? __kthread_cancel_work+0x40/0x40
     kernel: info [267022.729559] ret_from_fork+0x1f/0x30

    After that, the SSH connection to the controller hung; after pressing
    Ctrl+C, the session dropped to a shell with the prompt '-sh-4.2$'.

    Solution:
    Remove the preallocated transaction from XFS append ioends to avoid the
    log reservation deadlock in ioend completion batching. Append ioend
    completions are still processed via the workqueue, but the workqueue
    task now allocates the transaction itself, as it already does for other
    ioend types.
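    To make the ordering problem concrete, here is a toy, single-threaded
    bash model. It is not XFS code: it assumes a log with a fixed number of
    reservation units and one unit per transaction, and the function name
    run_batch, its prealloc/onalloc modes and the ioend kinds are
    hypothetical. In prealloc mode, append ioends hold their reservation
    from submission time, so an earlier ioend in the same batch can find no
    free space left; in onalloc mode, which mirrors the fix, every ioend
    reserves log space only when the workqueue task reaches it.

```shell
#!/usr/bin/env bash
# Toy model of the ioend batching log reservation deadlock (NOT XFS code).
# Assumptions: fixed log capacity in abstract units, one unit ("res") per
# transaction, and a batch of ioend kinds processed strictly in order.
#
# run_batch MODE CAPACITY RES KIND...
#   MODE=prealloc  append ioends reserved log space at submission time
#   MODE=onalloc   each ioend reserves log space when it is processed
run_batch() {
  local mode=$1 capacity=$2 res=$3
  shift 3
  local free=$capacity kind

  # Submission side: preallocated append transactions pin log space.
  if [[ $mode == prealloc ]]; then
    for kind in "$@"; do
      if [[ $kind == append ]]; then
        free=$(( free - res ))
      fi
    done
  fi

  # Completion side: the workqueue task walks the batch in order.
  for kind in "$@"; do
    if [[ $mode == prealloc && $kind == append ]]; then
      # Commit the preallocated transaction and give its space back.
      free=$(( free + res ))
      continue
    fi
    if (( free < res )); then
      # A real task would sleep here waiting for log space; in this model
      # that space is held by ioends later in the same batch.
      echo "blocked on $kind ioend: need $res, free $free"
      return 1
    fi
    # Reserve, commit and release: net change to free space is zero.
  done
  echo "batch completed"
}
```

    For example, run_batch prealloc 2 1 unwritten append append reports
    that the unwritten ioend is blocked, while the same batch in onalloc
    mode completes. Real log space is also shared with other transactions
    and reclaimed asynchronously, which this model deliberately ignores.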

    Backport the four patches from upstream
    (git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git) for
    Debian-based StarlingX. The one exception is
    0034-xfs-use-current-journal_info-for-detecting-transacti.patch for
    CentOS-based StarlingX, which is taken from the stable tree
    (git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git,
    linux-5.10.y branch). It is applied only to the CentOS-based kernel
    because the Debian-based StarlingX kernel has already been upgraded to
    v5.10.152, which includes this fix.

    TestPlan:
    Pass: Run bonnie++ on an XFS filesystem successfully, with no kernel
          panic and no XFS anomalies in the kernel logs.
          $ sudo mkfs.xfs /dev/sdc1
          $ sudo mount /dev/sdc1 ~/xfstests
          $ sudo bonnie++ -u root:root -d ~/xfstests
    Debian:
    Pass: build-pkgs -c -a
    Pass: build-image
    Pass: boot successfully with std/rt.
    CentOS:
    Pass: build-pkgs
    Pass: build-iso
    Pass: boot successfully with std/rt.
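
    The bonnie++ pass criterion in the test plan (no kernel panic and no
    XFS anomalies in the kernel logs) can be checked with a small filter.
    This is a sketch: check_kernel_log is a hypothetical helper, and its
    patterns are an assumed, non-exhaustive selection based on the
    hung-task message quoted above and common XFS error wording.

```shell
#!/usr/bin/env bash
# check_kernel_log (hypothetical helper): read kernel log text on stdin,
# print any lines that look like hung tasks, panics or XFS errors, and
# return non-zero if at least one was found. Patterns are assumptions.
check_kernel_log() {
  local hits
  hits=$(grep -iE \
    -e 'blocked for more than [0-9]+ seconds' \
    -e 'kernel panic' \
    -e 'XFS \(.*\):.*(corruption|i/o error|internal error|shutting down)' \
    || true)
  if [[ -n $hits ]]; then
    printf '%s\n' "$hits"
    return 1
  fi
  echo "no hung-task or XFS anomalies found"
}
```

    After the bonnie++ run, sudo dmesg | check_kernel_log prints any
    matching lines and exits non-zero if it finds one, which makes it easy
    to use as a pass/fail gate in a test script.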

    Closes-Bug: 1996269

    Signed-off-by: Zhixiong Chi <email address hidden>
    Change-Id: I1e5b85111b2b54cd249c116724b952042f9d781f