kernel: INFO: task xfs-conv/dm-4 blocked for more than 122 seconds

Bug #1996269 reported by Zhixiong Chi
Affects    Status        Importance  Assigned to   Milestone
StarlingX  Fix Released  Low         Zhixiong Chi  -

Bug Description

Brief Description
-----------------

 kernel: err [5636141.631454] INFO: task xfs-conv/dm-4:1788 blocked for more than 122 seconds.

 kernel: info [267022.728862] Workqueue: xfs-conv/dm-4 xfs_end_io [xfs]
 kernel: info [267022.728864] Call Trace:
 kernel: info [267022.728870] __schedule+0x340/0x810
 kernel: info [267022.728876] schedule+0x51/0xc0
 kernel: info [267022.728913] xlog_grant_head_wait+0xc7/0x200 [xfs]
 kernel: info [267022.728950] xlog_grant_head_check+0xd0/0x110 [xfs]
 kernel: info [267022.728985] xfs_log_reserve+0xc3/0x1e0 [xfs]
 kernel: info [267022.729023] xfs_trans_reserve+0x156/0x1b0 [xfs]
 kernel: info [267022.729184] xfs_trans_alloc+0xc6/0x190 [xfs]
 kernel: info [267022.729317] xfs_iomap_write_unwritten+0xaa/0x2c0 [xfs]
 kernel: info [267022.729333] ? stop_one_cpu+0x71/0xa0
 kernel: info [267022.729347] ? set_cpus_allowed_ptr+0x10/0x10
 kernel: info [267022.729396] xfs_end_ioend+0xc4/0x100 [xfs]
 kernel: info [267022.729444] ? xfs_setfilesize_ioend+0x60/0x60 [xfs]
 kernel: info [267022.729491] xfs_end_io+0xb9/0xe0 [xfs]
 kernel: info [267022.729505] process_one_work+0x1a1/0x370
 kernel: info [267022.729516] rescuer_thread+0x207/0x350
 kernel: info [267022.729528] ? worker_thread+0x370/0x370
 kernel: info [267022.729537] kthread+0x12e/0x150
 kernel: info [267022.729548] ? __kthread_cancel_work+0x40/0x40
 kernel: info [267022.729559] ret_from_fork+0x1f/0x30
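When comparing traces like the one above across several logs, it can help to pull the stack frames out programmatically. The following is a minimal sketch, not part of the original report: the regular expression, the `Frame` type, and `parse_frames` are hypothetical helpers that assume the `kernel: info [timestamp] func+0xOFF/0xSIZE [module]` layout seen in this log. Frames prefixed with `?` are treated as unreliable stack entries.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed layout: "[<timestamp>] [? ]<func>+0x<offset>/0x<size> [<module>]"
FRAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[A-Za-z_][\w.]*)\+"
    r"(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


class Frame(NamedTuple):
    timestamp: float
    func: str
    offset: int
    size: int
    module: Optional[str]   # None for frames in the core kernel
    reliable: bool          # False for frames printed with a leading "?"


def parse_frames(log_text: str) -> List[Frame]:
    """Return every stack frame found in log_text, in log order.

    Lines that are not stack frames (for example "Call Trace:" or the
    "Workqueue:" header) are skipped.
    """
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue
        frames.append(Frame(
            timestamp=float(m["ts"]),
            func=m["func"],
            offset=int(m["offset"], 16),
            size=int(m["size"], 16),
            module=m["module"],
            reliable=m["unreliable"] is None,
        ))
    return frames


if __name__ == "__main__":
    sample = """\
 kernel: info [267022.728864] Call Trace:
 kernel: info [267022.728870] __schedule+0x340/0x810
 kernel: info [267022.728913] xlog_grant_head_wait+0xc7/0x200 [xfs]
 kernel: info [267022.729333] ? stop_one_cpu+0x71/0xa0
"""
    for frame in parse_frames(sample):
        if frame.reliable:
            print(frame.func, frame.module or "(core kernel)")
```

Filtering on `reliable` leaves the frames that show the task sleeping in `xlog_grant_head_wait`, that is, waiting for log reservation space.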

Severity
--------
Minor

Steps to Reproduce
------------------
Not 100% reproducible.

Expected Behavior
------------------
The system works normally.

Actual Behavior
----------------
An attempt to log in via ssh hung. After pressing Ctrl+C, the session entered a shell and the prompt displayed '-sh-4.2$'. 'system unlock controller-0' failed.

Reproducibility
---------------
Seen once in the customer environment.

System Configuration
--------------------
One-node system

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
N/A

Workaround
----------
N/A

Changed in starlingx:
assignee: nobody → Zhixiong Chi (zhixiongchi)
status: New → In Progress
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kernel (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/kernel/+/864257

OpenStack Infra (hudson-openstack) wrote : Fix merged to kernel (master)

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/864257
Committed: https://opendev.org/starlingx/kernel/commit/5ab365dd41cfec3987fda853fbcee18be5d7b081
Submitter: "Zuul (22348)"
Branch: master

commit 5ab365dd41cfec3987fda853fbcee18be5d7b081
Author: Zhixiong Chi <email address hidden>
Date: Thu Nov 10 05:38:05 2022 -0800

    xfs: fix ioend batching log reservation deadlock

    Problem:
    We received a report of a workload that causes an xfs task to be blocked
    for more than 120 seconds on log reservation via iomap_ioend completion
    batching.

     kernel: err [5636141.631454] INFO: task xfs-conv/dm-4:1788 blocked for
                                        more than 122 seconds.

     kernel: info [267022.728862] Workqueue: xfs-conv/dm-4 xfs_end_io [xfs]
     kernel: info [267022.728864] Call Trace:
     kernel: info [267022.728870] __schedule+0x340/0x810
     kernel: info [267022.728876] schedule+0x51/0xc0
     kernel: info [267022.728913] xlog_grant_head_wait+0xc7/0x200 [xfs]
     kernel: info [267022.728950] xlog_grant_head_check+0xd0/0x110 [xfs]
     kernel: info [267022.728985] xfs_log_reserve+0xc3/0x1e0 [xfs]
     kernel: info [267022.729023] xfs_trans_reserve+0x156/0x1b0 [xfs]
     kernel: info [267022.729184] xfs_trans_alloc+0xc6/0x190 [xfs]
     kernel: info [267022.729317] xfs_iomap_write_unwritten+0xaa/0x2c0 [xfs]
     kernel: info [267022.729333] ? stop_one_cpu+0x71/0xa0
     kernel: info [267022.729347] ? set_cpus_allowed_ptr+0x10/0x10
     kernel: info [267022.729396] xfs_end_ioend+0xc4/0x100 [xfs]
     kernel: info [267022.729444] ? xfs_setfilesize_ioend+0x60/0x60 [xfs]
     kernel: info [267022.729491] xfs_end_io+0xb9/0xe0 [xfs]
     kernel: info [267022.729505] process_one_work+0x1a1/0x370
     kernel: info [267022.729516] rescuer_thread+0x207/0x350
     kernel: info [267022.729528] ? worker_thread+0x370/0x370
     kernel: info [267022.729537] kthread+0x12e/0x150
     kernel: info [267022.729548] ? __kthread_cancel_work+0x40/0x40
     kernel: info [267022.729559] ret_from_fork+0x1f/0x30

    After that, the ssh connection to the controller hung. After pressing
    Ctrl+C, it entered a shell and the prompt displayed '-sh-4.2$'.

    Solution:
    Remove the preallocated transaction from xfs append ioends to avoid
    the ioend completion batching log reservation deadlock.
    Append ioend completions are still processed via the workqueue, but
    the wq task now allocates the transaction, as it does for other
    ioend types.

    Backport the four patches from upstream
    (git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
    for debian-based StarlingX.
    Only the 0034-xfs-use-current-journal_info-for-detecting-transacti.patch
    for centos-based StarlingX is from the stable tree
    (git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git,
    linux-5.10.y branch). The kernel for debian-based StarlingX has been
    upgraded to v5.10.152, which already includes this fix, so the patch
    is applied only to the centos-based one.

    TestPlan:
    Pass: Execute bonnie++ test for xfs filesystem successfully without
          kernel panic and any xfs anomalies in...
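
The deadlock described in the Solution section can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel code: it assumes the hang arises when append ioends that already hold a log reservation from submit time are batched behind an ioend whose transaction must be allocated at completion time (for example an unwritten extent conversion, as in `xfs_iomap_write_unwritten` in the trace). `LogSpace` and `complete_batch` are hypothetical names, and the log is reduced to a simple counter of reservation units.

```python
class LogSpace:
    """Toy stand-in for XFS log reservation space: a fixed pool of units."""

    def __init__(self, units: int) -> None:
        self.free = units

    def try_reserve(self) -> bool:
        # The real kernel would sleep (xlog_grant_head_wait) instead of
        # returning False; here False means "would block".
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def release(self) -> None:
        self.free += 1


def complete_batch(batch, log, preallocate_append):
    """Walk a merged ioend batch in order, as one workqueue task would.

    Returns True if every ioend completes, False if the worker would block
    on a reservation that only later ioends in the same batch can release.
    """
    held = set()
    if preallocate_append:
        # Old behaviour (assumed): append ioends reserve log space at submit.
        for i, kind in enumerate(batch):
            if kind == "append":
                if not log.try_reserve():
                    raise RuntimeError("submit side would block; not modelled")
                held.add(i)

    for i, kind in enumerate(batch):
        if i in held:
            log.release()        # commit the preallocated transaction
            continue
        if not log.try_reserve():
            return False         # stuck: the holders sit later in this batch
        log.release()            # allocate, commit, release per ioend
    return True


if __name__ == "__main__":
    batch = ["unwritten", "append", "append"]
    print("preallocated:", complete_batch(batch, LogSpace(2), True))
    print("allocated in wq task:", complete_batch(batch, LogSpace(2), False))
```

With two reservation units and this batch order, the preallocating variant cannot make progress on the first ioend, while the variant that allocates each transaction inside the workqueue task completes the whole batch. The model ignores concurrency and other log users, so it shows only the ordering dependency, not real timing.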


Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.distro.other
removed: stx.debian