XFS Deadlock on 4.2+

Bug #1527062 reported by Dave Chiluk
This bug affects 2 people
Affects                     Status        Importance  Assigned to  Milestone
linux (Ubuntu)              Fix Released  High        Dave Chiluk
  Trusty                    Fix Released  Undecided   Unassigned
  Vivid                     Fix Released  Undecided   Unassigned
  Wily                      Fix Released  Undecided   Unassigned
linux-lts-utopic (Ubuntu)   Invalid       Undecided   Unassigned
  Trusty                    Fix Released  Undecided   Unassigned
  Vivid                     Invalid       Undecided   Unassigned
  Wily                      Invalid       Undecided   Unassigned

Bug Description

[Impact]

 * An XFS deadlock is possible on kernels older than 4.4-rc1.

 * Hung tasks have stack traces similar to:
[ 4559.110607] INFO: task kworker/1:0:17 blocked for more than 120 seconds.
[ 4559.143010] Not tainted 4.2.0-18-generic #22~14.04.1-Ubuntu
[ 4559.171972] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4559.209753] kworker/1:0 D 0000000000000000 0 17 2 0x00000000
[ 4559.209791] Workqueue: xfs-cil/sdac1 xlog_cil_push_work [xfs]
[ 4559.209794] ffff88085be9fbb8 0000000000000046 ffff88085b746040 ffff88085be8a940
[ 4559.209795] 0000000000000000 ffff88085bea0000 ffff880107fddcc0 ffff88085be8a940
[ 4559.209797] ffff880859119c00 ffff880859119d00 ffff88085be9fbd8 ffffffff817b6a77
[ 4559.209798] Call Trace:
[ 4559.209806] [<ffffffff817b6a77>] schedule+0x37/0x80
[ 4559.209817] [<ffffffffc03c105b>] xlog_state_get_iclog_space+0xdb/0x2d0 [xfs]
[ 4559.209822] [<ffffffff810a06c0>] ? wake_up_q+0x80/0x80
[ 4559.209832] [<ffffffffc03c1501>] xlog_write+0x191/0x6a0 [xfs]
[ 4559.209835] [<ffffffff813b4478>] ? prandom_u32+0x18/0x20
[ 4559.209845] [<ffffffffc03c2e49>] xlog_cil_push+0x1f9/0x3b0 [xfs]
[ 4559.209854] [<ffffffffc03c3015>] xlog_cil_push_work+0x15/0x20 [xfs]
[ 4559.209857] [<ffffffff8108f4ce>] process_one_work+0x14e/0x3d0
[ 4559.209858] [<ffffffff8108fb7a>] worker_thread+0x11a/0x470
[ 4559.209860] [<ffffffff8108fa60>] ? rescuer_thread+0x310/0x310
[ 4559.209862] [<ffffffff81095112>] kthread+0xd2/0xf0
[ 4559.209863] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
[ 4559.209865] [<ffffffff817ba71f>] ret_from_fork+0x3f/0x70
[ 4559.209866] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0

or

[305651.804853] INFO: task kswapd0:194 blocked for more than 120 seconds.
[305651.836092] Not tainted 4.2.0-18-generic #22~14.04.1-Ubuntu
[305651.865655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[305651.903596] kswapd0 D ffff88085fa96640 0 194 2 0x00000000
[305651.903614] ffff8810591ab858 0000000000000046 ffff88085c2c2940 ffff88105b19a940
[305651.903616] ffff880066c64548 ffff8810591ac000 ffff8808599cae18 0000000000000000
[305651.903618] ffff88105b19a940 ffff88085a2cb000 ffff8810591ab878 ffffffff817b6a77
[305651.903620] Call Trace:
[305651.903629] [<ffffffff817b6a77>] schedule+0x37/0x80
[305651.903655] [<ffffffffc0402f6c>] _xfs_log_force_lsn+0x15c/0x2d0 [xfs]
[305651.903662] [<ffffffff810a06c0>] ? wake_up_q+0x80/0x80
[305651.903675] [<ffffffffc040310e>] xfs_log_force_lsn+0x2e/0x80 [xfs]
[305651.903687] [<ffffffffc03f5ff9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
[305651.903698] [<ffffffffc03f2a3d>] __xfs_iunpin_wait+0x8d/0x120 [xfs]
[305651.903701] [<ffffffff810b7380>] ? autoremove_wake_function+0x40/0x40
[305651.903711] [<ffffffffc03f5ff9>] xfs_iunpin_wait+0x19/0x20 [xfs]
[305651.903721] [<ffffffffc03eb665>] xfs_reclaim_inode+0x125/0x330 [xfs]
[305651.903732] [<ffffffffc03ebab8>] xfs_reclaim_inodes_ag+0x248/0x360 [xfs]
[305651.903735] [<ffffffff8120511c>] ? destroy_inode+0x3c/0x60
[305651.903744] [<ffffffffc03ec573>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[305651.903755] [<ffffffffc03fa5e9>] xfs_fs_free_cached_objects+0x19/0x20 [xfs]
[305651.903758] [<ffffffff811ee2a1>] super_cache_scan+0x181/0x190
[305651.903761] [<ffffffff811870e6>] shrink_slab+0x206/0x380
[305651.903763] [<ffffffff8118b7a1>] shrink_zone+0x291/0x2b0
[305651.903764] [<ffffffff8118c710>] kswapd+0x500/0x9b0
[305651.903766] [<ffffffff8118c210>] ? mem_cgroup_shrink_node_zone+0x130/0x130
[305651.903768] [<ffffffff81095112>] kthread+0xd2/0xf0
[305651.903770] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
[305651.903772] [<ffffffff817ba71f>] ret_from_fork+0x3f/0x70
[305651.903774] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
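
When triaging logs for this bug, it can help to pull the hung-task reports out of a kernel log automatically. The following is a minimal sketch, assuming the log text uses the same "INFO: task ... blocked for more than N seconds." and "[<address>] symbol+0xoff/0xlen" layout as the traces above. The function name `parse_hung_tasks` and the returned dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Header line printed by the hung task detector, e.g.
# "INFO: task kworker/1:0:17 blocked for more than 120 seconds."
# The task name itself may contain colons, so the PID is the last
# colon-separated number before " blocked".
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than "
    r"(?P<secs>\d+) seconds\."
)

# Stack frame line, e.g. "[<ffffffff817b6a77>] schedule+0x37/0x80".
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "seconds": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only reliable frames that belong to a report we have seen.
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append(frame.group("sym"))
    return reports
```

A report whose frames include `xlog_state_get_iclog_space` or `_xfs_log_force_lsn` resembles the two traces shown above, although a matching trace alone does not prove the same root cause.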

[Test Case]

 * Run a large number of I/O tasks against a large number of XFS filesystems while under memory pressure. This test case is not guaranteed to reproduce the deadlock.
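
The I/O half of the test case above could be sketched roughly as follows. This is an illustration only, not the reproducer used for this bug: the function names, file naming scheme, and default sizes are all hypothetical, the directories are assumed to be mount points of separate XFS filesystems, and the script does not generate memory pressure, which would have to be applied separately.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_files(directory, worker_id, files_per_dir=4, size_bytes=1 << 20):
    """Write and fsync several files in one directory; return bytes written."""
    block = os.urandom(64 * 1024)
    written = 0
    for i in range(files_per_dir):
        # The worker id keeps concurrent writers from sharing a file name.
        path = os.path.join(directory, f"stress-{worker_id}-{i}.dat")
        with open(path, "wb") as fh:
            remaining = size_bytes
            while remaining > 0:
                chunk = block[:min(len(block), remaining)]
                fh.write(chunk)
                remaining -= len(chunk)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to the filesystem
        written += size_bytes
    return written


def run_workload(directories, workers_per_dir=2, **kwargs):
    """Run several concurrent writers against every directory given."""
    jobs = [(d, w) for d in directories for w in range(workers_per_dir)]
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        results = pool.map(lambda job: write_files(job[0], job[1], **kwargs), jobs)
        return sum(results)
```

Calling `run_workload` with a list of XFS mount points starts `workers_per_dir` writers per filesystem. Threads are used here for brevity; separate processes may load the system differently.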

[Regression Potential]

 * Upstream commit
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7a29ac474a47eb8cf212b45917683ae89d6fa13b
 - This commit allocates rescuer threads for each of the XFS work queues.

 * Possible additional memory usage from rescuer threads.

[Other Info]

Dave Chiluk (chiluk)
Changed in linux-lts-wily (Ubuntu):
assignee: nobody → Dave Chiluk (chiluk)
importance: Undecided → High
status: New → Confirmed
Changed in linux (Ubuntu):
status: Triaged → Won't Fix
status: Won't Fix → In Progress
Changed in linux-lts-wily (Ubuntu):
status: Confirmed → In Progress
Dave Chiluk (chiluk) wrote :

The user reported that the hotfixed kernel resolved the deadlock. Now looking at how many kernel versions are affected.

Dave Chiluk (chiluk)
description: updated
Dave Chiluk (chiluk) wrote :

Patch submitted on kernel-team list. Trusty-Wily all appear affected.

Luis Henriques (henrix)
Changed in linux-lts-utopic (Ubuntu Vivid):
status: New → Invalid
Changed in linux-lts-utopic (Ubuntu Wily):
status: New → Invalid
Changed in linux-lts-utopic (Ubuntu):
status: New → Invalid
Changed in linux-lts-wily (Ubuntu):
status: In Progress → Invalid
Changed in linux-lts-wily (Ubuntu Vivid):
status: New → Invalid
Changed in linux-lts-wily (Ubuntu Wily):
status: New → Invalid
Luis Henriques (henrix)
Changed in linux (Ubuntu Vivid):
status: New → Fix Committed
no longer affects: linux-lts-wily (Ubuntu)
no longer affects: linux-lts-wily (Ubuntu Trusty)
no longer affects: linux-lts-wily (Ubuntu Wily)
no longer affects: linux-lts-wily (Ubuntu Vivid)
Changed in linux (Ubuntu Wily):
status: New → Fix Committed
Luis Henriques (henrix)
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Changed in linux-lts-utopic (Ubuntu Trusty):
status: New → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty verification-needed-utopic verification-needed-vivid verification-needed-wily
Dave Chiluk (chiluk) wrote :

The user reported that this was resolved with a test kernel. Unfortunately, 5 days is not long enough to reproduce this issue, so I'm marking this verification-done so that the fix is not dropped.

tags: added: verification-done-trusty verification-done-utopic verification-done-vivid verification-done-wily
removed: verification-needed-trusty verification-needed-utopic verification-needed-vivid verification-needed-wily
Dave Chiluk (chiluk)
description: updated
Andy Whitcroft (apw) wrote :

Fix released in 3.19.0-47.53

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Andy Whitcroft (apw) wrote :

Fix released in 4.2.0-27.32

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Andy Whitcroft (apw) wrote :

Fix released in 3.13.0-77.121

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
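
The three "Fix released in" versions above can be turned into a quick check against the output of `uname -r`. This is a minimal sketch under stated assumptions: it compares only the ABI number after the dash (for example 77 in 3.13.0-77.121) and ignores the upload number, it assumes backport (lts) kernels share ABI numbers with their source series, and the names `FIXED_ABI` and `has_rescuer_fix` are hypothetical.

```python
import re

# First ABI number carrying the fix in each kernel series, taken from
# the "Fix released in" comments in this bug.
FIXED_ABI = {
    (3, 13, 0): 77,  # Trusty: 3.13.0-77.121
    (3, 19, 0): 47,  # Vivid:  3.19.0-47.53
    (4, 2, 0): 27,   # Wily:   4.2.0-27.32
}

# Matches the leading "X.Y.Z-ABI" of strings such as "4.2.0-18-generic".
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def has_rescuer_fix(release):
    """Return True or False for a known series, None for an unknown one."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    series = tuple(int(part) for part in match.groups()[:3])
    abi = int(match.group(4))
    fixed_abi = FIXED_ABI.get(series)
    if fixed_abi is None:
        return None  # series not covered by this bug's fix list
    return abi >= fixed_abi
```

For example, `has_rescuer_fix("4.2.0-18-generic")` returns False for the kernel seen in the traces in the bug description. A True result only means the kernel is at or past the listed release; it does not rule out hangs with a different cause.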
B. (b-deactivatedaccount-deactivatedaccount) wrote :

FYI

I still have the issue on 3.13.0-91-generic (and 3.13.0-88-generic)
on a busy NFS server using XFS. I have had this issue since 2015...

INFO: task kworker/1:2:31748 blocked for more than 120 seconds.
      Not tainted 3.13.0-91-generic #138-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/1:2 D ffff88023f433180 0 31748 2 0x00000000
Workqueue: xfs-log/dm-5 xfs_log_worker [xfs]
 ffff88012f31fb70 0000000000000046 ffff880035983000 0000000000013180
 ffff88012f31ffd8 0000000000013180 ffff880035983000 ffff88012f31fcb8
 ffff88012f31fcc0 7fffffffffffffff ffff880035983000 ffff8802317e9528
Call Trace:
 [<ffffffff8172dd89>] schedule+0x29/0x70
 [<ffffffff8172cfd9>] schedule_timeout+0x279/0x310
 [<ffffffff8172e896>] wait_for_completion+0xa6/0x150
 [<ffffffff8109d2f0>] ? wake_up_state+0x20/0x20
 [<ffffffff8108726d>] flush_work+0xed/0x1b0
 [<ffffffff81083490>] ? wake_up_worker+0x30/0x30
 [<ffffffffa01f0faf>] xlog_cil_force_lsn+0x3f/0x170 [xfs]
...

Dave Chiluk (chiluk) wrote :

@lordbaco, are you still hitting this? Also please post your entire oops next time, as I can't do much with what you've posted.

Also, you might want to try moving up to the 4.4 kernel by installing linux-image-lts-xenial on trusty (I think that's the package name).

Marking this as Fix Released, unless someone screams.

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux-lts-utopic (Ubuntu Trusty):
status: Fix Committed → Fix Released