XFS Deadlock on 4.2+

Bug #1527062 reported by Dave Chiluk on 2015-12-17
This bug affects 2 people
Affects                      Status  Importance  Assigned to  Milestone
linux (Ubuntu)                       High        Dave Chiluk
  Trusty                             Undecided   Unassigned
  Vivid                              Undecided   Unassigned
  Wily                               Undecided   Unassigned
linux-lts-utopic (Ubuntu)            Undecided   Unassigned
  Trusty                             Undecided   Unassigned
  Vivid                              Undecided   Unassigned
  Wily                               Undecided   Unassigned

Bug Description

[Impact]

 * An XFS deadlock is possible on kernels older than 4.4-rc1.

 * Hung tasks have stack traces similar to:
[ 4559.110607] INFO: task kworker/1:0:17 blocked for more than 120 seconds.
[ 4559.143010] Not tainted 4.2.0-18-generic #22~14.04.1-Ubuntu
[ 4559.171972] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4559.209753] kworker/1:0 D 0000000000000000 0 17 2 0x00000000
[ 4559.209791] Workqueue: xfs-cil/sdac1 xlog_cil_push_work [xfs]
[ 4559.209794] ffff88085be9fbb8 0000000000000046 ffff88085b746040 ffff88085be8a940
[ 4559.209795] 0000000000000000 ffff88085bea0000 ffff880107fddcc0 ffff88085be8a940
[ 4559.209797] ffff880859119c00 ffff880859119d00 ffff88085be9fbd8 ffffffff817b6a77
[ 4559.209798] Call Trace:
[ 4559.209806] [<ffffffff817b6a77>] schedule+0x37/0x80
[ 4559.209817] [<ffffffffc03c105b>] xlog_state_get_iclog_space+0xdb/0x2d0 [xfs]
[ 4559.209822] [<ffffffff810a06c0>] ? wake_up_q+0x80/0x80
[ 4559.209832] [<ffffffffc03c1501>] xlog_write+0x191/0x6a0 [xfs]
[ 4559.209835] [<ffffffff813b4478>] ? prandom_u32+0x18/0x20
[ 4559.209845] [<ffffffffc03c2e49>] xlog_cil_push+0x1f9/0x3b0 [xfs]
[ 4559.209854] [<ffffffffc03c3015>] xlog_cil_push_work+0x15/0x20 [xfs]
[ 4559.209857] [<ffffffff8108f4ce>] process_one_work+0x14e/0x3d0
[ 4559.209858] [<ffffffff8108fb7a>] worker_thread+0x11a/0x470
[ 4559.209860] [<ffffffff8108fa60>] ? rescuer_thread+0x310/0x310
[ 4559.209862] [<ffffffff81095112>] kthread+0xd2/0xf0
[ 4559.209863] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
[ 4559.209865] [<ffffffff817ba71f>] ret_from_fork+0x3f/0x70
[ 4559.209866] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0

or

[305651.804853] INFO: task kswapd0:194 blocked for more than 120 seconds.
[305651.836092] Not tainted 4.2.0-18-generic #22~14.04.1-Ubuntu
[305651.865655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[305651.903596] kswapd0 D ffff88085fa96640 0 194 2 0x00000000
[305651.903614] ffff8810591ab858 0000000000000046 ffff88085c2c2940 ffff88105b19a940
[305651.903616] ffff880066c64548 ffff8810591ac000 ffff8808599cae18 0000000000000000
[305651.903618] ffff88105b19a940 ffff88085a2cb000 ffff8810591ab878 ffffffff817b6a77
[305651.903620] Call Trace:
[305651.903629] [<ffffffff817b6a77>] schedule+0x37/0x80
[305651.903655] [<ffffffffc0402f6c>] _xfs_log_force_lsn+0x15c/0x2d0 [xfs]
[305651.903662] [<ffffffff810a06c0>] ? wake_up_q+0x80/0x80
[305651.903675] [<ffffffffc040310e>] xfs_log_force_lsn+0x2e/0x80 [xfs]
[305651.903687] [<ffffffffc03f5ff9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
[305651.903698] [<ffffffffc03f2a3d>] __xfs_iunpin_wait+0x8d/0x120 [xfs]
[305651.903701] [<ffffffff810b7380>] ? autoremove_wake_function+0x40/0x40
[305651.903711] [<ffffffffc03f5ff9>] xfs_iunpin_wait+0x19/0x20 [xfs]
[305651.903721] [<ffffffffc03eb665>] xfs_reclaim_inode+0x125/0x330 [xfs]
[305651.903732] [<ffffffffc03ebab8>] xfs_reclaim_inodes_ag+0x248/0x360 [xfs]
[305651.903735] [<ffffffff8120511c>] ? destroy_inode+0x3c/0x60
[305651.903744] [<ffffffffc03ec573>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[305651.903755] [<ffffffffc03fa5e9>] xfs_fs_free_cached_objects+0x19/0x20 [xfs]
[305651.903758] [<ffffffff811ee2a1>] super_cache_scan+0x181/0x190
[305651.903761] [<ffffffff811870e6>] shrink_slab+0x206/0x380
[305651.903763] [<ffffffff8118b7a1>] shrink_zone+0x291/0x2b0
[305651.903764] [<ffffffff8118c710>] kswapd+0x500/0x9b0
[305651.903766] [<ffffffff8118c210>] ? mem_cgroup_shrink_node_zone+0x130/0x130
[305651.903768] [<ffffffff81095112>] kthread+0xd2/0xf0
[305651.903770] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
[305651.903772] [<ffffffff817ba71f>] ret_from_fork+0x3f/0x70
[305651.903774] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
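
When many hosts need to be checked for these hangs, the hung-task reports can be pulled out of a saved kernel log mechanically. The following is a minimal sketch, not part of the original report: the function names `parse_hung_tasks` and `involves_xfs` are hypothetical, and the regular expressions assume the log format shown in the traces above (a `[<address>] symbol+0xOFF/0xLEN [module]` frame layout, with `? ` marking unreliable frames).

```python
import re

# Header line of a hung-task report, e.g.
# "INFO: task kworker/1:0:17 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# One stack frame, e.g. "[<ffffffffc03c1501>] xlog_write+0x191/0x6a0 [xfs]".
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only reliable frames that belong to a report we have seen.
        if frame and current is not None and not frame.group("unreliable"):
            current["frames"].append((frame.group("sym"), frame.group("mod")))
    return reports


def involves_xfs(report):
    """True if any reliable frame in the report comes from the xfs module."""
    return any(mod == "xfs" for _, mod in report["frames"])
```

A report that passes `involves_xfs` only shows that the task was blocked inside XFS code; it does not by itself prove that this particular deadlock was hit.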

[Test Case]

 * Run a large number of I/O tasks against a large number of XFS filesystems while under memory pressure. Reproduction is not guaranteed.

[Regression Potential]

 * Upstream commit
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7a29ac474a47eb8cf212b45917683ae89d6fa13b
 - This commit allocates rescuer threads for each of the XFS work queues.

 * Possible additional memory usage from rescuer threads.

[Other Info]

Dave Chiluk (chiluk) on 2015-12-17
Changed in linux-lts-wily (Ubuntu):
assignee: nobody → Dave Chiluk (chiluk)
importance: Undecided → High
status: New → Confirmed
Changed in linux (Ubuntu):
status: Triaged → Won't Fix
status: Won't Fix → In Progress
Changed in linux-lts-wily (Ubuntu):
status: Confirmed → In Progress
Dave Chiluk (chiluk) wrote :

A user reported that the hotfixed kernel resolved the deadlock issue. I am now looking at how many kernel versions are affected.

Dave Chiluk (chiluk) on 2016-01-07
description: updated
Dave Chiluk (chiluk) wrote :

Patch submitted on the kernel-team list. Trusty through Wily all appear affected.

Luis Henriques (henrix) on 2016-01-11
Changed in linux-lts-utopic (Ubuntu Vivid):
status: New → Invalid
Changed in linux-lts-utopic (Ubuntu Wily):
status: New → Invalid
Changed in linux-lts-utopic (Ubuntu):
status: New → Invalid
Changed in linux-lts-wily (Ubuntu):
status: In Progress → Invalid
Changed in linux-lts-wily (Ubuntu Vivid):
status: New → Invalid
Changed in linux-lts-wily (Ubuntu Wily):
status: New → Invalid
Luis Henriques (henrix) on 2016-01-11
Changed in linux (Ubuntu Vivid):
status: New → Fix Committed
no longer affects: linux-lts-wily (Ubuntu)
no longer affects: linux-lts-wily (Ubuntu Trusty)
no longer affects: linux-lts-wily (Ubuntu Wily)
no longer affects: linux-lts-wily (Ubuntu Vivid)
Changed in linux (Ubuntu Wily):
status: New → Fix Committed
Luis Henriques (henrix) on 2016-01-11
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Changed in linux-lts-utopic (Ubuntu Trusty):
status: New → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-wily' to 'verification-done-wily'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty verification-needed-utopic verification-needed-vivid verification-needed-wily
Dave Chiluk (chiluk) wrote :

A user reported that this was resolved with a test kernel. Unfortunately, 5 days is not long enough to reproduce this issue, so I'm marking this verification-done so that the fix is not dropped.

tags: added: verification-done-trusty verification-done-utopic verification-done-vivid verification-done-wily
removed: verification-needed-trusty verification-needed-utopic verification-needed-vivid verification-needed-wily
Dave Chiluk (chiluk) on 2016-01-17
description: updated
Andy Whitcroft (apw) wrote :

Fix released in 3.19.0-47.53

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Andy Whitcroft (apw) wrote :

Fix released in 4.2.0-27.32

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Andy Whitcroft (apw) wrote :

Fix released in 3.13.0-77.121

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
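
The three "Fix released" versions above can be turned into a quick check of whether an installed kernel package already carries the patch. This is a rough sketch under stated assumptions: the helper name `has_fix` is hypothetical, the table only covers the three series named in this report, and a backport suffix such as `~14.04.1` is assumed to be ignorable when comparing against the same base series.

```python
import re

# First fixed package versions, taken from the "Fix released" comments above.
# Keys are the upstream base version, values are (ABI number, upload number).
FIXED_IN = {
    (3, 13, 0): (77, 121),  # Trusty
    (3, 19, 0): (47, 53),   # Vivid
    (4, 2, 0): (27, 32),    # Wily
}

# Matches the leading "X.Y.Z-ABI.UPLOAD" part of an Ubuntu kernel package
# version; anything after it (for example "~14.04.1") is ignored.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def has_fix(package_version):
    """Return True or False for a listed series, None for an unlisted one."""
    match = VERSION_RE.match(package_version)
    if not match:
        raise ValueError("unrecognised kernel package version: %r" % package_version)
    major, minor, patch, abi, upload = map(int, match.groups())
    fixed = FIXED_IN.get((major, minor, patch))
    if fixed is None:
        return None  # series not mentioned in this report
    return (abi, upload) >= fixed


if __name__ == "__main__":
    for version in ("3.13.0-76.120", "3.13.0-77.121", "4.2.0-18.22~14.04.1"):
        print(version, has_fix(version))
```

A `True` result only means the package is at or past the version that shipped this patch; it says nothing about other causes of hung tasks on the same kernel.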
Guy Baconniere (lordbaco) wrote :

FYI

I still have the issue on 3.13.0-91-generic (and 3.13.0-88-generic)
on a busy NFS server using XFS. I have had this issue since 2015.

INFO: task kworker/1:2:31748 blocked for more than 120 seconds.
      Not tainted 3.13.0-91-generic #138-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/1:2 D ffff88023f433180 0 31748 2 0x00000000
Workqueue: xfs-log/dm-5 xfs_log_worker [xfs]
 ffff88012f31fb70 0000000000000046 ffff880035983000 0000000000013180
 ffff88012f31ffd8 0000000000013180 ffff880035983000 ffff88012f31fcb8
 ffff88012f31fcc0 7fffffffffffffff ffff880035983000 ffff8802317e9528
Call Trace:
 [<ffffffff8172dd89>] schedule+0x29/0x70
 [<ffffffff8172cfd9>] schedule_timeout+0x279/0x310
 [<ffffffff8172e896>] wait_for_completion+0xa6/0x150
 [<ffffffff8109d2f0>] ? wake_up_state+0x20/0x20
 [<ffffffff8108726d>] flush_work+0xed/0x1b0
 [<ffffffff81083490>] ? wake_up_worker+0x30/0x30
 [<ffffffffa01f0faf>] xlog_cil_force_lsn+0x3f/0x170 [xfs]
...

Dave Chiluk (chiluk) wrote :

@lordbaco, are you still hitting this? Also please post your entire oops next time, as I can't do much with what you've posted.

Also, you might want to try moving up to the 4.4 kernel by installing linux-image-lts-xenial on Trusty (I think that's the package name).

Marking this as Fix Released unless someone screams.

Changed in linux (Ubuntu):
status: In Progress → Fix Released