Comment 0 for bug 1527062

Dave Chiluk (chiluk) wrote :

[Impact]

 * An XFS deadlock is possible on 4.2..4.4rc1^ and newer kernels.

 * Hung tasks have stack traces similar to the following:
[ 4559.110607] INFO: task kworker/1:0:17 blocked for more than 120 seconds.
[ 4559.143010] Not tainted 4.2.0-18-generic #22~14.04.1-Ubuntu
[ 4559.171972] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4559.209753] kworker/1:0 D 0000000000000000 0 17 2 0x00000000
[ 4559.209791] Workqueue: xfs-cil/sdac1 xlog_cil_push_work [xfs]
[ 4559.209794] ffff88085be9fbb8 0000000000000046 ffff88085b746040 ffff88085be8a940
[ 4559.209795] 0000000000000000 ffff88085bea0000 ffff880107fddcc0 ffff88085be8a940
[ 4559.209797] ffff880859119c00 ffff880859119d00 ffff88085be9fbd8 ffffffff817b6a77
[ 4559.209798] Call Trace:
[ 4559.209806] [<ffffffff817b6a77>] schedule+0x37/0x80
[ 4559.209817] [<ffffffffc03c105b>] xlog_state_get_iclog_space+0xdb/0x2d0 [xfs]
[ 4559.209822] [<ffffffff810a06c0>] ? wake_up_q+0x80/0x80
[ 4559.209832] [<ffffffffc03c1501>] xlog_write+0x191/0x6a0 [xfs]
[ 4559.209835] [<ffffffff813b4478>] ? prandom_u32+0x18/0x20
[ 4559.209845] [<ffffffffc03c2e49>] xlog_cil_push+0x1f9/0x3b0 [xfs]
[ 4559.209854] [<ffffffffc03c3015>] xlog_cil_push_work+0x15/0x20 [xfs]
[ 4559.209857] [<ffffffff8108f4ce>] process_one_work+0x14e/0x3d0
[ 4559.209858] [<ffffffff8108fb7a>] worker_thread+0x11a/0x470
[ 4559.209860] [<ffffffff8108fa60>] ? rescuer_thread+0x310/0x310
[ 4559.209862] [<ffffffff81095112>] kthread+0xd2/0xf0
[ 4559.209863] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
[ 4559.209865] [<ffffffff817ba71f>] ret_from_fork+0x3f/0x70
[ 4559.209866] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0

or

[305651.804853] INFO: task kswapd0:194 blocked for more than 120 seconds.
[305651.836092] Not tainted 4.2.0-18-generic #22~14.04.1-Ubuntu
[305651.865655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[305651.903596] kswapd0 D ffff88085fa96640 0 194 2 0x00000000
[305651.903614] ffff8810591ab858 0000000000000046 ffff88085c2c2940 ffff88105b19a940
[305651.903616] ffff880066c64548 ffff8810591ac000 ffff8808599cae18 0000000000000000
[305651.903618] ffff88105b19a940 ffff88085a2cb000 ffff8810591ab878 ffffffff817b6a77
[305651.903620] Call Trace:
[305651.903629] [<ffffffff817b6a77>] schedule+0x37/0x80
[305651.903655] [<ffffffffc0402f6c>] _xfs_log_force_lsn+0x15c/0x2d0 [xfs]
[305651.903662] [<ffffffff810a06c0>] ? wake_up_q+0x80/0x80
[305651.903675] [<ffffffffc040310e>] xfs_log_force_lsn+0x2e/0x80 [xfs]
[305651.903687] [<ffffffffc03f5ff9>] ? xfs_iunpin_wait+0x19/0x20 [xfs]
[305651.903698] [<ffffffffc03f2a3d>] __xfs_iunpin_wait+0x8d/0x120 [xfs]
[305651.903701] [<ffffffff810b7380>] ? autoremove_wake_function+0x40/0x40
[305651.903711] [<ffffffffc03f5ff9>] xfs_iunpin_wait+0x19/0x20 [xfs]
[305651.903721] [<ffffffffc03eb665>] xfs_reclaim_inode+0x125/0x330 [xfs]
[305651.903732] [<ffffffffc03ebab8>] xfs_reclaim_inodes_ag+0x248/0x360 [xfs]
[305651.903735] [<ffffffff8120511c>] ? destroy_inode+0x3c/0x60
[305651.903744] [<ffffffffc03ec573>] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
[305651.903755] [<ffffffffc03fa5e9>] xfs_fs_free_cached_objects+0x19/0x20 [xfs]
[305651.903758] [<ffffffff811ee2a1>] super_cache_scan+0x181/0x190
[305651.903761] [<ffffffff811870e6>] shrink_slab+0x206/0x380
[305651.903763] [<ffffffff8118b7a1>] shrink_zone+0x291/0x2b0
[305651.903764] [<ffffffff8118c710>] kswapd+0x500/0x9b0
[305651.903766] [<ffffffff8118c210>] ? mem_cgroup_shrink_node_zone+0x130/0x130
[305651.903768] [<ffffffff81095112>] kthread+0xd2/0xf0
[305651.903770] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
[305651.903772] [<ffffffff817ba71f>] ret_from_fork+0x3f/0x70
[305651.903774] [<ffffffff81095040>] ? kthread_create_on_node+0x1c0/0x1c0
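
To check whether a machine is hitting the same hang, saved dmesg output can be scanned for hung-task reports whose traces contain the XFS log functions seen above. The following is a minimal sketch, not a definitive detector: the function name find_hung_xfs_tasks and the SUSPECT_FRAMES set are assumptions made for illustration, and the set only holds the two blocking functions taken from the traces in this report.

```python
import re

# Header line printed by the hung-task detector, e.g.
# "INFO: task kworker/1:0:17 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than \d+ seconds"
)

# A reliable stack frame, e.g. "[<ffffffffc03c1501>] xlog_write+0x191/0x6a0 [xfs]".
# Frames marked with "?" do not match and are therefore skipped.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Assumption: these two functions are where the tasks in this report block.
# The list is illustrative, not exhaustive.
SUSPECT_FRAMES = {"xlog_state_get_iclog_space", "_xfs_log_force_lsn"}


def find_hung_xfs_tasks(log_text):
    """Return (comm, pid) for each hung task whose trace has a suspect frame."""
    hits = []
    current = None   # (comm, pid) of the report currently being read
    matched = False  # True once the current report has been recorded
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = (header.group("comm"), int(header.group("pid")))
            matched = False
            continue
        frame = FRAME_RE.search(line)
        if frame and current and not matched:
            if frame.group("sym") in SUSPECT_FRAMES:
                hits.append(current)
                matched = True
    return hits
```

For example, passing the text of a kernel log file to find_hung_xfs_tasks returns a list of (task name, pid) pairs for the matching reports.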

[Test Case]

 * Run large numbers of I/O tasks against large numbers of XFS filesystems while the system is under memory pressure. This test case is not guaranteed to reproduce the hang.
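
The I/O half of this workload could be approximated with a script along the following lines. This is only a sketch under stated assumptions: the names run_io_load and write_worker, the worker counts, and the file sizes are hypothetical and were not part of the original reproduction. The mount points passed in are assumed to be XFS filesystems, and memory pressure has to be generated separately because the script does not create any.

```python
import os
import threading


def write_worker(directory, worker_id, file_count, size):
    """Create, fsync, and delete small files to generate metadata and log I/O."""
    payload = b"x" * size
    for i in range(file_count):
        path = os.path.join(directory, f"stress-{worker_id}-{i}.dat")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the write out to the filesystem
        os.unlink(path)


def run_io_load(mount_points, workers_per_mount=4, file_count=100, size=4096):
    """Start workers_per_mount writer threads per mount point and wait for them.

    mount_points is a list of directories; in the scenario from this report
    each would sit on a different XFS filesystem (an assumption, the script
    does not check the filesystem type). Returns the number of threads run.
    """
    threads = []
    for mount_point in mount_points:
        for worker_id in range(workers_per_mount):
            thread = threading.Thread(
                target=write_worker,
                args=(mount_point, worker_id, file_count, size),
            )
            thread.start()
            threads.append(thread)
    for thread in threads:
        thread.join()
    return len(threads)
```

A call such as run_io_load(list_of_xfs_directories, workers_per_mount=16, file_count=10000) would be run in a loop while something else on the machine consumes memory. As with the test case itself, there is no guarantee that this triggers the deadlock.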

[Regression Potential]

 * Upstream commit
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7a29ac474a47eb8cf212b45917683ae89d6fa13b
 - This commit allocates rescuer threads for each of the XFS work queues.

 * Possible additional memory usage from rescuer threads.
