ext4 journaling and swapping to same encrypted SSHD hangs system

Bug #1589563 reported by Pauli
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Triaged  High        Unassigned

Bug Description

Short description:
The system has a single SSHD disk with a LUKS/LVM2-encrypted root file system and swap partition. When the system starts to swap, the ext4 journal commit and the swap requests hang waiting for disk I/O.

Relevant bug reports in other places:
So far I have managed to find only one relevant-looking report about a very similar problem:
https://bbs.archlinux.org/viewtopic.php?id=193034

Hardware:
Acer Aspire E15 E5-571G59EQ
Intel Broadwell i5-5200U
4GB RAM
NVIDIA GeForce 840M (binary driver installed but disabled)
Seagate ST500LM000-1EJ16 500GB HDD + 8GB SSHD

Known ways to reproduce:
Compiling templated C++ code with g++ 5 in a way that makes the g++ optimization passes use a huge amount of memory.
I haven't yet tested whether there are any other ways to trigger the hang.

Reproduction details:
$ cat main.cpp
#include <iostream>

template<unsigned idx>
unsigned loop()
{
 if (idx == 1)
  return 6;

 unsigned count = 0;
 for (unsigned i = 1; i < 7; i++) {
  count += loop< (idx > 1 ? idx - 1 : 1) >();
 }
 return count;
}

int main(void)
{
 unsigned count, dice[5];

 std::cout << "6^5: " << 6*6*6*6*6 << "\n";

 std::cout << "loop all: " << loop<5>() << "\n";
 std::cout << "loop all two players: " << loop<10>() << "\n";
 return 0;
}

$ gcc --version
gcc (Ubuntu 5.3.1-14ubuntu2.1) 5.3.1 20160413

$ g++ -O2 -g -c -o main.o main.cpp
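
For reference, the recurrence in main.cpp above is loop<idx>() = 6 * loop<idx - 1>() with loop<1>() = 6, so loop<idx>() evaluates to 6^idx. The sketch below is not part of the original reproducer; pow6 and leaf_calls are hypothetical helper names introduced only for illustration. It evaluates the same recurrence as a constexpr function, so the expected results can be checked at compile time. It also gives the number of loop<1>() leaf calls that would exist if the call tree of loop<10>() were expanded completely. Whether g++ actually expands the tree that far during optimization is an assumption, not something the traces in this report confirm.

```cpp
#include <cstdint>

// Hypothetical helper (not in the original report): the same recurrence as
// loop<idx>() in main.cpp, written as a constexpr function.
// pow6(1) == 6 and pow6(n) == 6 * pow6(n - 1).
constexpr std::uint64_t pow6(unsigned idx)
{
    return idx <= 1 ? 6 : 6 * pow6(idx - 1);
}

// Hypothetical helper: number of loop<1>() leaf calls in a fully expanded
// call tree of loop<idx>(). Each level fans out six ways, so there are
// 6^(idx - 1) leaves.
constexpr std::uint64_t leaf_calls(unsigned idx)
{
    return idx <= 1 ? 1 : 6 * leaf_calls(idx - 1);
}

// Values the reproducer is expected to print.
static_assert(pow6(5) == 7776, "loop<5>() should equal 6^5");
static_assert(pow6(10) == 60466176, "loop<10>() should equal 6^10");

// Size of the fully expanded call tree for loop<10>().
static_assert(leaf_calls(10) == 10077696, "6^9 leaf calls");
```

Compiling this file is cheap because the compiler only folds constants. The original main.cpp asks g++ to optimize six calls per template level instead.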

Relevant backtraces from sysrq-t (recorded using netconsole):

1. Watchdog catches jbd2 hang

[ 2784.570413] INFO: task jbd2/dm-1-8:372 blocked for more than 10 seconds.
[ 2784.571209] Tainted: P OE 4.4.0-22-generic #40-Ubuntu
[ 2784.571948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2784.572606] jbd2/dm-1-8 D ffff880035577ad8 0 372 2 0x00000000
[ 2784.573294] ffff880035577ad8 ffff880035577b00 ffff880159472940 ffff8801561e0dc0
[ 2784.573997] ffff880035578000 ffff88015ecd6d00 7fffffffffffffff ffffffff81821a00
[ 2784.574724] ffff880035577c30 ffff880035577af0 ffffffff81821205 0000000000000000
[ 2784.575445] Call Trace:
[ 2784.576124] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2784.576835] [<ffffffff81821205>] schedule+0x35/0x80
[ 2784.577546] [<ffffffff81824325>] schedule_timeout+0x1b5/0x270
[ 2784.578255] [<ffffffff813b5cf3>] ? __blk_run_queue+0x33/0x40
[ 2784.578959] [<ffffffff813bc3d0>] ? blk_queue_bio+0x3d0/0x3e0
[ 2784.579635] [<ffffffff810f574c>] ? ktime_get+0x3c/0xb0
[ 2784.580328] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2784.581047] [<ffffffff81820754>] io_schedule_timeout+0xa4/0x110
[ 2784.581767] [<ffffffff81821a1b>] bit_wait_io+0x1b/0x70
[ 2784.582509] [<ffffffff818215ad>] __wait_on_bit+0x5d/0x90
[ 2784.583215] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2784.583951] [<ffffffff81821662>] out_of_line_wait_on_bit+0x82/0xb0
[ 2784.584687] [<ffffffff810c3ab0>] ? autoremove_wake_function+0x40/0x40
[ 2784.585434] [<ffffffff81243c32>] __wait_on_buffer+0x32/0x40
[ 2784.586174] [<ffffffff812eaf6f>] jbd2_journal_commit_transaction+0x10cf/0x1870
[ 2784.586906] [<ffffffff810ec5fe>] ? try_to_del_timer_sync+0x5e/0x90
[ 2784.587622] [<ffffffff812ef32a>] kjournald2+0xca/0x250
[ 2784.588339] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2784.589058] [<ffffffff812ef260>] ? commit_timeout+0x10/0x10
[ 2784.589769] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2784.590503] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2784.591215] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2784.591930] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

2. kswapd appears to be hung too

[ 2817.029410] kswapd0 D ffff8800934ab608 0 43 2 0x00000000
[ 2817.030212] ffff8800934ab608 ffffffff811eaac6 ffff880093b6b700 ffff8800934a0000
[ 2817.031032] ffff8800934ac000 ffff8800934ab640 ffff88015eccdd00 ffff88015eccdd00
[ 2817.031860] ffff8800934a0000 ffff8800934ab620 ffffffff81821205 0000000100099a03
[ 2817.032670] Call Trace:
[ 2817.033473] [<ffffffff811eaac6>] ? ___slab_alloc+0x2a6/0x460
[ 2817.034282] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.035082] [<ffffffff81824299>] schedule_timeout+0x129/0x270
[ 2817.035898] [<ffffffff810ec370>] ? trace_event_raw_event_tick_stop+0x120/0x120
[ 2817.036709] [<ffffffff810f574c>] ? ktime_get+0x3c/0xb0
[ 2817.037515] [<ffffffff81820754>] io_schedule_timeout+0xa4/0x110
[ 2817.038327] [<ffffffff811900c8>] mempool_alloc+0x148/0x170
[ 2817.039137] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2817.039964] [<ffffffff813b20ad>] bio_alloc_bioset+0xbd/0x260
[ 2817.040774] [<ffffffff816a33cf>] __split_and_process_bio+0x1ef/0x3e0
[ 2817.041576] [<ffffffff816a3629>] dm_make_request+0x69/0xc0
[ 2817.042377] [<ffffffff813ba502>] generic_make_request+0xf2/0x1d0
[ 2817.043180] [<ffffffff813ba656>] submit_bio+0x76/0x170
[ 2817.043999] [<ffffffff811d2548>] __swap_writepage+0x228/0x270
[ 2817.044799] [<ffffffff811d7bcf>] ? __frontswap_store+0xdf/0x110
[ 2817.045589] [<ffffffff811d25c9>] swap_writepage+0x39/0x70
[ 2817.046386] [<ffffffff811a0b6d>] pageout.isra.44+0x16d/0x280
[ 2817.047179] [<ffffffff811a31ca>] shrink_page_list+0x3ca/0x7a0
[ 2817.047990] [<ffffffff811a3c39>] shrink_inactive_list+0x209/0x520
[ 2817.048785] [<ffffffff811a48d3>] shrink_lruvec+0x593/0x750
[ 2817.049582] [<ffffffff81098239>] ? __queue_work+0x139/0x3b0
[ 2817.050373] [<ffffffff811a4b7f>] shrink_zone+0xef/0x2e0
[ 2817.051159] [<ffffffff811a5dfe>] kswapd+0x51e/0x990
[ 2817.051952] [<ffffffff811a58e0>] ? mem_cgroup_shrink_node_zone+0x1c0/0x1c0
[ 2817.052731] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2817.053512] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.054299] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2817.055083] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

3. ecryptfs-kthread and bioset appear to be sleeping without work

[ 2817.076543] ecryptfs-kthrea S ffff880093553e40 0 46 2 0x00000000
[ 2817.077347] ffff880093553e40 00000000934a29a0 ffff880159472940 ffff8800934a2940
[ 2817.078160] ffff880093554000 ffff880093553e80 ffff8800934a2940 0000000000000000
[ 2817.078980] 0000000000000000 ffff880093553e58 ffffffff81821205 0000000000000000
[ 2817.079795] Call Trace:
[ 2817.080596] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.081407] [<ffffffff8130b02a>] ecryptfs_threadfn+0x17a/0x1c0
[ 2817.082212] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2817.083017] [<ffffffff8130aeb0>] ? ecryptfs_add_global_auth_tok+0xa0/0xa0
[ 2817.083829] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2817.084628] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.085432] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2817.086227] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

4. SCSI threads seem to be sleeping (same for all cores)

[ 2817.390470] scsi_eh_0 S ffff8800357c7e10 0 152 2 0x00000000
[ 2817.391287] ffff8800357c7e10 ffff8800357c7e08 ffff880159472940 ffff8800937a5280
[ 2817.392107] ffff8800357c8000 ffff8800937a5858 ffff8800937a5858 ffff8800937a5280
[ 2817.392917] 0000000000000000 ffff8800357c7e28 ffffffff81821205 ffff8800356c9800
[ 2817.393730] Call Trace:
[ 2817.394537] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.395339] [<ffffffff815ad2f7>] scsi_error_handler+0x97/0x8a0
[ 2817.396142] [<ffffffff81820b46>] ? __schedule+0x386/0xa10
[ 2817.396946] [<ffffffff815ad260>] ? scsi_eh_get_sense+0x250/0x250
[ 2817.397748] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2817.398553] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.399363] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2817.400170] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.400977] scsi_tmf_0 S ffff8800357cbe38 0 153 2 0x00000000
[ 2817.401800] ffff8800357cbe38 ffff8800357cbe40 ffff880159470dc0 ffff8800937a44c0
[ 2817.402624] ffff8800357cc000 ffff8800937a44c0 ffffffff8109a830 ffff8800937e9a80
[ 2817.403439] ffff8800356e0000 ffff8800357cbe50 ffffffff81821205 ffff8800356e0030
[ 2817.404257] Call Trace:
[ 2817.405064] [<ffffffff8109a830>] ? worker_thread+0x4c0/0x4c0
[ 2817.405870] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.406673] [<ffffffff8109ab75>] rescuer_thread+0x345/0x3d0
[ 2817.407475] [<ffffffff8109a830>] ? worker_thread+0x4c0/0x4c0
[ 2817.408279] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2817.409083] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.409889] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2817.410699] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

5. kworkers running dm_crypt work seem to be waiting for something in LRU shrink

[ 2817.474917] kworker/u16:3 R running task 0 160 2 0x00000000
[ 2817.475748] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[ 2817.476566] ffff880035007538 ffffffff811cc3a6 ffff88015723b700 ffff8800354044c0
[ 2817.477396] ffff880035008000 ffff880035007570 ffff88015ec0dd00 ffff88015ec0dd00
[ 2817.478218] ffffffffffffffe0 ffff880035007550 ffffffff81821205 0000000100099990
[ 2817.479052] Call Trace:
[ 2817.479866] [<ffffffff811cc3a6>] ? page_referenced+0xb6/0x140
[ 2817.480694] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.481520] [<ffffffff81824299>] schedule_timeout+0x129/0x270
[ 2817.482337] [<ffffffff810ec370>] ? trace_event_raw_event_tick_stop+0x120/0x120
[ 2817.483173] [<ffffffff8182443e>] schedule_timeout_uninterruptible+0x1e/0x20
[ 2817.484009] [<ffffffff811aea13>] wait_iff_congested+0xf3/0x180
[ 2817.484847] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2817.485682] [<ffffffff811a3f2f>] shrink_inactive_list+0x4ff/0x520
[ 2817.486513] [<ffffffff811a48d3>] shrink_lruvec+0x593/0x750
[ 2817.487332] [<ffffffff811a4b7f>] shrink_zone+0xef/0x2e0
[ 2817.488151] [<ffffffff811a4edc>] do_try_to_free_pages+0x16c/0x3f0
[ 2817.488978] [<ffffffff811a522e>] try_to_free_pages+0xce/0x180
[ 2817.489795] [<ffffffff81196eda>] __alloc_pages_nodemask+0x64a/0xb60
[ 2817.490608] [<ffffffff811ade4c>] ? zone_statistics+0x7c/0xa0
[ 2817.491423] [<ffffffff811e091c>] alloc_pages_current+0x8c/0x110
[ 2817.492237] [<ffffffff811e99ff>] new_slab+0x28f/0x490
[ 2817.493050] [<ffffffff811eaa4b>] ___slab_alloc+0x22b/0x460
[ 2817.493864] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2817.494663] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2817.495466] [<ffffffff811eaca0>] __slab_alloc+0x20/0x40
[ 2817.496264] [<ffffffff811eb21f>] kmem_cache_alloc+0x19f/0x1f0
[ 2817.497065] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2817.497858] [<ffffffff8118ff05>] mempool_alloc_slab+0x15/0x20
[ 2817.498644] [<ffffffff8118ffee>] mempool_alloc+0x6e/0x170
[ 2817.499429] [<ffffffffc0162092>] ? __ablk_encrypt+0x52/0x70 [ablk_helper]
[ 2817.500218] [<ffffffff813b20ad>] bio_alloc_bioset+0xbd/0x260
[ 2817.501003] [<ffffffffc0112af4>] kcryptd_crypt+0x104/0x380 [dm_crypt]
[ 2817.501787] [<ffffffff8109a052>] process_one_work+0x162/0x480
[ 2817.502572] [<ffffffff8109a3bb>] worker_thread+0x4b/0x4c0
[ 2817.503356] [<ffffffff8109a370>] ? process_one_work+0x480/0x480
[ 2817.504135] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2817.504919] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.505704] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2817.506490] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

[ 2818.290300] kworker/u16:1 D ffff8801561fb538 0 3339 2 0x00000000
[ 2818.291008] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[ 2818.291734] ffff8801561fb538 ffffffff811cc3a6 ffff880159472940 ffff880093b82940
[ 2818.292466] ffff8801561fc000 ffff8801561fb570 ffff88015eccdd00 ffff88015eccdd00
[ 2818.293201] ffffffffffffffe0 ffff8801561fb550 ffffffff81821205 0000000100099b27
[ 2818.293938] Call Trace:
[ 2818.294658] [<ffffffff811cc3a6>] ? page_referenced+0xb6/0x140
[ 2818.295400] [<ffffffff81821205>] schedule+0x35/0x80
[ 2818.296162] [<ffffffff81824299>] schedule_timeout+0x129/0x270
[ 2818.296924] [<ffffffff810ec370>] ? trace_event_raw_event_tick_stop+0x120/0x120
[ 2818.297700] [<ffffffff8182443e>] schedule_timeout_uninterruptible+0x1e/0x20
[ 2818.298479] [<ffffffff811aea13>] wait_iff_congested+0xf3/0x180
[ 2818.299264] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2818.300063] [<ffffffff811a3f2f>] shrink_inactive_list+0x4ff/0x520
[ 2818.300853] [<ffffffff811a48d3>] shrink_lruvec+0x593/0x750
[ 2818.301645] [<ffffffff811a4b7f>] shrink_zone+0xef/0x2e0
[ 2818.302437] [<ffffffff811a4edc>] do_try_to_free_pages+0x16c/0x3f0
[ 2818.303232] [<ffffffff811a522e>] try_to_free_pages+0xce/0x180
[ 2818.304038] [<ffffffff81196eda>] __alloc_pages_nodemask+0x64a/0xb60
[ 2818.304831] [<ffffffff811ade4c>] ? zone_statistics+0x7c/0xa0
[ 2818.305619] [<ffffffff811e091c>] alloc_pages_current+0x8c/0x110
[ 2818.306399] [<ffffffff811e99ff>] new_slab+0x28f/0x490
[ 2818.307183] [<ffffffff811eaa4b>] ___slab_alloc+0x22b/0x460
[ 2818.307975] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.308751] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.309513] [<ffffffff811eaca0>] __slab_alloc+0x20/0x40
[ 2818.310279] [<ffffffff811eb21f>] kmem_cache_alloc+0x19f/0x1f0
[ 2818.311049] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.311834] [<ffffffff8118ff05>] mempool_alloc_slab+0x15/0x20
[ 2818.312603] [<ffffffff8118ffee>] mempool_alloc+0x6e/0x170
[ 2818.313377] [<ffffffffc0162092>] ? __ablk_encrypt+0x52/0x70 [ablk_helper]
[ 2818.314150] [<ffffffff813b20ad>] bio_alloc_bioset+0xbd/0x260
[ 2818.314925] [<ffffffffc0112af4>] kcryptd_crypt+0x104/0x380 [dm_crypt]
[ 2818.315712] [<ffffffff8109a052>] process_one_work+0x162/0x480
[ 2818.316484] [<ffffffff8109a3bb>] worker_thread+0x4b/0x4c0
[ 2818.317250] [<ffffffff8109a370>] ? process_one_work+0x480/0x480
[ 2818.318018] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2818.318782] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2818.319561] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2818.320333] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

[ 2818.321113] kworker/u16:2 D ffff880098653538 0 3388 2 0x00000000
[ 2818.321909] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[ 2818.322700] ffff880098653538 ffffffff811cc3a6 ffff880159471b80 ffff880093b6b700
[ 2818.323508] ffff880098654000 ffff880098653570 ffff88015ec8dd00 ffff88015ec8dd00
[ 2818.324313] ffffffffffffffe0 ffff880098653550 ffffffff81821205 0000000100099990
[ 2818.325116] Call Trace:
[ 2818.325905] [<ffffffff811cc3a6>] ? page_referenced+0xb6/0x140
[ 2818.326705] [<ffffffff81821205>] schedule+0x35/0x80
[ 2818.327498] [<ffffffff81824299>] schedule_timeout+0x129/0x270
[ 2818.328301] [<ffffffff810ec370>] ? trace_event_raw_event_tick_stop+0x120/0x120
[ 2818.329103] [<ffffffff8182443e>] schedule_timeout_uninterruptible+0x1e/0x20
[ 2818.329908] [<ffffffff811aea13>] wait_iff_congested+0xf3/0x180
[ 2818.330711] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2818.331516] [<ffffffff811a3f2f>] shrink_inactive_list+0x4ff/0x520
[ 2818.332332] [<ffffffff811a48d3>] shrink_lruvec+0x593/0x750
[ 2818.333128] [<ffffffff811a4b7f>] shrink_zone+0xef/0x2e0
[ 2818.333912] [<ffffffff811a4edc>] do_try_to_free_pages+0x16c/0x3f0
[ 2818.334698] [<ffffffff811a522e>] try_to_free_pages+0xce/0x180
[ 2818.335483] [<ffffffff81194d44>] ? drain_local_pages+0x24/0x30
[ 2818.336278] [<ffffffff81196eda>] __alloc_pages_nodemask+0x64a/0xb60
[ 2818.337060] [<ffffffff811ade4c>] ? zone_statistics+0x7c/0xa0
[ 2818.337844] [<ffffffff811e091c>] alloc_pages_current+0x8c/0x110
[ 2818.338628] [<ffffffff811e99ff>] new_slab+0x28f/0x490
[ 2818.339407] [<ffffffff811eaa4b>] ___slab_alloc+0x22b/0x460
[ 2818.340204] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.340982] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.341756] [<ffffffff811eaca0>] __slab_alloc+0x20/0x40
[ 2818.342517] [<ffffffff811eb21f>] kmem_cache_alloc+0x19f/0x1f0
[ 2818.343276] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.344044] [<ffffffff8118ff05>] mempool_alloc_slab+0x15/0x20
[ 2818.344793] [<ffffffff8118ffee>] mempool_alloc+0x6e/0x170
[ 2818.345540] [<ffffffffc0162092>] ? __ablk_encrypt+0x52/0x70 [ablk_helper]
[ 2818.346287] [<ffffffff813b20ad>] bio_alloc_bioset+0xbd/0x260
[ 2818.347022] [<ffffffffc0112af4>] kcryptd_crypt+0x104/0x380 [dm_crypt]
[ 2818.347775] [<ffffffff8109a052>] process_one_work+0x162/0x480
[ 2818.348518] [<ffffffff8109a3bb>] worker_thread+0x4b/0x4c0
[ 2818.349260] [<ffffffff8109a370>] ? process_one_work+0x480/0x480
[ 2818.349990] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2818.350722] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2818.351456] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2818.352199] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

[ 2818.478112] kworker/u16:4 D ffff880094c5b538 0 3966 2 0x00000000
[ 2818.478856] Workqueue: kcryptd kcryptd_crypt [dm_crypt]
[ 2818.479610] ffff880094c5b538 ffffffff811cc3a6 ffff880159470dc0 ffff88015723b700
[ 2818.480374] ffff880094c5c000 ffff880094c5b570 ffff88015ec4dd00 ffff88015ec4dd00
[ 2818.481137] ffffffffffffffe0 ffff880094c5b550 ffffffff81821205 0000000100099b56
[ 2818.481901] Call Trace:
[ 2818.482651] [<ffffffff811cc3a6>] ? page_referenced+0xb6/0x140
[ 2818.483413] [<ffffffff81821205>] schedule+0x35/0x80
[ 2818.484186] [<ffffffff81824299>] schedule_timeout+0x129/0x270
[ 2818.484940] [<ffffffff810ec370>] ? trace_event_raw_event_tick_stop+0x120/0x120
[ 2818.485691] [<ffffffff8182443e>] schedule_timeout_uninterruptible+0x1e/0x20
[ 2818.486447] [<ffffffff811aea13>] wait_iff_congested+0xf3/0x180
[ 2818.487201] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2818.487964] [<ffffffff811a3f2f>] shrink_inactive_list+0x4ff/0x520
[ 2818.488712] [<ffffffff811a48d3>] shrink_lruvec+0x593/0x750
[ 2818.489468] [<ffffffff811a4b7f>] shrink_zone+0xef/0x2e0
[ 2818.490209] [<ffffffff811a4edc>] do_try_to_free_pages+0x16c/0x3f0
[ 2818.490951] [<ffffffff811a522e>] try_to_free_pages+0xce/0x180
[ 2818.491711] [<ffffffff81194d44>] ? drain_local_pages+0x24/0x30
[ 2818.492459] [<ffffffff81196eda>] __alloc_pages_nodemask+0x64a/0xb60
[ 2818.493202] [<ffffffff811ade4c>] ? zone_statistics+0x7c/0xa0
[ 2818.493944] [<ffffffff811e091c>] alloc_pages_current+0x8c/0x110
[ 2818.494696] [<ffffffff811e99ff>] new_slab+0x28f/0x490
[ 2818.495448] [<ffffffff811eaa4b>] ___slab_alloc+0x22b/0x460
[ 2818.496210] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.496964] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.497715] [<ffffffff811eaca0>] __slab_alloc+0x20/0x40
[ 2818.498469] [<ffffffff811eb21f>] kmem_cache_alloc+0x19f/0x1f0
[ 2818.499220] [<ffffffff8118ff05>] ? mempool_alloc_slab+0x15/0x20
[ 2818.499973] [<ffffffff8118ff05>] mempool_alloc_slab+0x15/0x20
[ 2818.500708] [<ffffffff8118ffee>] mempool_alloc+0x6e/0x170
[ 2818.501437] [<ffffffffc0162092>] ? __ablk_encrypt+0x52/0x70 [ablk_helper]
[ 2818.502169] [<ffffffff813b20ad>] bio_alloc_bioset+0xbd/0x260
[ 2818.502888] [<ffffffffc0112af4>] kcryptd_crypt+0x104/0x380 [dm_crypt]
[ 2818.503624] [<ffffffff8109a052>] process_one_work+0x162/0x480
[ 2818.504346] [<ffffffff8109a3bb>] worker_thread+0x4b/0x4c0
[ 2818.505069] [<ffffffff8109a370>] ? process_one_work+0x480/0x480
[ 2818.505794] [<ffffffff8109a370>] ? process_one_work+0x480/0x480
[ 2818.506496] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2818.507205] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2818.507930] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2818.508642] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

6. kcryptd seems to be sleeping without work

[ 2817.579092] kcryptd_io S ffff8800352ebe38 0 318 2 0x00000000
[ 2817.579901] ffff8800352ebe38 ffff8800352ebe40 ffffffff81e11500 ffff880035068dc0
[ 2817.580712] ffff8800352ec000 ffff880035068dc0 ffffffff8109a830 ffff880035686a80
[ 2817.581516] ffff8800937d9180 ffff8800352ebe50 ffffffff81821205 ffff8800937d91b0
[ 2817.582314] Call Trace:
[ 2817.583104] [<ffffffff8109a830>] ? worker_thread+0x4c0/0x4c0
[ 2817.583904] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.584699] [<ffffffff8109ab75>] rescuer_thread+0x345/0x3d0
[ 2817.585494] [<ffffffff8109a830>] ? worker_thread+0x4c0/0x4c0
[ 2817.586284] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2817.587076] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.587870] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2817.588663] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.589463] kcryptd S ffff8800352efe38 0 319 2 0x00000000
[ 2817.590270] ffff8800352efe38 ffff8800352efe40 ffffffff81e11500 ffff880035069b80
[ 2817.591078] ffff8800352f0000 ffff880035069b80 ffffffff8109a830 ffff8800355b9480
[ 2817.591881] ffff880156062c00 ffff8800352efe50 ffffffff81821205 ffff880156062c30
[ 2817.592681] Call Trace:
[ 2817.593468] [<ffffffff8109a830>] ? worker_thread+0x4c0/0x4c0
[ 2817.594250] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.595034] [<ffffffff8109ab75>] rescuer_thread+0x345/0x3d0
[ 2817.595828] [<ffffffff8109a830>] ? worker_thread+0x4c0/0x4c0
[ 2817.596618] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2817.597413] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.598214] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2817.599014] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

7. dmcrypt_write seems to be sleeping without work

[ 2817.599818] dmcrypt_write S ffff8800350f3de8 0 320 2 0x00000000
[ 2817.600626] ffff8800350f3de8 ffff8800350f3da8 ffff880159470dc0 ffff88003506a940
[ 2817.601435] ffff8800350f4000 ffff8800350f3e40 ffff8800355b9670 ffff88003506a940
[ 2817.602237] ffff8800355b9600 ffff8800350f3e00 ffffffff81821205 ffff8800355b9668
[ 2817.603038] Call Trace:
[ 2817.603827] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.604618] [<ffffffffc0113081>] dmcrypt_write+0xd1/0x1e0 [dm_crypt]
[ 2817.605412] [<ffffffff810abe40>] ? wake_up_q+0x70/0x70
[ 2817.606204] [<ffffffffc0112fb0>] ? crypt_iv_lmk_gen+0xc0/0xc0 [dm_crypt]
[ 2817.607008] [<ffffffffc0112fb0>] ? crypt_iv_lmk_gen+0xc0/0xc0 [dm_crypt]
[ 2817.607807] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2817.608604] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2817.609404] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2817.610206] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

8. systemd-journal, systemd-logind and irqbalance are blocked in the page fault handler

[ 2817.716663] systemd-journal D ffff8800351c7b78 0 424 1 0x00000000
[ 2817.717463] ffff8800351c7b78 ffff880094103680 ffffffff81e11500 ffff880035458000
[ 2817.718269] ffff8800351c8000 ffff88015ec16d00 7fffffffffffffff ffffffff81821a00
[ 2817.719077] ffff8800351c7cd8 ffff8800351c7b90 ffffffff81821205 0000000000000000
[ 2817.719877] Call Trace:
[ 2817.720663] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2817.721458] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.722247] [<ffffffff81824325>] schedule_timeout+0x1b5/0x270
[ 2817.723049] [<ffffffff813b5cf3>] ? __blk_run_queue+0x33/0x40
[ 2817.723852] [<ffffffff813b5fba>] ? queue_unplugged+0x2a/0xb0
[ 2817.724651] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2817.725448] [<ffffffff81820754>] io_schedule_timeout+0xa4/0x110
[ 2817.726244] [<ffffffff81821a1b>] bit_wait_io+0x1b/0x70
[ 2817.727021] [<ffffffff818215ad>] __wait_on_bit+0x5d/0x90
[ 2817.727800] [<ffffffff8118f79d>] wait_on_page_bit_killable+0xcd/0xf0
[ 2817.728583] [<ffffffff810c3ab0>] ? autoremove_wake_function+0x40/0x40
[ 2817.729369] [<ffffffff8118f844>] __lock_page_or_retry+0x84/0xa0
[ 2817.730143] [<ffffffff8118fa4d>] filemap_fault+0x1ed/0x3f0
[ 2817.730920] [<ffffffff811bc1f0>] __do_fault+0x50/0xe0
[ 2817.731697] [<ffffffff811bfc2b>] handle_mm_fault+0xf8b/0x1820
[ 2817.732475] [<ffffffff8125575a>] ? ep_poll+0x21a/0x3d0
[ 2817.733252] [<ffffffff8106b537>] __do_page_fault+0x197/0x400
[ 2817.734035] [<ffffffff8106b7c2>] do_page_fault+0x22/0x30
[ 2817.734811] [<ffffffff81827478>] page_fault+0x28/0x30

[ 2817.906729] systemd-logind D ffff880157b6bb78 0 857 1 0x00000000
[ 2817.907551] ffff880157b6bb78 ffffffff8118d08e ffff880159470dc0 ffff880095402940
[ 2817.908386] ffff880157b6c000 ffff88015ec56d00 7fffffffffffffff ffffffff81821a00
[ 2817.909207] ffff880157b6bcd8 ffff880157b6bb90 ffffffff81821205 0000000000000000
[ 2817.910038] Call Trace:
[ 2817.910850] [<ffffffff8118d08e>] ? find_get_entry+0x1e/0xa0
[ 2817.911666] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2817.912477] [<ffffffff81821205>] schedule+0x35/0x80
[ 2817.913289] [<ffffffff81824325>] schedule_timeout+0x1b5/0x270
[ 2817.914101] [<ffffffff811c008d>] ? handle_mm_fault+0x13ed/0x1820
[ 2817.914919] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2817.915731] [<ffffffff81820754>] io_schedule_timeout+0xa4/0x110
[ 2817.916546] [<ffffffff81821a1b>] bit_wait_io+0x1b/0x70
[ 2817.917353] [<ffffffff818215ad>] __wait_on_bit+0x5d/0x90
[ 2817.918151] [<ffffffff8118f79d>] wait_on_page_bit_killable+0xcd/0xf0
[ 2817.918952] [<ffffffff810c3ab0>] ? autoremove_wake_function+0x40/0x40
[ 2817.919753] [<ffffffff8118f844>] __lock_page_or_retry+0x84/0xa0
[ 2817.920557] [<ffffffff8118fa4d>] filemap_fault+0x1ed/0x3f0
[ 2817.921349] [<ffffffff811bc1f0>] __do_fault+0x50/0xe0
[ 2817.922136] [<ffffffff811bfc2b>] handle_mm_fault+0xf8b/0x1820
[ 2817.922924] [<ffffffff8125575a>] ? ep_poll+0x21a/0x3d0
[ 2817.923714] [<ffffffff8106b537>] __do_page_fault+0x197/0x400
[ 2817.924503] [<ffffffff8106b7c2>] do_page_fault+0x22/0x30
[ 2817.925295] [<ffffffff81827478>] page_fault+0x28/0x30

[ 2818.096501] irqbalance D ffff880153413b78 0 1005 1 0x00000000
[ 2818.097266] ffff880153413b78 ffff880094347080 ffff880159472940 ffff880095406040
[ 2818.098039] ffff880153414000 ffff88015ecd6d00 7fffffffffffffff ffffffff81821a00
[ 2818.098814] ffff880153413cd8 ffff880153413b90 ffffffff81821205 0000000000000000
[ 2818.099591] Call Trace:
[ 2818.100346] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2818.101114] [<ffffffff81821205>] schedule+0x35/0x80
[ 2818.101871] [<ffffffff81824325>] schedule_timeout+0x1b5/0x270
[ 2818.102626] [<ffffffff813b5cf3>] ? __blk_run_queue+0x33/0x40
[ 2818.103376] [<ffffffff813b5fba>] ? queue_unplugged+0x2a/0xb0
[ 2818.104147] [<ffffffff81821a00>] ? bit_wait+0x60/0x60
[ 2818.104894] [<ffffffff81820754>] io_schedule_timeout+0xa4/0x110
[ 2818.105638] [<ffffffff81821a1b>] bit_wait_io+0x1b/0x70
[ 2818.106392] [<ffffffff818215ad>] __wait_on_bit+0x5d/0x90
[ 2818.107142] [<ffffffff8118f79d>] wait_on_page_bit_killable+0xcd/0xf0
[ 2818.107919] [<ffffffff810c3ab0>] ? autoremove_wake_function+0x40/0x40
[ 2818.108674] [<ffffffff8118f844>] __lock_page_or_retry+0x84/0xa0
[ 2818.109435] [<ffffffff8118fa4d>] filemap_fault+0x1ed/0x3f0
[ 2818.110196] [<ffffffff811bc1f0>] __do_fault+0x50/0xe0
[ 2818.110956] [<ffffffff811bfc2b>] handle_mm_fault+0xf8b/0x1820
[ 2818.111729] [<ffffffff8106b537>] __do_page_fault+0x197/0x400
[ 2818.112489] [<ffffffff8106b7c2>] do_page_fault+0x22/0x30
[ 2818.113254] [<ffffffff81827478>] page_fault+0x28/0x30

9. avahi-daemon is blocked in the page fault handler with a different stack trace

[ 2818.403217] avahi-daemon D ffff880090e671f8 0 3601 1 0x00000000
[ 2818.404027] ffff880090e671f8 ffffffff811eaac6 ffff8800350ce040 ffff880094910000
[ 2818.404832] ffff880090e68000 ffff880090e67230 ffff88015eccdd00 ffff88015eccdd00
[ 2818.405629] ffff880094910000 ffff880090e67210 ffffffff81821205 0000000100099ee5
[ 2818.406422] Call Trace:
[ 2818.407195] [<ffffffff811eaac6>] ? ___slab_alloc+0x2a6/0x460
[ 2818.407992] [<ffffffff81821205>] schedule+0x35/0x80
[ 2818.408775] [<ffffffff81824299>] schedule_timeout+0x129/0x270
[ 2818.409558] [<ffffffff810ec370>] ? trace_event_raw_event_tick_stop+0x120/0x120
[ 2818.410349] [<ffffffff810f574c>] ? ktime_get+0x3c/0xb0
[ 2818.411143] [<ffffffff81820754>] io_schedule_timeout+0xa4/0x110
[ 2818.411957] [<ffffffff811900c8>] mempool_alloc+0x148/0x170
[ 2818.412750] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2818.413549] [<ffffffff813b20ad>] bio_alloc_bioset+0xbd/0x260
[ 2818.414342] [<ffffffff816a33cf>] __split_and_process_bio+0x1ef/0x3e0
[ 2818.415137] [<ffffffff816a3629>] dm_make_request+0x69/0xc0
[ 2818.415937] [<ffffffff813ba502>] generic_make_request+0xf2/0x1d0
[ 2818.416724] [<ffffffff813ba656>] submit_bio+0x76/0x170
[ 2818.417507] [<ffffffff811d2548>] __swap_writepage+0x228/0x270
[ 2818.418288] [<ffffffff811d7bcf>] ? __frontswap_store+0xdf/0x110
[ 2818.419067] [<ffffffff811d25c9>] swap_writepage+0x39/0x70
[ 2818.419857] [<ffffffff811a0b6d>] pageout.isra.44+0x16d/0x280
[ 2818.420639] [<ffffffff811a31ca>] shrink_page_list+0x3ca/0x7a0
[ 2818.421428] [<ffffffff811a3c39>] shrink_inactive_list+0x209/0x520
[ 2818.422212] [<ffffffff811a48d3>] shrink_lruvec+0x593/0x750
[ 2818.422997] [<ffffffff811a4b7f>] shrink_zone+0xef/0x2e0
[ 2818.423779] [<ffffffff811a4edc>] do_try_to_free_pages+0x16c/0x3f0
[ 2818.424558] [<ffffffff811a522e>] try_to_free_pages+0xce/0x180
[ 2818.425327] [<ffffffff81196eda>] __alloc_pages_nodemask+0x64a/0xb60
[ 2818.426100] [<ffffffff811e091c>] alloc_pages_current+0x8c/0x110
[ 2818.426869] [<ffffffff8118d77b>] __page_cache_alloc+0xab/0xc0
[ 2818.427652] [<ffffffff8119bf8b>] __do_page_cache_readahead+0xeb/0x230
[ 2818.428415] [<ffffffff8118d08e>] ? find_get_entry+0x1e/0xa0
[ 2818.429172] [<ffffffff8118fbd5>] filemap_fault+0x375/0x3f0
[ 2818.429932] [<ffffffff811bc1f0>] __do_fault+0x50/0xe0
[ 2818.430693] [<ffffffff811bfc2b>] handle_mm_fault+0xf8b/0x1820
[ 2818.431461] [<ffffffff81220f60>] ? poll_select_copy_remaining+0x140/0x140
[ 2818.432246] [<ffffffff8106b537>] __do_page_fault+0x197/0x400
[ 2818.433015] [<ffffffff8106b7c2>] do_page_fault+0x22/0x30
[ 2818.433782] [<ffffffff81827478>] page_fault+0x28/0x30

10. The memory-hungry compilation job (cc1plus) is blocked in the page fault handler with a trace similar to avahi-daemon's

[ 2818.535321] cc1plus D ffff880090bcb388 0 3977 3976 0x00000000
[ 2818.536087] ffff880090bcb388 ffffffff811eaac6 ffffffff81e11500 ffff880094915280
[ 2818.536853] ffff880090bcc000 ffff880090bcb3c0 ffff88015ec0dd00 ffff88015ec0dd00
[ 2818.537618] ffff880094915280 ffff880090bcb3a0 ffffffff81821205 0000000100099ee5
[ 2818.538375] Call Trace:
[ 2818.539128] [<ffffffff811eaac6>] ? ___slab_alloc+0x2a6/0x460
[ 2818.539897] [<ffffffff81821205>] schedule+0x35/0x80
[ 2818.540665] [<ffffffff81824299>] schedule_timeout+0x129/0x270
[ 2818.541431] [<ffffffff810ec370>] ? trace_event_raw_event_tick_stop+0x120/0x120
[ 2818.542201] [<ffffffff810f574c>] ? ktime_get+0x3c/0xb0
[ 2818.542970] [<ffffffff81820754>] io_schedule_timeout+0xa4/0x110
[ 2818.543748] [<ffffffff811900c8>] mempool_alloc+0x148/0x170
[ 2818.544517] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2818.545286] [<ffffffff813b20ad>] bio_alloc_bioset+0xbd/0x260
[ 2818.546063] [<ffffffff816a33cf>] __split_and_process_bio+0x1ef/0x3e0
[ 2818.546840] [<ffffffff816a3629>] dm_make_request+0x69/0xc0
[ 2818.547632] [<ffffffff813ba502>] generic_make_request+0xf2/0x1d0
[ 2818.548416] [<ffffffff813ba656>] submit_bio+0x76/0x170
[ 2818.549200] [<ffffffff811d2548>] __swap_writepage+0x228/0x270
[ 2818.549980] [<ffffffff811d7bcf>] ? __frontswap_store+0xdf/0x110
[ 2818.550763] [<ffffffff811d25c9>] swap_writepage+0x39/0x70
[ 2818.551546] [<ffffffff811a0b6d>] pageout.isra.44+0x16d/0x280
[ 2818.552341] [<ffffffff811a31ca>] shrink_page_list+0x3ca/0x7a0
[ 2818.553121] [<ffffffff811a3c39>] shrink_inactive_list+0x209/0x520
[ 2818.553902] [<ffffffff811a48d3>] shrink_lruvec+0x593/0x750
[ 2818.554677] [<ffffffff810986d1>] ? queue_work_on+0x31/0x40
[ 2818.555460] [<ffffffff810986d1>] ? queue_work_on+0x31/0x40
[ 2818.556230] [<ffffffff811a4b7f>] shrink_zone+0xef/0x2e0
[ 2818.557004] [<ffffffff811a4edc>] do_try_to_free_pages+0x16c/0x3f0
[ 2818.557784] [<ffffffff811a522e>] try_to_free_pages+0xce/0x180
[ 2818.558561] [<ffffffff81196eda>] __alloc_pages_nodemask+0x64a/0xb60
[ 2818.559337] [<ffffffff811e216e>] alloc_pages_vma+0xbe/0x240
[ 2818.560123] [<ffffffff811c012e>] handle_mm_fault+0x148e/0x1820
[ 2818.560884] [<ffffffff8106b537>] __do_page_fault+0x197/0x400
[ 2818.561643] [<ffffffff8106b7c2>] do_page_fault+0x22/0x30
[ 2818.562397] [<ffffffff81827478>] page_fault+0x28/0x30

11. Disk events work is blocked in LRU shrink

[ 2818.563150] kworker/u16:5 D ffff8800958135c8 0 3978 2 0x00000000
[ 2818.563922] Workqueue: events_freezable_power_ disk_events_workfn
[ 2818.564681] ffff8800958135c8 ffffffff811cc3a6 ffffffff81e11500 ffff8800350ce040
[ 2818.565456] ffff880095814000 ffff880095813600 ffff88015ec0dd00 ffff88015ec0dd00
[ 2818.566234] ffffffffffffffe0 ffff8800958135e0 ffffffff81821205 0000000100099b6b
[ 2818.567022] Call Trace:
[ 2818.567803] [<ffffffff811cc3a6>] ? page_referenced+0xb6/0x140
[ 2818.568593] [<ffffffff81821205>] schedule+0x35/0x80
[ 2818.569372] [<ffffffff81824299>] schedule_timeout+0x129/0x270
[ 2818.570152] [<ffffffff810ec370>] ? trace_event_raw_event_tick_stop+0x120/0x120
[ 2818.570936] [<ffffffff8182443e>] schedule_timeout_uninterruptible+0x1e/0x20
[ 2818.571720] [<ffffffff811aea13>] wait_iff_congested+0xf3/0x180
[ 2818.572505] [<ffffffff810c3a70>] ? wake_atomic_t_function+0x60/0x60
[ 2818.573283] [<ffffffff811a3f2f>] shrink_inactive_list+0x4ff/0x520
[ 2818.574056] [<ffffffff811a48d3>] shrink_lruvec+0x593/0x750
[ 2818.574829] [<ffffffff8120ebc0>] ? super_cache_count+0x70/0xe0
[ 2818.575610] [<ffffffff811a4b7f>] shrink_zone+0xef/0x2e0
[ 2818.576381] [<ffffffff811a4edc>] do_try_to_free_pages+0x16c/0x3f0
[ 2818.577158] [<ffffffff811a522e>] try_to_free_pages+0xce/0x180
[ 2818.577941] [<ffffffff81196eda>] __alloc_pages_nodemask+0x64a/0xb60
[ 2818.578717] [<ffffffff813b60b9>] ? alloc_request_struct+0x19/0x20
[ 2818.579488] [<ffffffff811e091c>] alloc_pages_current+0x8c/0x110
[ 2818.580262] [<ffffffff813b33e7>] bio_copy_kern+0xd7/0x1e0
[ 2818.581030] [<ffffffff813c0088>] blk_rq_map_kern+0xa8/0x140
[ 2818.581797] [<ffffffff813b9e5b>] ? blk_get_request+0x8b/0xf0
[ 2818.582558] [<ffffffff815adc68>] scsi_execute+0x78/0x1d0
[ 2818.583317] [<ffffffff815af96e>] scsi_execute_req_flags+0x8e/0xf0
[ 2818.584087] [<ffffffff815c0e27>] sr_check_events+0xb7/0x2d0
[ 2818.584845] [<ffffffff815fe0d8>] cdrom_check_events+0x18/0x30
[ 2818.585610] [<ffffffff815c125a>] sr_block_check_events+0x2a/0x30
[ 2818.586367] [<ffffffff813cb5f0>] disk_check_events+0x60/0x150
[ 2818.587119] [<ffffffff813cb6f6>] disk_events_workfn+0x16/0x20
[ 2818.587875] [<ffffffff8109a052>] process_one_work+0x162/0x480
[ 2818.588624] [<ffffffff8109a3bb>] worker_thread+0x4b/0x4c0
[ 2818.589353] [<ffffffff8109a370>] ? process_one_work+0x480/0x480
[ 2818.590090] [<ffffffff8109a370>] ? process_one_work+0x480/0x480
[ 2818.590817] [<ffffffff810a0588>] kthread+0xd8/0xf0
[ 2818.591535] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2818.592274] [<ffffffff8182568f>] ret_from_fork+0x3f/0x70
[ 2818.592993] [<ffffffff810a04b0>] ? kthread_create_on_node+0x1e0/0x1e0

12. kernel workqueues have a lot of work items waiting for execution

[ 2819.710760]
[ 2819.711509] Showing busy workqueues and worker pools:
[ 2819.712278] workqueue events_freezable_power_: flags=0x86
[ 2819.713053] pwq 16: cpus=0-7 flags=0x4 nice=0 active=1/256
[ 2819.713843] in-flight: 3978:disk_events_workfn
[ 2819.714673] workqueue kcryptd: flags=0x2a
[ 2819.715464] pwq 16: cpus=0-7 flags=0x4 nice=0 active=4/4
[ 2819.716267] in-flight: 3966:kcryptd_crypt [dm_crypt], 160:kcryptd_crypt [dm_crypt], 3388:kcryptd_crypt [dm_crypt], 3339:kcryptd_crypt [dm_crypt]
[ 2819.717131] delayed: kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], <repeats enough times to make the line 17983 characters wide>
[ 2822.339640] pool 16: cpus=0-7 flags=0x4 nice=0 workers=7 idle: 3979 3139
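
The delayed list above is abbreviated, so the exact number of queued kcryptd_crypt work items is not visible in this report. A minimal sketch of how they could be counted from a complete capture follows; the function name count_delayed_kcryptd is hypothetical, and it assumes the workqueue dump is fed on stdin in the format shown above.

```shell
# Hypothetical helper: for each "delayed:" line of a workqueue dump on
# stdin, print how many kcryptd_crypt work items that line lists.
count_delayed_kcryptd() {
    awk '/delayed:/ { n = gsub(/kcryptd_crypt/, ""); print n }'
}

# Example with a shortened line containing two delayed items:
printf '%s\n' \
    '[ 2819.717131] delayed: kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]' \
    | count_delayed_kcryptd   # prints 2
```

The in-flight line is deliberately ignored, so the result covers only work that has not started yet.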

13. Swap usage stats from kernel:
[ 2960.187650] Swap cache stats: add 16096, delete 3484, find 193/223
[ 2960.187711] Free swap = 4054404kB
[ 2960.187748] Total swap = 4112380kB
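
From the two figures above, the swap in use at the time of the dump is total minus free: 4112380 kB - 4054404 kB = 57976 kB, or roughly 57 MB. A minimal sketch of the same arithmetic as a shell helper follows; the function name swap_used_kb is hypothetical, and it assumes the "Free swap" and "Total swap" lines arrive on stdin in the format shown above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the
# amount of swap in use (Total swap - Free swap), in kB.
swap_used_kb() {
    awk '
        /Free swap/  { sub(/kB$/, "", $NF); free = $NF }
        /Total swap/ { sub(/kB$/, "", $NF); total = $NF }
        END { print total - free }
    '
}

# Example with the two lines from this report:
printf '%s\n' \
    '[ 2960.187711] Free swap = 4054404kB' \
    '[ 2960.187748] Total swap = 4112380kB' \
    | swap_used_kb   # prints 57976
```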

Recovery from hang state:
Sometimes the hang can be recovered by using sysrq-k and sysrq-e to kill all userspace programs. Systemd then restarts the userspace processes if recovery from the hang succeeded. Recovering doesn't work

Workaround:
Disable swap with swapoff -a.
Running the same compile job then results in the OOM killer cleanly killing the memory-hungry process.
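
Before re-running the compile job, it can be worth checking that the workaround actually took effect. The sketch below counts the active swap areas listed in /proc/swaps; the function name active_swap_areas and its optional path argument are hypothetical conveniences, and the code assumes the usual layout of one header line followed by one line per swap area.

```shell
# Hypothetical helper: count active swap areas in a file that uses the
# /proc/swaps layout (one header line, then one line per swap area).
# Defaults to /proc/swaps; another path can be passed for testing.
active_swap_areas() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "${1:-/proc/swaps}"
}

# After "swapoff -a" this is expected to report 0 active areas.
if [ -r /proc/swaps ]; then
    active_swap_areas
fi
```

Swap can normally be turned back on afterwards with swapon -a, which enables the areas listed in /etc/fstab.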

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-22-generic 4.4.0-22.40
ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D0c', '/dev/snd/pcmC1D0p', '/dev/snd/controlC1', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/controlC0', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CurrentDesktop: GNOME
Date: Mon Jun 6 17:55:26 2016
HibernationDevice: RESUME=UUID=04553608-429c-40dc-b53c-a200f2a70588
InstallationDate: Installed on 2037-12-25 (-7871 days ago)
InstallationMedia: Lubuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
MachineType: Acer Aspire E5-571G
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-22-generic root=/dev/mapper/lubuntu--vg-root ro quiet splash
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-22-generic N/A
 linux-backports-modules-4.4.0-22-generic N/A
 linux-firmware 1.157
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/15/2015
dmi.bios.vendor: Insyde Corp.
dmi.bios.version: V1.32
dmi.board.name: EA50_HB
dmi.board.vendor: Acer
dmi.board.version: V1.32
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: V1.32
dmi.modalias: dmi:bvnInsydeCorp.:bvrV1.32:bd09/15/2015:svnAcer:pnAspireE5-571G:pvrV1.32:rvnAcer:rnEA50_HB:rvrV1.32:cvnAcer:ct10:cvrV1.32:
dmi.product.name: Aspire E5-571G
dmi.product.version: V1.32
dmi.sys.vendor: Acer

Pauli (paniemin) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Pauli (paniemin) wrote :

One possible cause is that the dm_crypt work items are all trying to allocate pages in bio_alloc_bioset. According to the comments around the relevant code, the allocation is supposed to come from a memory pool, but the stack traces show that the code ends up in the page allocator. I'm not familiar with the relevant code paths, so I need to read the code more carefully to figure out what actually blocks the dm_crypt work items in the way the stack traces show.
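
To compare how often each allocation path shows up in a captured sysrq-t log, the frames can simply be counted per symbol. This is only a rough sketch: the function name count_frames and the log file name in the usage comment are hypothetical, and it assumes the trace format shown in this report. Frames the kernel marks with "?" are not reliable call-chain entries, so they are left out of the count.

```shell
# Hypothetical helper: count call-trace frames for one symbol in a
# kernel log read from stdin. Frames prefixed with "?" are skipped
# because the fixed-string pattern requires "] symbol+" exactly.
count_frames() {
    grep -c -F -- "] $1+" || true
}

# Possible usage on a saved netconsole capture (file name is made up):
#   count_frames mempool_alloc          < netconsole.log
#   count_frames __alloc_pages_nodemask < netconsole.log
```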

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.7-rc1 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.7-rc1-yakkety/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
importance: Medium → High
tags: added: kernel-da-key
penalvch (penalvch)
tags: added: latest-bios-1.32
Pauli (paniemin) wrote :

This is a new laptop with a fresh install. I received it only a few days before noticing the swapping issue. The laptop model is from the previous generation, meaning that this hardware has been around for nearly two years.

I haven't tried older kernels to see if I can find a stable one. The only relevant-looking report about a very similar problem pointed to the bug already being present in 3.19 kernels. If this is the same bug, then his report about an older kernel being stable means I would need to install quite an old kernel to find a stable one. I might set up a kernel build and bisect for the broken commit in a few days, but first I wanted to learn how the relevant code behaves on my system before trying to find the bug. That means I need to read thousands of lines of code and set up some kernel traces to see how thread, hw access and irq timings interact.

I tested mainline kernel package linux-image-4.7.0-040700rc2-generic_4.7.0-040700rc2.201606051831_amd64.deb

There are some changes in system behavior, but the underlying bug that makes swapping unusable on this laptop is still present.

Changes in 4.7-rc2 kernel:
+ SLUB now actively reports allocation problems
+ Kernel manages to use about 1G of swap space before freezing (so far only one attempt to reproduce), whereas the 4.4 kernel consistently used only about 100M of swap before the freeze
+ The freeze triggers the OOM killer very quickly, and the system automatically recovers after the OOM killer removes the offending compile process.
- Netconsole has become unreliable. Allocation failures are reported from netconsole context, which results in lost kernel messages
+ The local kernel log recorded to disk is complete. There is no report from systemd about ring buffer overflows.

Attaching locally recorded kern.log from boot and reproduction attempt.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
Pauli (paniemin) wrote :
penalvch (penalvch)
tags: added: kernel-bug-exists-upstream-4.7-rc2
Changed in linux (Ubuntu):
status: Confirmed → Triaged