Stress-ng

stress-ng hitting OOMKiller on systems with iSCSI mounted root causes kernel panics because swap disappears

Bug #1735033 reported by Jeff Lane  on 2017-11-29

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Stress-ng	Won't Fix	Low	Colin Ian King

Bug Description

This was reported by a tester in a SF case:

tl;dr: on a system with an iSCSI root, stress-ng hits OOMKiller which causes network config to disappear, which causes iSCSI mounted Swap partition to go offline, which cases kernel panics.

I'm not sure if this can be fixed, but I at least wanted to raise it. Unfortunately, I do not have access to the hardware to do any debugging or test fixing. :/

Original comments from SF:

I ran in to this earlier with another hardware type that I ran through the certification process recently, but eventually I was able to get it to pass.

It seems like the stress-ng / memory check test tries to consume too much memory on the host system. Eventually the kernel OOMs the process:

[ 3234.759258] Killed process 4005 (stress-ng) total-vm:15248492kB, anon-rss:15072800kB, file-rss:596kB
[38809.460810] Out of memory: Kill process 94236 (stress-ng) score 1011 or sacrifice child
[38809.569427] Killed process 94236 (stress-ng) total-vm:9298640kB, anon-rss:9170452kB, file-rss:376kB
[38810.528008] Out of memory: Kill process 94365 (stress-ng) score 1011 or sacrifice child
[38810.636486] Killed process 94365 (stress-ng) total-vm:9323888kB, anon-rss:9196108kB, file-rss:476kB
[38812.265270] Out of memory: Kill process 94232 (stress-ng) score 1011 or sacrifice child
[38812.373224] Killed process 94232 (stress-ng) total-vm:9438520kB, anon-rss:9287712kB, file-rss:476kB
[38813.317146] Out of memory: Kill process 94234 (stress-ng) score 1011 or sacrifice child
[38813.425118] Killed process 94234 (stress-ng) total-vm:9654232kB, anon-rss:9488360kB, file-rss:572kB
[38815.009344] Out of memory: Kill process 94388 (stress-ng) score 1012 or sacrifice child
[38815.117280] Killed process 94388 (stress-ng) total-vm:9971032kB, anon-rss:9821316kB, file-rss:476kB
[38816.465588] Out of memory: Kill process 94387 (stress-ng) score 1011 or sacrifice child
[38816.573569] Killed process 94387 (stress-ng) total-vm:9787868kB, anon-rss:9637116kB, file-rss:484kB
[38817.755249] Out of memory: Kill process 94350 (stress-ng) score 1012 or sacrifice child
[38817.863447] Killed process 94350 (stress-ng) total-vm:10259004kB, anon-rss:10148612kB, file-rss:516kB
[38819.064986] Out of memory: Kill process 94227 (stress-ng) score 1012 or sacrifice child
[38819.172964] Killed process 94227 (stress-ng) total-vm:10141268kB, anon-rss:9973088kB, file-rss:484kB
[39017.741928] INFO: task kswapd0:546 blocked for more than 120 seconds.
[39017.831155] Not tainted 4.4.0-96-generic #119-Ubuntu
[39017.908814] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

The problem is that eventually this seems to result in the host losing the NIC configuration. Given it's using an iSCSI root (and that's also where the swap exists), this results in the drive going offline:
...
[44830.774032] sd 0:0:0:1: rejecting I/O to offline device
[44830.847051] sd 0:0:0:1: rejecting I/O to offline device
[44830.919848] sd 0:0:0:1: rejecting I/O to offline device
[44830.993103] sd 0:0:0:1: rejecting I/O to offline device
[44830.993177] blk_update_request: I/O error, dev sda, sector 11377808
...

and just over a minute later kernel panic'ing and giving up:
...
[44896.219486] sd 0:0:0:1: rejecting I/O to offline device
[44896.219487] Write-error on swap-device (8:0:12186120)
[44896.219852] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[44896.219852]
[44896.219855] CPU: 38 PID: 1 Comm: systemd Not tainted 4.4.0-96-generic #119-Ubuntu
[44896.219856] Hardware name: Oracle Corporation ORACLE SERVER X7-2c/SERVER MODULE ASSY, , BIOS 46017200 09/08/2017
[44896.219859] 0000000000000086 5d6e6d57e6604dcf ffff885e8e8fbc48 ffffffff813fabd3
[44896.219861] ffffffff81cbaea0 ffff885e8e8fbce0 ffff885e8e8fbcd0 ffffffff8118d967
[44896.219863] ffff885e00000010 ffff885e8e8fbce0 ffff885e8e8fbc78 5d6e6d57e6604dcf
[44896.219863] Call Trace:
[44896.219868] [<ffffffff813fabd3>] dump_stack+0x63/0x90
[44896.219872] [<ffffffff8118d967>] panic+0xd3/0x215
[44896.219877] [<ffffffff810847b1>] do_exit+0xaf1/0xb00
[44896.219880] [<ffffffff81084843>] do_group_exit+0x43/0xb0
[44896.219883] [<ffffffff81090ae2>] get_signal+0x292/0x600
[44896.219887] [<ffffffff8102e567>] do_signal+0x37/0x6f0
[44896.219889] [<ffffffff811935b0>] ? __probe_kernel_read+0x40/0x90
[44896.219892] [<ffffffff8106b39b>] ? mm_fault_error+0x11b/0x160
[44896.219895] [<ffffffff8100320c>] exit_to_usermode_loop+0x8c/0xd0
[44896.219898] [<ffffffff81003c16>] prepare_exit_to_usermode+0x26/0x30
[44896.219902] [<ffffffff81843d65>] retint_user+0x8/0x10
[44896.260324] Kernel Offset: disabled
[44896.351757] ------------[ cut here ]------------
[44896.351762] WARNING: CPU: 38 PID: 619 at /build/linux-z2ccW0/linux-4.4.0/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x60/0x70()
[44896.351786] Modules linked in: ipmi_devintf ip6table_filter ip6_tables xt_comment ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_filter ip_tables x_tables ipmi_ssif edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass shpchp ioatdma 8250_fintek ipmi_si dca ipmi_msghandler acpi_pad acpi_power_meter nfit mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr autofs4 btrfs iscsi_tcp libiscsi_tcp libiscsi s
csi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear bnxt_en_bpo crct10dif_pclmul vxlan crc32_pclmul ip6_udp_tunnel udp_tunnel ghash_clmulni_intel ptp aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pps_core wmi fjes
[44896.351789] CPU: 38 PID: 619 Comm: kworker/38:1 Not tainted 4.4.0-96-generic #119-Ubuntu
[44896.351790] Hardware name: Oracle Corporation ORACLE SERVER X7-2c/SERVER MODULE ASSY, , BIOS 46017200 09/08/2017
[44896.351796] 0000000000000086 445bd792de0dc755 ffff88bebeb03db0 ffffffff813fabd3
[44896.351798] 0000000000000000 ffffffff81cade78 ffff88bebeb03de8 ffffffff810812e2
[44896.351799] 0000000000000012 ffff885ec0896e00 0000000000000026 ffff88be86ea3fc0
[44896.351799] Call Trace:
[44896.351803] <IRQ> [<ffffffff813fabd3>] dump_stack+0x63/0x90
[44896.351806] [<ffffffff810812e2>] warn_slowpath_common+0x82/0xc0
[44896.351808] [<ffffffff8108142a>] warn_slowpath_null+0x1a/0x20
[44896.351809] [<ffffffff81050a50>] native_smp_send_reschedule+0x60/0x70
[44896.351812] [<ffffffff810bf163>] trigger_load_balance+0x133/0x210
[44896.351814] [<ffffffff810adc16>] scheduler_tick+0xa6/0xd0
[44896.351818] [<ffffffff810ff300>] ? tick_sched_handle.isra.14+0x60/0x60
[44896.351820] [<ffffffff810ef5e1>] update_process_times+0x51/0x60
[44896.351821] [<ffffffff810ff2c5>] tick_sched_handle.isra.14+0x25/0x60
[44896.351823] [<ffffffff810ff33d>] tick_sched_timer+0x3d/0x70
[44896.351825] [<ffffffff810eff02>] __hrtimer_run_queues+0x102/0x290
[44896.351826] [<ffffffff810f06c8>] hrtimer_interrupt+0xa8/0x1a0
[44896.351829] [<ffffffff81053078>] local_apic_timer_interrupt+0x38/0x60
[44896.351831] [<ffffffff81845d1d>] smp_apic_timer_interrupt+0x3d/0x50
[44896.351834] [<ffffffff81843fe2>] apic_timer_interrupt+0x82/0x90
[44896.351837] <EOI> [<ffffffff810a9cdd>] ? finish_task_switch+0x7d/0x220
[44896.351839] [<ffffffff8183eab6>] __schedule+0x3b6/0xa30
[44896.351840] [<ffffffff8183f165>] schedule+0x35/0x80
[44896.351843] [<ffffffff8109aa3b>] worker_thread+0xcb/0x4c0
[44896.351845] [<ffffffff8109a970>] ? process_one_work+0x480/0x480
[44896.351846] [<ffffffff810a0ce5>] kthread+0xe5/0x100
[44896.351848] [<ffffffff810a0c00>] ? kthread_create_on_node+0x1e0/0x1e0
[44896.351850] [<ffffffff8184360f>] ret_from_fork+0x3f/0x70
[44896.351851] [<ffffffff810a0c00>] ? kthread_create_on_node+0x1e0/0x1e0
[44896.351853] ---[ end trace 21558d2f592d8db9 ]---

I've been able to restart the process over and over before on a box and eventually get it to complete, but it would be preferable not to have to do that, it ends up taking 12+ hours before eventually dying, essentially making this something I can run once a day (only got access to one machine for the testing at the moment, can't really run multiple tests in parallel).

$ free -m
total used free shared buff/cache available
Mem: 772706 1933 770265 9 508 768852
Swap: 4095 0 4095

I'm going to try bumping the swap up, see if that will help (this is using what is effectively the Canonical cloud image you supply us, which comes with no swap, however without swap tests fail)