stress-ng sockfd stressor kills 4.11 kernels

Bug #1692668 reported by Seth Forshee
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Colin Ian King

Bug Description

Seen in ADT on i386, though I've also reproduced it in an amd64 VM. Running the following appears to use up all RAM in the system:

 # stress-ng -v -t 10 --sockfd 4 --ignite-cpu --syslog --verbose --verify

Revision history for this message
Seth Forshee (sforshee) wrote :

Playing around with this a bit: with 1GB of RAM in my VMs this always happens. If I bump it to 2GB I don't see a problem any more on amd64. I still see memory problems with i386, but I also see them with a 4.10 kernel, so I'm not sure why we aren't seeing this fail on zesty i386.
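A quick way to confirm which side of the threshold a given guest is on (~1 GiB failing, >= 2 GiB passing on amd64, per the comment above) is to read the guest's total memory directly:

```shell
# Print the guest's total RAM as reported by the kernel, to check
# whether the VM falls in the failing (~1 GiB) or passing (>= 2 GiB)
# range described above.
awk '/^MemTotal:/ {print $2, $3}' /proc/meminfo
```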

description: updated
Revision history for this message
Seth Forshee (sforshee) wrote :

cking has spent some time looking at this. The test failure occurs because during the test the OOM killer is killing essentially everything except stress-ng, including things like the ssh process for the test itself. He is working on some adjustments to the test to make this work better; in the meantime it's not a critical failure.
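One common adjustment for this class of problem (a hedged illustration only, not necessarily what the committed test change does) is to lower a critical process's `oom_score_adj` so the OOM killer prefers other victims; `-1000` exempts a process entirely and requires root. The sketch below only reads the current value, which the kernel documents as lying in the range -1000..1000:

```shell
# Read this shell's own OOM-killer score adjustment (default 0).
cat /proc/self/oom_score_adj
# To protect e.g. the test's ssh session (PID shown is hypothetical),
# a harness could run, as root:
#   echo -500 > /proc/<sshd-pid>/oom_score_adj
```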

Changed in linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
status: Confirmed → In Progress
Revision history for this message
Colin Ian King (colin-king) wrote :

[ 85.954718] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 85.954939] Call Trace:
[ 85.955149] dump_stack+0x63/0x8d
[ 85.955352] panic+0xe4/0x22d
[ 85.955555] out_of_memory+0x363/0x4e0
[ 85.955755] __alloc_pages_slowpath+0xd3f/0xe20
[ 85.955952] __alloc_pages_nodemask+0x241/0x280
[ 85.956152] alloc_pages_current+0x95/0x140
[ 85.956353] new_slab+0x473/0x770
[ 85.956540] ___slab_alloc+0x41c/0x570
[ 85.956726] ? apparmor_file_alloc_security+0x23/0x40
[ 85.956915] ? get_empty_filp+0x5c/0x1c0
[ 85.957101] ? apparmor_file_alloc_security+0x23/0x40
[ 85.957285] __slab_alloc+0x20/0x40
[ 85.957462] ? __slab_alloc+0x20/0x40
[ 85.957638] kmem_cache_alloc_trace+0x1f9/0x240
[ 85.957813] ? get_empty_filp+0x5c/0x1c0
[ 85.957986] apparmor_file_alloc_security+0x23/0x40
[ 85.958159] security_file_alloc+0x33/0x50
[ 85.958328] get_empty_filp+0x9a/0x1c0
[ 85.958495] ? getname_flags+0x4f/0x1f0
[ 85.958660] path_openat+0x40/0x1490
[ 85.958822] ? init_object+0x69/0xa0
[ 85.958982] ? ___slab_alloc+0x1ac/0x570
[ 85.959141] do_filp_open+0x99/0x110
[ 85.959298] ? set_next_entity+0xd9/0x210
[ 85.959453] ? __check_object_size+0xb3/0x190
[ 85.959607] ? __alloc_fd+0x46/0x170
[ 85.959757] do_sys_open+0x130/0x220
[ 85.959902] ? do_sys_open+0x130/0x220
[ 85.960048] SyS_open+0x1e/0x20
[ 85.960187] entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 85.960331] RIP: 0033:0x7f16e254dd70
[ 85.960473] RSP: 002b:00007ffc899232c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[ 85.960620] RAX: ffffffffffffffda RBX: 00000000000e6d52 RCX: 00007f16e254dd70
[ 85.960759] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00005636dfad6b48
[ 85.960901] RBP: 00007ffc899233f0 R08: d299bf0369a17e00 R09: 0000000000000015
[ 85.961043] R10: 0000000000000075 R11: 0000000000000246 R12: 00007ffc89923360
[ 85.961179] R13: 00000000000ffffc R14: 00007ffc899233a0 R15: 0000000000000006
[ 85.961405] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 85.961549] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...

Revision history for this message
Colin Ian King (colin-king) wrote :

[ 186.684737] Call Trace:
[ 186.684941] dump_stack+0x63/0x8d
[ 186.685143] panic+0xe4/0x22d
[ 186.685359] out_of_memory+0x363/0x4e0
[ 186.685559] __alloc_pages_slowpath+0xd3f/0xe20
[ 186.685758] __alloc_pages_nodemask+0x241/0x280
[ 186.685953] alloc_pages_vma+0xab/0x280
[ 186.686145] __read_swap_cache_async+0x147/0x1f0
[ 186.686333] read_swap_cache_async+0x26/0x60
[ 186.686520] swapin_readahead+0x1b0/0x200
[ 186.686755] ? radix_tree_lookup_slot+0x22/0x50
[ 186.687032] ? find_get_entry+0x1e/0x140
[ 186.687301] do_swap_page+0x27a/0x740
[ 186.687584] ? do_swap_page+0x27a/0x740
[ 186.687824] __handle_mm_fault+0x6a3/0x1010
[ 186.688004] handle_mm_fault+0xf9/0x220
[ 186.688177] __do_page_fault+0x23e/0x4e0
[ 186.688351] trace_do_page_fault+0x37/0xd0
[ 186.688522] do_async_page_fault+0x19/0x70
[ 186.688683] async_page_fault+0x28/0x30
[ 186.688842] RIP: 0033:0x7f74a2735b7c
[ 186.689007] RSP: 002b:00007ffc05fdaf50 EFLAGS: 00010246
[ 186.689168] RAX: 0000000000000008 RBX: 00007ffc05fdb4b0 RCX: 0000000000000d68
[ 186.689329] RDX: 00005649cd4d17b8 RSI: 00005649cd4d17b0 RDI: 00007ffc05fdb648
[ 186.689498] RBP: 00007ffc05fdb4a0 R08: 0000000000000000 R09: 0000000000000008
[ 186.689658] R10: 00007f74a3972708 R11: 0000000000000000 R12: 00005649cd4d17b0
[ 186.689813] R13: 00007ffc05fdbea0 R14: 0000000000000000 R15: 0000000000000008
[ 186.690051] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 186.690226] ---[ end Kernel panic - not syncing: Out of memory and no killable processes..

Revision history for this message
Colin Ian King (colin-king) wrote :
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Colin Ian King (colin-king) wrote :

This fix is now in stress-ng and should therefore be picked up on the next ADT run. Seth, can you verify that this now works on the ADT tests for the 4.11 kernel?

Revision history for this message
Seth Forshee (sforshee) wrote :

Just submitted a test run of i386 against the current artful-proposed kernel. Hopefully it actually runs; many of the recent tests have failed to start due to broken package dependencies :-(

Revision history for this message
Colin Ian King (colin-king) wrote :

Any luck? I'd like to close this bug as the fix has been committed to the relevant autotest to disable this.

Revision history for this message
Seth Forshee (sforshee) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :

I'm going to mark this as fixed with respect to the stress-ng changes plus disabling this in the tests. Running this stressor as root will cause issues; this is a known problem with running root-privileged, memory-hungry stressors.
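For anyone who still wants to run such a stressor by hand, one hedged mitigation sketch (not the committed fix) is to cap the process's address space so its allocations fail with ENOMEM inside stress-ng rather than driving the system-wide OOM killer; the 512 MiB cap below is an arbitrary illustrative value:

```shell
# Run the stressor under an address-space cap, applied in a subshell so
# the limit does not affect the calling shell. ulimit -v takes KiB.
(
  ulimit -v $((512 * 1024))            # cap virtual memory at 512 MiB
  ulimit -v                            # confirm the cap now in effect
  command -v stress-ng >/dev/null 2>&1 &&
    stress-ng -t 10 --sockfd 4 --verify || true
)
```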

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released