stress-ng triggering oomkiller running brk and stack stressors on all arches.

Bug #1729878 reported by Jeff Lane 
Affects                Status     Importance  Assigned to
Stress-ng              Invalid    Undecided   Colin Ian King
linux (Ubuntu)         Won't Fix  Medium      Unassigned
linux (Ubuntu) Artful  Won't Fix  Medium      Unassigned

Bug Description

During regression testing on a z/VM instance, I noticed call traces being dumped to dmesg that seem to be related to stress-ng.

The stress-ng invocation we're using is:
stress-ng --aggressive --verify --timeout $runtime \
                                 --metrics-brief --tz --times \
                                 --af-alg 0 --bsearch 0 --context 0 --cpu 0 \
                                 --crypt 0 --hsearch 0 --longjmp 0 --lsearch 0 \
                                 --matrix 0 --qsort 0 --str 0 --stream 0 \
                                 --tsearch 0 --vecmath 0 --wcs 0

as executed by the cpu_stress script from the cpu/cpu_stress_ng test in the certification suite.

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: linux-image-4.13.0-16-generic 4.13.0-16.19
ProcVersionSignature: Ubuntu 4.13.0-16.19-generic 4.13.4
Uname: Linux 4.13.0-16-generic s390x
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.7-0ubuntu3.1
Architecture: s390x
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Fri Nov 3 11:12:04 2017
HibernationDevice: RESUME=UUID=1ce0f037-449a-43a1-af49-e730f6f99ac4
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
PciMultimedia:

ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C
 SHELL=/bin/bash
ProcFB: Error: [Errno 2] No such file or directory: '/proc/fb'
ProcKernelCmdLine: root=/dev/mapper/hwe0008_vg-lv crashkernel=196M BOOT_IMAGE=0
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-16-generic N/A
 linux-backports-modules-4.13.0-16-generic N/A
 linux-firmware 1.169
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to artful on 2017-11-02 (0 days ago)
---
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.7-0ubuntu3.1
Architecture: s390x
ArecordDevices: Error: [Errno 2] No such file or directory
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
DistroRelease: Ubuntu 17.10
HibernationDevice: RESUME=UUID=1ce0f037-449a-43a1-af49-e730f6f99ac4
IwConfig: Error: [Errno 2] No such file or directory
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C
 SHELL=/bin/bash
ProcFB: Error: [Errno 2] No such file or directory: '/proc/fb'
ProcKernelCmdLine: root=/dev/mapper/hwe0008_vg-lv crashkernel=196M BOOT_IMAGE=0
ProcVersionSignature: Ubuntu 4.13.0-16.19-generic 4.13.4
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-16-generic N/A
 linux-backports-modules-4.13.0-16-generic N/A
 linux-firmware 1.169
RfKill: Error: [Errno 2] No such file or directory
Tags: artful
Uname: Linux 4.13.0-16-generic s390x
UpgradeStatus: Upgraded to artful on 2017-11-02 (0 days ago)
UserGroups: adm cdrom cpacfstats dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True

Revision history for this message
Jeff Lane  (bladernr) wrote :

It's possible these are just OOM killer related, but the first few call traces seem centered around snapd, and I don't notice any OOM killer messages until towards the end of the dmesg log.

Revision history for this message
Jeff Lane  (bladernr) wrote :

I'm also not sure why we're hitting the OOM killer. The memory configuration is light, but not so light that we'd trigger this sort of thing on other systems; perhaps it's s390x-specific:

ubuntu@hwe0008:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3807         172        3280           1         354        3584
Swap:          5707          54        5652

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1729878

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Jeff Lane  (bladernr) wrote : AudioDevicesInUse.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Jeff Lane  (bladernr) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Jeff Lane  (bladernr) wrote : JournalErrors.txt

apport information

Revision history for this message
Jeff Lane  (bladernr) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Jeff Lane  (bladernr) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Jeff Lane  (bladernr) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Jeff Lane  (bladernr) wrote : ProcModules.txt

apport information

Revision history for this message
Jeff Lane  (bladernr) wrote : UdevDb.txt

apport information

Revision history for this message
Jeff Lane  (bladernr) wrote : WifiSyslog.txt

apport information

Revision history for this message
Jeff Lane  (bladernr) wrote : Re: stress-ng triggering call-traces on s390x z/VM instances

Fix your bot, guys... it tells me logs are missing and to run apport-collect which just collects the exact same logs ubuntu-bug collected the first time.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Do you know which prior kernels did not exhibit this bug? Was it 4.12 or earlier?

Could you test the mainline kernel to see if this bug is already fixed upstream:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc7

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Artful):
status: New → Confirmed
importance: Undecided → Medium
tags: added: kernel-da-key
Revision history for this message
Colin Ian King (colin-king) wrote :

This seems like stress-ng was eating a load of pages, and snapd just got victimized because it was expanding its memory when there were no free pages. The OOM killer can seem arbitrary like this; I expect there is a good reason for it.

Revision history for this message
Jeff Lane  (bladernr) wrote : Re: [Bug 1729878] Re: stress-ng triggering call-traces on s390x z/VM instances

On Fri, Nov 3, 2017 at 1:13 PM, Joseph Salisbury
<email address hidden> wrote:
> Do you know which prior kernels did not exhibit this bug? Was it 4.12
> or earlier?

I "think" it happened with 4.10 as well, but I'm not positive, I ran
some tests before realizing I was still on 17.04 and upgrading the VM
to 17.10.

> Could you test the mainline kernel to see if this bug is already fixed upstream:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc7

Just a heads up, no, I can't:

Build for s390x failed (see BUILD.LOG.s390x):
  linux-headers-4.14.0-041400rc7_4.14.0-041400rc7.201710292231_all.deb
  *_s390x.deb
\

Is there a better place (internal perhaps) to get mainline builds?

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

On 3 November 2017 at 19:27, Jeff Lane <email address hidden> wrote:
> On Fri, Nov 3, 2017 at 1:13 PM, Joseph Salisbury
> <email address hidden> wrote:
>> Do you know which prior kernels did not exhibit this bug? Was it 4.12
>> or earlier?
>
> I "think" it happened with 4.10 as well, but I'm not positive, I ran
> some tests before realizing I was still on 17.04 and upgrading the VM
> to 17.10.
>
>> Could you test the mainline kernel to see if this bug is already fixed upstream:
>> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc7
>
> Just a heads up, no, I cant:
>
> Build for s390x failed (see BUILD.LOG.s390x):
> linux-headers-4.14.0-041400rc7_4.14.0-041400rc7.201710292231_all.deb
> *_s390x.deb
> \
>
> Is there a better place (internal perhaps) to get mainline builds?
>

There are no better places, and it seems like that has been fixed up now.

--
Regards,

Dimitri.

Revision history for this message
Jeff Lane  (bladernr) wrote :

So over the weekend, I retried with 4.14.0-041400rc6-generic (since
rc7 was not successfully built when I tried this).

This too ended with call traces in the kernel log after runs of stress-ng.

I only ran our memory stress and system-wide CPU stress tests; both
tests indicated that they passed (stress-ng exited with a success
code), but I noticed three call traces in the dmesg log. I've
attached dmesg taken before and after the test run for comparison.

Just for due diligence, I will also update to rc7 and try again, now
that the packages have been fixed, per the update from xnox.

Revision history for this message
Jeff Lane  (bladernr) wrote : Re: stress-ng triggering call-traces on s390x z/VM instances

OK, so a little bit further: stress-ng seems to be triggering the OOM killer, which I'm not sure it should be, unless it's just really writing a LOT of data to memory.

My setup has been the same on all architectures and I've seen the same behaviour on Artful deployments on amd64 (Xeon Phi), Power8 and s390x:

- created swap space that is 2x the amount of RAM, per the output of free (a sketch of this step is below)
- ran the stress-ng based memory stress test from the certification suite
- stress-ng tells us that the tests all pass
- dmesg reveals invocations of the OOM killer
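
For reference, a minimal sketch of the swap setup step (this assumes a swap file; the exact commands used by the deployment are not part of this report):

#!/bin/bash
# Sketch only (assumption): create and enable a swap file sized at 2x installed RAM.
ram_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
swap_mb=$(( ram_kb * 2 / 1024 ))

sudo fallocate -l "${swap_mb}M" /swapfile2x
sudo chmod 600 /swapfile2x
sudo mkswap /swapfile2x
sudo swapon /swapfile2x

free -m                                 # confirm the new swap size
dmesg | grep -iE 'out of memory|oom'    # look for OOM killer activity afterwards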

This is all handled by the memory_stress_ng script in the cert suite. The tests we run are defined by this block:

crt_stressors=("bsearch" "context" "hsearch" "lsearch" "matrix" \
               "memcpy" "null" "pipe" "qsort" "stack" "str" "stream" \
               "tsearch" "vm-rw" "wcs" "zero" "mlock" "mmapfork" "mmapmany" \
               "mremap" "shm-sysv" "vm-splice")
if [ "$numa_nodes" -gt 1 ]; then
    crt_stressors+=("numa")
fi

and we run them individually, one test case at a time.

I've attached a tarball with the stderr and stdout from stress-ng, dmesg contents and the test script that runs all this.

Revision history for this message
Jeff Lane  (bladernr) wrote :

The call that invokes stress-ng looks like this:

timeout -s 14 $end_time stress-ng -k --aggressive --verify --timeout $runtime --$1 0

Where $end_time is the max time before killing stress-ng IF stress-ng runs longer than the --timeout value. $1 is the stressor invoked, and as I said above, we run the stressors one at a time, not in parallel.
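
Put together, a hypothetical, simplified sketch of that loop (the real memory_stress_ng script is attached above and differs in detail):

#!/bin/bash
# Hypothetical, simplified sketch of the per-stressor loop described above.
runtime=300                      # --timeout handed to stress-ng, in seconds
end_time=$(( runtime + 120 ))    # hard kill via timeout(1) if stress-ng overruns

crt_stressors=("stack" "brk")    # trimmed list; the full array is shown earlier

for stressor in "${crt_stressors[@]}"; do
    echo "Running stress-ng $stressor stressor for $runtime seconds...."
    timeout -s 14 "$end_time" stress-ng -k --aggressive --verify \
        --timeout "$runtime" "--$stressor" 0
    echo "return_code is $?"
done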

Jeff Lane  (bladernr)
summary: - stress-ng triggering call-traces on s390x z/VM instances
+ stress-ng triggering oomkiller running brk and stack stressors on all
+ arches.
Revision history for this message
Jeff Lane  (bladernr) wrote :

Ok, I've narrowed it down to the brk and stack stressors in stress-ng. I've recreated this on amd64, ppc64el and s390x across 4.4, 4.10 and 4.13 kernels.

stress-ng reports the tests as successful, so I guess the only thing remaining is to confirm that hitting the OOM killer is expected behaviour when running these stressors.

I've tried setting up swap space that is three times the amount of physical RAM, and this still invokes the OOM killer.

ubuntu@xwing:~$ free -lm
              total        used        free      shared  buff/cache   available
Mem:           3838         105        3598           0         133        3517
Low:           3838         239        3598
High:             0           0           0
Swap:         11514          68       11446

And from looking at vmstat while it's running:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 0 37 3618 5 138 0 2 26 2880 154 138 0 1 60 39 0
 0 9 257 82 1 114 0 22 183 23495 1284 907 0 3 61 36 0
 0 9 891 84 0 110 0 63 9 65792 3266 1865 0 2 6 92 0
 0 10 1601 87 0 108 0 71 1 73444 3466 1682 0 3 6 92 0
 1 8 2300 83 0 108 0 70 6 72316 3494 1673 0 3 6 91 0
 0 10 2926 82 1 106 0 63 165 64884 2940 1593 0 2 8 90 0
 0 9 3592 95 0 106 0 67 69 68825 2987 1672 0 3 8 90 0
 0 9 4326 82 0 104 0 73 16 76003 3376 1645 0 3 5 92 0
 0 9 4910 81 0 103 0 58 177 60429 2713 1596 0 2 8 90 0
 0 10 5581 81 0 104 0 67 122 69519 2680 1581 0 2 13 85 0
 0 10 6317 89 0 105 0 74 13 76143 3389 1613 0 3 5 92 0
 0 9 6972 83 1 104 0 66 103 67807 2964 1711 0 3 7 90 0
 0 9 7591 103 1 105 0 62 39 64041 2918 1530 0 3 8 89 0
 0 9 8248 84 1 105 0 66 17 68101 3208 1573 0 3 6 91 0
 0 10 8946 82 1 105 0 70 0 72289 2934 1615 0 3 7 90 0
 0 9 9586 82 1 104 0 64 60 66192 2953 1725 0 3 6 90 0
 0 9 10276 100 1 104 0 69 3 71434 2986 1694 0 3 9 88 0
 0 10 10958 82 1 103 0 68 11 70660 3365 1704 0 4 8 88 0
 0 9 11455 82 1 102 0 50 53 51572 2269 1533 0 2 15 82 0
 0 11 10249 87 0 106 0 26 783 27601 1323 1721 0 2 16 82 0
 0 10 10407 82 0 105 0 16 6 16883 1487 1438 0 1 39 60 0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 0 11 10543 82 0 109 0 14 558 14774 1523 2202 0 1 31 68 0
 0 12 10672 82 1 113 0 13 598 13945 1552 2592 0 1 17 82 0
 1 8 10844 84 1 115 0 17 499 18516 1434 1921 0 1 22 77 0
 0 9 11080 81 1 113 0 24 0 25004 1782 1547 0 2 32 67 0
 0 9 11338 82 1 113 ...


Revision history for this message
Jeff Lane  (bladernr) wrote :

So I did this on 4.4 on ppc64le with 32GB RAM and 96GB Swap:
ubuntu@entei:~$ free -lm
              total        used        free      shared  buff/cache   available
Mem:          32565         656       30603           2        1304       30503
Low:          32565        1961       30603
High:             0           0           0
Swap:         97695         342       97353

And this time the OOM killer was not hit.

ubuntu@entei:~$ ./memory_stress_ng
Total memory is 32 GiB
Constant run time is 300 seconds per stressor
Variable run time is 620 seconds per stressor
Estimated total run time is 20 minutes

Running stress-ng stack stressor for 300 seconds....
stress-ng: info: [15395] dispatching hogs: 128 stack
stress-ng: info: [15395] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [15395] cache allocate: default cache size: 2048K
stress-ng: info: [15395] successful run completed in 317.89s (5 mins, 17.89 secs)
return_code is 0
Running stress-ng numa stressor for 300 seconds....
stress-ng: info: [15663] dispatching hogs: 128 numa
stress-ng: info: [15663] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [15663] cache allocate: default cache size: 2048K
stress-ng: info: [15664] stress-ng-numa: system has 1 of a maximum 256 memory NUMA nodes
stress-ng: info: [15663] successful run completed in 300.08s (5 mins, 0.08 secs)
return_code is 0
Running stress-ng brk stressor for 620 seconds....
stress-ng: info: [15800] dispatching hogs: 128 brk
stress-ng: info: [15800] cache allocate: using built-in defaults as unable to determine cache details
stress-ng: info: [15800] cache allocate: default cache size: 2048K
stress-ng: info: [15800] successful run completed in 666.16s (11 mins, 6.16 secs)
return_code is 0
*******************************************************************
** stress-ng memory test passed!
*******************************************************************

The only message that appeared in dmesg during this test run is one warning (I cleared the ring buffer with dmesg -C before running the test):
ubuntu@entei:~$ dmesg
[ 7390.968438] stress-ng[15584]: bad frame in setup_rt_frame: 00003fffdd0e1040 nip

And from looking at vmstat, this is the lowest and highest consumption observed:
 0 0 340 30582 14 1299 0 0 3 385 27 25 3 3 75 19 0
 0 260 58068 196 12 1323 2 76 2571 78490 3295 5493 0 0 25 74 0

So at the least, only 340 MB of swap consumed and a full 30 GB of RAM free; at the most, 58 GB of swap consumed and only 196 MB free.

So there DOES seem to be a limit to what these tests are consuming.

Revision history for this message
Jeff Lane  (bladernr) wrote :

Also retried on amd64 and the effect was the same:

ubuntu@xwing:~$ free -lm
              total        used        free      shared  buff/cache   available
Mem:           3838         120        3601           0         115        3510
Low:           3838         236        3601
High:             0           0           0
Swap:          97695          61       97633

No OOM killer messages appeared in dmesg.

Test passed according to test output

and the low/high usage is within what is available:

0 0 62 3605 1 111 0 0 320 1231 252 511 0 0 97 3 0
1 9 19429 86 1 109 0 58 25 60607 2863 2301 0 3 11 85 0

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Artful):
status: Confirmed → Invalid
Revision history for this message
Jeff Lane  (bladernr) wrote :

Given that, I'm marking the kernel tasks as Invalid, as this appears to be stress-ng being over-aggressive.

MAAS deployments (and I presume server installs in general) only provide about 4-8 GB of swap at most, in a swap file created in the root filesystem (no swap partition anymore).

It is not reasonable, IMO, to create 100 GB swap files just to be sure we don't accidentally trip the OOM killer when running memory tests.

Also, the ratio seems to be variable. On my amd64 system, the swap consumed was 5x the amount of installed RAM. On the ppc64le system, the amount was just shy of 2x the amount of physical RAM (this is also a NUMA-capable system, so maybe that's a contributing factor, beyond the differences in how each arch handles memory, which I freely admit to not understanding well).
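
For what it's worth, those ratios follow from the peak vmstat swpd values quoted earlier (assuming the columns are in MB, which matches the free -m figures):

# Back-of-the-envelope check of the swap/RAM ratios mentioned above,
# using figures copied from the vmstat and free output in earlier comments.
awk 'BEGIN {
    printf "amd64:   %.1fx (19429 MB peak swap vs 3838 MB RAM)\n", 19429/3838
    printf "ppc64le: %.1fx (58068 MB peak swap vs 32565 MB RAM)\n", 58068/32565
}'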

Revision history for this message
Jeff Lane  (bladernr) wrote :

I'm still able to trigger problems with the "stack" stressor on the Xeon Phi...

ubuntu@lalande:~$ sudo ./memory_stress_ng
Total memory is 63 GiB
Constant run time is 300 seconds per stressor
Variable run time is 930 seconds per stressor
Estimated total run time is 20 minutes

Running stress-ng stack stressor for 300 seconds....
stress-ng: info: [4431] dispatching hogs: 256 stack
stress-ng: info: [4431] successful run completed in 742.49s (12 mins, 22.49 secs)
return_code is 124
*****************************************************************
** Error 124 reported on stressor stack!)
*****************************************************************
Running stress-ng brk stressor for 930 seconds....
stress-ng: info: [5007] dispatching hogs: 256 brk
stress-ng: info: [5007] successful run completed in 1412.12s (23 mins, 32.12 secs)
return_code is 124
*****************************************************************
** Error 124 reported on stressor brk!)
*****************************************************************
*******************************************************************
** stress-ng memory test failed; most recent error was 124
*******************************************************************

ubuntu@lalande:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          64281        1133       62825          11         322       62530
Swap:        796134           0      796134

I'm attaching a tarball with dmesg, the output of vmstat 1 that shows the memory consumption during testing, and a copy of the test script being used.

Changed in linux (Ubuntu):
status: Invalid → Confirmed
Changed in linux (Ubuntu Artful):
status: Invalid → Confirmed
Revision history for this message
Jeff Lane  (bladernr) wrote :

Because of the above, I've re-opened the kernel tasks :(

Revision history for this message
Jeff Lane  (bladernr) wrote :

I'm not able to recreate this on a smaller generic amd64 machine:

ubuntu@xwing:~$ free
              total        used        free      shared  buff/cache   available
Mem:        3930172      133912     3319200        6124      477060     3500044
Swap:     100040060           0   100040060

ubuntu@xwing:~$ sudo ./memory_stress_ng
Total memory is 4 GiB
Constant run time is 300 seconds per stressor
Variable run time is 340 seconds per stressor
Estimated total run time is 10 minutes

Running stress-ng stack stressor for 300 seconds....
stress-ng: info: [4058] dispatching hogs: 8 stack
stress-ng: info: [4058] cache allocate: default cache size: 8192K
stress-ng: info: [4058] successful run completed in 302.48s (5 mins, 2.48 secs)
return_code is 0
Running stress-ng brk stressor for 340 seconds....
stress-ng: info: [4129] dispatching hogs: 8 brk
stress-ng: info: [4129] cache allocate: default cache size: 8192K
stress-ng: info: [4129] successful run completed in 342.44s (5 mins, 42.44 secs)
return_code is 0
*******************************************************************
** stress-ng memory test passed!
*******************************************************************

Jeff Lane  (bladernr)
tags: added: hwcert-server
tags: removed: kernel-da-key
Revision history for this message
Colin Ian King (colin-king) wrote :

Is this still an open bug or can it be closed now?

Revision history for this message
Andy Whitcroft (apw) wrote : Closing unsupported series nomination.

This bug was nominated against a series that is no longer supported, ie artful. The bug task representing the artful nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Artful):
status: Confirmed → Won't Fix
Revision history for this message
Colin Ian King (colin-king) wrote :

These stressors are expected to trigger the OOM killer. The OOM adjustment for the stressor child processes is always set to make them the first processes to be OOM'd. However, the kernel makes its choice of what gets OOM'd based on many factors, so the stressors may end up causing other hoggy processes to be OOM'd as well.

This is not a bug; it is expected behaviour. See the stress-ng manual: it clearly states when a stressor can trigger the OOM killer and how each stress-ng stressor copes with this, for example:

       --brk N
              start N workers that grow the data segment by one page at a time using
              multiple brk(2) calls. Each successfully allocated new page is touched
              to ensure it is resident in memory. If an out of memory condition
              occurs then the test will reset the data segment to the point before
              it started and repeat the data segment resizing over again. The
              process adjusts the out of memory setting so that it may be killed by
              the out of memory (OOM) killer before other processes. If it is
              killed by the OOM killer then it will be automatically re-started by a
              monitoring parent process.
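
For illustration, the "adjusts the out of memory setting" step the manual refers to boils down to something like the following (an illustrative sketch of the mechanism, not stress-ng's actual code):

#!/bin/bash
# Illustrative sketch: mark the current shell (and any children it forks,
# which inherit the value) as the preferred OOM-killer victim.
echo 1000 > /proc/self/oom_score_adj    # 1000 = most eligible for OOM kill
cat /proc/self/oom_score_adj            # verify the new setting
# Anything memory-hungry started from this shell will now be killed first
# when the system runs out of memory.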

I shall close this as it is not a bug. Stress-ng is expected to stress a kernel to trigger OOMs.

Changed in stress-ng:
status: New → Invalid
assignee: nobody → Colin Ian King (colin-king)
Brad Figg (brad-figg)
tags: added: cscc
Changed in linux (Ubuntu):
status: Confirmed → Won't Fix