CPU stress test can fail on systems with low memory

Bug #1097301 reported by Jeff Lane 
This bug affects 1 person
Affects: Checkbox Provider - Base
Status: Fix Released
Importance: Medium
Assigned to: Maciej Kisielewski

Bug Description

This has been seen before, but here's a fresh example of the failure. On systems with low memory and higher numbers of cores, the CPU Stress Test that uses the tool "stress" can fail due to lack of memory.

Consider these two results:
https://certification.canonical.com/hardware/201208-11536/submission/2YSXo18ompLFsJd/result/test:stress%252Fcpu_stress_test:__stress__
https://certification.canonical.com/hardware/201208-11537/submission/jPsHJiz5pLCWzmC/result/test:stress%252Fcpu_stress_test:__stress__

These are essentially the same computer. The difference is that the failing system has 4 cores and 1.8GB RAM available while the passing system has 4 cores and 7.7GB available. The failing run seems to indicate this in the output of 'stress':

stress: FAIL: [3498] (495) hogvm malloc failed: Cannot allocate memory
stress: FAIL: [3478] (395) <-- worker 3498 returned error 1

We need to tweak the algorithm that tells stress how much memory to consume.

Brendan Donegan (brendan-donegan) wrote :

I think it blindly uses half a gig per core, but it should be total/cores instead, I guess.

Changed in checkbox:
status: New → Confirmed
importance: Undecided → Medium
Daniel Manrique (roadmr) wrote :

The number of CPU spinners spawned (--cpu parameter) is the result of running:

/usr/share/checkbox/scripts/cpuinfo_resource | awk '/count:/ {print $2}'

The number of memory stress (malloc/free) workers is determined by:

awk '/MemTotal/ {num_vm = $2/262144; if (num_vm != int(num_vm)) num_vm = int(num_vm) + 1; print num_vm}' /proc/meminfo

From this we can see that it assumes 256MB assigned to each worker (MemTotal is reported in kB, and 262144 kB = 256MB). Stress's man page indicates this should be correct; see --vm-bytes:

       -c, --cpu N
              spawn N workers spinning on sqrt()

       -m, --vm N
              spawn N workers spinning on malloc()/free()

       --vm-bytes B
              malloc B bytes per vm worker (default is 256MB)

It then divides the total memory by 256 MB, rounding up, to obtain the number of workers. Together these workers request at least as much memory as the system has in total.

On my system, for instance, it results in --cpu 4 and --vm 15. This will result in 3840MB used by all vm workers (15 * 256MB).
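
The arithmetic above can be reproduced for any MemTotal value with a small sketch. The helper names vm_workers and vm_demand_mb are hypothetical (they are not part of checkbox), and the 1887436 kB figure is only an assumed stand-in for a machine with roughly 1.8GB. The awk expression is the same one quoted above, fed from stdin instead of /proc/meminfo.

```shell
#!/bin/sh
# Sketch only: vm_workers and vm_demand_mb are hypothetical helper names.

# Number of vm workers: ceiling of MemTotal (kB) / 262144 kB (256 MB),
# mirroring the awk expression used by the job.
vm_workers() {
    echo "$1" | awk '{n = $1/262144; if (n != int(n)) n = int(n) + 1; print n}'
}

# Total memory (MB) those workers request at the default 256 MB each.
vm_demand_mb() {
    echo $(( $(vm_workers "$1") * 256 ))
}

mem_kb=1887436   # assumed example value, roughly 1.8 GB
echo "workers: $(vm_workers "$mem_kb")"
echo "requested: $(vm_demand_mb "$mem_kb") MB of $(( mem_kb / 1024 )) MB"
```

With the assumed value this yields 8 workers requesting 2048 MB on a machine with about 1843 MB in total. Because of the round-up, the request is never smaller than MemTotal, which would be consistent with the malloc failure reported on the low-memory system.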

tags: added: ce-qa-concern
Daniel Manrique (roadmr)
tags: added: scripts
Zygmunt Krynicki (zyga)
affects: checkbox → plainbox-provider-checkbox
Changed in plainbox-provider-checkbox:
assignee: nobody → Maciej Kisielewski (kissiel)
Maciej Kisielewski (kissiel) wrote :

This got fixed by:
commit e0060a8a2c98b8d85daf36eff0c6c00ea0b5e5bd
Author: Manoj Iyer <email address hidden>
Date: Tue Nov 19 13:41:46 2013 -0600

From this patch on, cpu_stress_test uses the --vm-bytes option to make *all* workers together occupy 1/4 of system memory.
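
As a rough illustration of that idea (this is not the content of the commit, which is not quoted here), the per-worker size can be derived from MemTotal so that all vm workers together stay at one quarter of it. The helper names per_worker_kb and stress_cmd, the worker counts, the timeout, and the memory figure are all assumptions made for the example. The command is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: per_worker_kb and stress_cmd are hypothetical helper names,
# and this is not the code from the referenced commit.

# Per-worker allocation (kB) so that all vm workers together use 1/4 of MemTotal.
# $1 = MemTotal in kB, $2 = number of vm workers
per_worker_kb() {
    echo $(( $1 / 4 / $2 ))
}

# Build (but do not run) a stress command line.
# $1 = cpu workers, $2 = vm workers, $3 = MemTotal in kB, $4 = timeout in seconds
stress_cmd() {
    echo "stress --cpu $1 --vm $2 --vm-bytes $(per_worker_kb "$3" "$2")K --timeout $4"
}

mem_kb=1887436   # assumed example value, roughly 1.8 GB
stress_cmd 4 4 "$mem_kb" 60
```

With these assumed inputs, each of the 4 vm workers is given 117964K, about 461 MB in total, instead of the 2048 MB that the 256MB-per-worker scheme would request on the same machine.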

Tested on multiple virtual machines with low amounts of memory, without a crash. (When I tried the job command from before the aforementioned patch, they crashed.)

Marking as "Fix released", as it works fine on trunk.

Changed in plainbox-provider-checkbox:
status: Confirmed → Fix Released