Comment 6 for bug 1573062

Colin Ian King (colin-king) wrote : Re: memory_stress_ng failing for IBM Power S812LC(TN71-BP012) for 16.04

I don't see any evidence of a hang, just evidence of a machine being power-cycled.

Apr 19 21:26:33 ubuntu kernel: [19749.994340] [45742] 1000 45742 369 18 6 3 14 0 stress-ng-brk
Apr 19 21:26:33 ubuntu kernel: [19749.994342] [45743] 1000 45743 369 18 6 3 14 0 stress-ng-brk
Apr 19 21:26:33 ubuntu kernel: [19749.994344] Out of memory: Kill process 45583 (stress-ng-brk) score 28 or sacrifice child
Apr 19 21:26:33 ubuntu kernel: [19749.994566] Killed process 45583 (stress-ng-brk) total-vm:7976960kB, anon-rss:7048512kB, file-rss:1152kB
Apr 20 14:28:33 binacle kernel: [ 0.000000] opal: OPAL V3 detected !
Apr 20 14:28:33 binacle kernel: [ 0.000000] Allocated 4980736 bytes for 2048 pacas at c00000000fb40000
Apr 20 14:28:33 binacle kernel: [ 0.000000] Using PowerNV machine description
Apr 20 14:28:33 binacle kernel: [ 0.000000] Page sizes from device-tree:
Apr 20 14:28:33 binacle kernel: [ 0.000000] base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
Apr 20 14:28:33 binacle kernel: [ 0.000000] base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
Apr 20 14:28:33 binacle kernel: [ 0.000000] base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
Apr 20 14:28:33 binacle kernel: [ 0.000000] base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1

Can you supply details of how stress-ng is being run?

For the brk stressor, stress-ng rapidly expands the heap using the brk() system call. This forces the system to consume all available memory and then start swapping, which can make the machine appear "hung" (for example, your shell may be swapped out) even though it is still active and has not crashed.

The kernel messages you see in kern.log are just the OOM killer killing off the best candidate process to free up memory. The freed memory is rapidly consumed again by the other contending brk stressors, so the system appears hung while it is busy cycling through killing and respawning these stressors.

Was the machine pingable? If so, it was not dead or hung; perhaps it was power-cycled prematurely. What evidence was there that the machine was hung?