[2.3] Memtester oom
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | High | Lee Trager |
2.3 | Fix Released | High | Lee Trager |
Bug Description
In a cloud with 11 nodes with 512 GB of RAM each, I consistently get timeouts during memory testing. Running memtester.sh manually (with -x added) produces the output below. This seems similar to Bug #1722848 (feel free to close this as a duplicate if you'd rather reopen that issue).
root@x:~# /home/ubuntu/
++ cat /proc/sys/
+ min_free_
++ awk '/MemTotal/ { print (($2 * 0.077) / 10) }' /proc/meminfo
+ reserve=4.06785e+06
+ reserve=4
+ '[' 4 -le 90112 ']'
+ reserve=100352
++ awk -v reserve=100352 '/MemFree/ { print ($2 - reserve) "K"}' /proc/meminfo
+ sudo -n memtester 525184760K 1
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 512875MB (537789194240 bytes)
got 512875MB (537789194240 bytes), trying mlock .../home/
root@x# dmesg | tail
[13326.489451] [ 4963] 1000 4963 15316 0 34 4 495 0 (sd-pam)
[13326.489452] [ 5021] 1000 5021 23842 160 48 4 242 0 sshd
[13326.489454] [ 5022] 1000 5022 5362 390 15 3 500 0 bash
[13326.489455] [ 5048] 0 5048 13936 427 32 3 123 0 sudo
[13326.489456] [ 5050] 0 5050 5331 416 15 3 478 0 bash
[13326.489457] [ 5405] 0 5405 2813 364 10 3 55 0 memtester.sh
[13326.489458] [ 5409] 0 5409 13935 427 32 4 121 0 sudo
[13326.489460] [ 5410] 0 5410 131297283 130899442 255671 504 22 0 memtester
[13326.489461] Out of memory: Kill process 5410 (memtester) score 948 or sacrifice child
[13326.499320] Killed process 5410 (memtester) total-vm:
I've been able to run memtester after I added this to memtester.sh
sudo sysctl vm.overcommit_
sudo service snapd stop
...and increased the reserved mem to 2%
(snapd seemed to take quite a bit of mem and seems unnecessary here)
Versions:
maas 2.3.0-6434-
nodes running xenial w/ kernel 4.10
Related branches
- Lee Trager (community): Approve
- Diff: 41 lines (+9/-10), 1 file modified: src/metadataserver/builtin_scripts/memtester.sh (+9/-10)
- Andres Rodriguez (community): Approve
- MAAS Lander: Approve
- Blake Rouse (community): Approve
- Diff: 41 lines (+9/-10), 1 file modified: src/metadataserver/builtin_scripts/memtester.sh (+9/-10)
Changed in maas: status: In Progress → Fix Committed
Changed in maas: status: Fix Committed → Fix Released
Thanks for the bug report. Looking at the output, the issue appears to be that on a high-memory system awk prints the reserve calculation in e notation (reserve=4.06785e+06 in your trace), which the script then cuts down to 4. Because 4 is below the kernel minimum (90112), the fallback based on the kernel minimum is used (reserve=100352), and that is not enough on your system.
Could you please test the related branch? It ensures awk always returns an integer which should fix your issue.
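For anyone wanting to reproduce the e-notation behaviour outside of MAAS, here is a minimal sketch. The `mem_total_kb` value is a hypothetical MemTotal chosen to be close to the reporter's system, the `${sci%.*}` truncation is an assumption about how the script gets from 4.06785e+06 to 4, and `int()` is just one way to force integer output from awk, not necessarily what the related branch does.

```shell
#!/bin/sh
# Hypothetical MemTotal in kB, roughly matching a 512 GB node (assumption).
mem_total_kb=528292156

# Same arithmetic as the trace: awk's print formats non-integer results
# with OFMT ("%.6g" by default), so large values come out in e notation.
sci=$(awk -v m="$mem_total_kb" 'BEGIN { print ((m * 0.077) / 10) }')

# Assumed truncation step: dropping everything after the first "."
# leaves only the leading digit of the mantissa.
truncated=${sci%.*}

# One possible fix: make awk emit an integer so nothing is lost.
fixed=$(awk -v m="$mem_total_kb" 'BEGIN { print int((m * 0.077) / 10) }')

echo "awk default output: $sci"
echo "after truncation:   $truncated"
echo "with int():         $fixed"
```

With the integer form, the comparison against the kernel minimum sees the full computed reserve rather than a single digit.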