[2.3] Memtester oom

Bug #1742137 reported by Peter Sabaini
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
MAAS      Fix Released   High         Lee Trager
2.3       Fix Released   High         Lee Trager

Bug Description

In a cloud with 11 nodes with 512 GB of RAM each, I've consistently been getting timeouts for memtesting. When running memtester.sh (with -x added) manually, I get the output below. This seems similar to Bug #1722848 (feel free to close this as a duplicate if you'd rather reopen that issue).

root@x:~# /home/ubuntu/memtester.sh
++ cat /proc/sys/vm/min_free_kbytes
+ min_free_kbytes=90112
++ awk '/MemTotal/ { print (($2 * 0.077) / 10) }' /proc/meminfo
+ reserve=4.06785e+06
+ reserve=4
+ '[' 4 -le 90112 ']'
+ reserve=100352
++ awk -v reserve=100352 '/MemFree/ { print ($2 - reserve) "K"}' /proc/meminfo
+ sudo -n memtester 525184760K 1
memtester version 4.3.0 (64-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 512875MB (537789194240 bytes)
got 512875MB (537789194240 bytes), trying mlock .../home/ubuntu/memtester.sh: line 43: 5409 Killed sudo -n memtester $(awk -v reserve=$reserve '/MemFree/ { print ($2 - reserve) "K"}' /proc/meminfo) 1
root@x# dmesg | tail
[13326.489451] [ 4963] 1000 4963 15316 0 34 4 495 0 (sd-pam)
[13326.489452] [ 5021] 1000 5021 23842 160 48 4 242 0 sshd
[13326.489454] [ 5022] 1000 5022 5362 390 15 3 500 0 bash
[13326.489455] [ 5048] 0 5048 13936 427 32 3 123 0 sudo
[13326.489456] [ 5050] 0 5050 5331 416 15 3 478 0 bash
[13326.489457] [ 5405] 0 5405 2813 364 10 3 55 0 memtester.sh
[13326.489458] [ 5409] 0 5409 13935 427 32 4 121 0 sudo
[13326.489460] [ 5410] 0 5410 131297283 130899442 255671 504 22 0 memtester
[13326.489461] Out of memory: Kill process 5410 (memtester) score 948 or sacrifice child
[13326.499320] Killed process 5410 (memtester) total-vm:525189132kB, anon-rss:523596372kB, file-rss:1396kB, shmem-rss:0kB

I've been able to run memtester after I added this to memtester.sh:

sudo sysctl vm.overcommit_memory=2
sudo service snapd stop

...and increased the reserved memory to 2%.

(snapd seemed to take up quite a bit of memory and seems unnecessary here.)
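
For illustration, a 2% reserve like the one described above could be computed with integer shell arithmetic, roughly as follows. This is a minimal sketch, not the actual change made to memtester.sh: the helper names `reserve_kb` and `memtester_size` are hypothetical, the 2% is assumed to be taken from MemTotal, and the input values are illustrative numbers in the range seen in the trace rather than values read from /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helper: reserve a percentage of total memory, in kB.
# Shell integer arithmetic never yields a decimal point or an exponent.
reserve_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Hypothetical helper: build the size argument for memtester from
# MemFree and the reserve (both in kB).
memtester_size() {
    echo "$(( $1 - $2 ))K"
}

# Illustrative values only; a real script would read MemTotal and
# MemFree from /proc/meminfo.
reserve=$(reserve_kb 528292000 2)
echo "reserve: ${reserve} kB"
echo "memtester size: $(memtester_size 525285112 "$reserve")"
```

With these inputs the reserve comes out at roughly 10 GB, compared with the 100352 kB fallback seen in the trace.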

Versions:
maas 2.3.0-6434-gd354690-0ubuntu1~16.04.1
nodes running xenial w/ kernel 4.10

Lee Trager (ltrager) wrote :

Thanks for the bug report. Looking at the output, the issue appears to be that on a high-memory system awk returns the computed reserve in e notation (4.06785e+06 in your trace), which then ends up as 4. Because 4 is below min_free_kbytes, the kernel minimum value is used instead, and that reserve is too small for your system.

Could you please test the related branch? It ensures awk always returns an integer which should fix your issue.
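
The e notation behaviour can be reproduced outside of MAAS. The sketch below is an illustration only and not necessarily what the related branch does: awk's `print` formats non-integer numbers with OFMT, which defaults to `%.6g`, so a value in the millions is printed with an exponent, while `printf "%d"` always prints a plain integer. The function names and the MemTotal value are hypothetical, and cutting the string at the decimal point with `${raw%.*}` is an assumption about how the script arrives at `reserve=4`.

```shell
#!/bin/sh
# Same expression as in the trace, printed with awk's default output
# format (OFMT, "%.6g" by default).
reserve_default() {
    echo "MemTotal: $1 kB" | awk '/MemTotal/ { print (($2 * 0.077) / 10) }'
}

# Same expression, but forced to integer output with printf "%d".
reserve_integer() {
    echo "MemTotal: $1 kB" | awk '/MemTotal/ { printf "%d\n", (($2 * 0.077) / 10) }'
}

mem_total_kb=528292000   # hypothetical MemTotal of a ~512 GB node, in kB

raw=$(reserve_default "$mem_total_kb")
echo "default print:        $raw"
# Assumption: the script drops everything from the decimal point on.
echo "cut at decimal point: ${raw%.*}"
echo "integer output:       $(reserve_integer "$mem_total_kb")"
```

With this input the default `print` should give 4.06785e+06, the same value as in the trace, and cutting at the decimal point leaves 4, which is what triggers the fallback to the kernel minimum.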

Changed in maas:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Lee Trager (ltrager)
milestone: none → 2.4.0alpha1
Peter Sabaini (peter-sabaini) wrote :

For the record, I've had the new memtester.sh running on two nodes for ~1.5 hours now, and it looks good so far. Will update once complete.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released