some pods are failing during sanity execution due to OOM killer

Bug #1826308 reported by Maria Guadalupe Perez Ibara on 2019-04-25
Affects: StarlingX | Importance: Critical | Assigned to: Al Bailey

Bug Description

Brief Description
-----------------
some pods are failing during sanity execution.

Severity
--------
Major

Steps to Reproduce
------------------
kubectl get pods --all-namespaces |egrep -v "Completed|Running"

Expected Behavior
------------------
Pods Running

Actual Behavior
----------------
CrashLoopBackOff

Reproducibility
---------------
100% simplex, duplex - BM

System Configuration
--------------------
Simplex & Duplex

Branch/Pull Time/Commit
-----------------------
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190424T145954Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="78"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-04-24 14:59:54 +0000"

http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/20190424T145954Z/

Timestamp/Logs
--------------
controller-0:~$ export OS_CLOUD=openstack_helm

controller-0:~$ openstack service list
Failed to discover available identity versions when contacting http://keystone.openstack.svc.cluster.local/v3. Attempting to parse version from URL.
Unable to establish connection to http://keystone.openstack.svc.cluster.local/v3/auth/tokens: HTTPConnectionPool(host='keystone.openstack.svc.cluster.local', port=80): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f10b0a90d10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

controller-0:~$ kubectl get pods --all-namespaces |egrep -v "Completed|Running"
The connection to the server 192.168.206.2:6443 was refused - did you specify the right host or port?

controller-0:~$ less -R /var/log/sysinv.log

controller-0:~$ cat /etc/build.info
###
### StarlingX
### Built from master

controller-0:~$ kubectl get pods --all-namespaces |egrep -v "Completed|Running"
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-jwzbb 0/1 CreateContainerConfigError 8 137m
kube-system coredns-84bb87857f-m5wtr 0/1 Pending 0 137m
kube-system ingress-4qhqh 0/1 CrashLoopBackOff 13 115m
kube-system kube-scheduler-controller-0 0/1 CrashLoopBackOff 9 136m
openstack ingress-6ff8cbb99-2mfkh 0/1 CrashLoopBackOff 14 115m
openstack ingress-error-pages-d48db556-p8dkm 0/1 RunContainerError 9 115m
openstack mariadb-ingress-error-pages-fbfb8b56b-tsfxt 0/1 RunContainerError 11 115m
openstack panko-events-cleaner-1556122200-8b29q 0/1 Init:0/1 0 30m
openstack rbd-provisioner-569784655f-s286v 0/1 CrashLoopBackOff 12 115m

Test Activity
-------------
[Sanity]

Cristopher Lemus (cjlemusc) wrote :

Further information gathered from a troubleshoot zoom with Al.

SIMPLEX

Kube-system pods are continuously restarting:

NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-5t4v4 1/1 Running 49 9h
kube-system coredns-84bb87857f-6psdr 1/1 Running 88 9h
kube-system ingress-error-pages-69d8d88bd4-qz6ld 1/1 Running 68 9h
kube-system ingress-q6fgn 1/1 Running 138 9h
kube-system kube-apiserver-controller-0 1/1 Running 108 9h
kube-system kube-controller-manager-controller-0 1/1 Running 46 9h
kube-system kube-scheduler-controller-0 1/1 Running 46 9h
kube-system tiller-deploy-d87d7bd75-zbtc8 1/1 Running 124 9h
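To spot the worst offenders, the listing above can be sorted by its RESTARTS column (column 5 in default `kubectl get pods --all-namespaces` output). A minimal sketch that reads the listing on stdin:

```shell
# Sort a "kubectl get pods --all-namespaces" listing by RESTARTS
# (5th column), highest first, skipping the header row.
sort_by_restarts() {
    awk 'NR > 1' | sort -k5,5 -rn
}

# usage: kubectl get pods --all-namespaces | sort_by_restarts | head
```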

The cause seems to be memory exhaustion; Out of Memory kills are logged in /var/log/kern.log. Some lines from the log:

2019-04-25T12:22:24.427 controller-0 kernel: err [38511.705026] Out of memory: Kill process 23019 (apache2) score 1002 or sacrifice child
2019-04-25T12:22:42.593 controller-0 kernel: warning [38529.834880] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:22:42.638 controller-0 kernel: err [38529.869458] Out of memory: Kill process 23023 (apache2) score 1002 or sacrifice child
2019-04-25T12:22:47.526 controller-0 kernel: warning [38534.756610] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:22:47.553 controller-0 kernel: err [38534.773152] Out of memory: Kill process 158263 (cinder-api) score 1002 or sacrifice child
2019-04-25T12:23:16.786 controller-0 kernel: warning [38563.947968] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:23:16.796 controller-0 kernel: err [38563.955317] Out of memory: Kill process 157281 (cinder-api) score 1002 or sacrifice child
2019-04-25T12:23:32.242 controller-0 kernel: warning [38579.367722] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:23:32.261 controller-0 kernel: err [38579.379438] Out of memory: Kill process 157281 (cinder-api) score 1002 or sacrifice child
2019-04-25T12:23:35.708 controller-0 kernel: warning [38582.825486] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:23:35.777 controller-0 kernel: err [38582.886363] Out of memory: Kill process 157281 (cinder-api) score 1002 or sacrifice child
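To see which processes the OOM killer is hitting hardest, the kill lines can be tallied per process name. A minimal sketch (it defaults to /var/log/kern.log as used in the report, but accepts any file as an argument):

```shell
# Count OOM kills per process name in a kernel log. Lines look like:
# "... Out of memory: Kill process 23019 (apache2) score 1002 or sacrifice child"
oom_kill_counts() {
    grep 'Out of memory: Kill process' "${1:-/var/log/kern.log}" \
        | sed 's/.*Kill process [0-9]* (\([^)]*\)).*/\1/' \
        | sort | uniq -c | sort -rn
}
```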

This system has 93GB of memory:

[wrsroot@controller-0 ~(keystone_admin)]$ free -h
              total used free shared buff/cache available
Mem: 93G 86G 663M 72M 6.2G 2.0G
Swap: 0B 0B 0B

[wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-list controller-0
+-----------+---------+------------+---------+----------------+--------+--------+--------+-------+-------------+----------------+----------------+------------------+----------------+---------...


Al Bailey (albailey1974) on 2019-04-25
Changed in starlingx:
status: New → Confirmed
Cristopher Lemus (cjlemusc) wrote :

Update about DUPLEX:

After the system was stable (all pods properly running) and the sanity test was retried, the same behavior was detected. Pods are restarting:

kube-system ingress-658v7 1/1 Running 19 5h5m
kube-system ingress-error-pages-59766f99fd-nnb5p 1/1 Running 2 5h5m
kube-system ingress-error-pages-59766f99fd-xfb7m 1/1 Running 6 5h5m
kube-system ingress-wt86d 1/1 Running 3 5h5m
kube-system kube-apiserver-controller-0 1/1 Running 7 5h57m
kube-system kube-apiserver-controller-1 1/1 Running 4 5h24m
kube-system kube-controller-manager-controller-0 1/1 Running 16 5h57m
kube-system kube-controller-manager-controller-1 1/1 Running 15 5h24m
kube-system kube-proxy-9255h 1/1 Running 1 5h58m
kube-system kube-proxy-n25lg 1/1 Running 2 5h24m
kube-system kube-scheduler-controller-0 1/1 Running 17 5h57m
kube-system kube-scheduler-controller-1 1/1 Running 15 5h24m
kube-system tiller-deploy-d87d7bd75-p7k85 0/1 CrashLoopBackOff 30 5h54m

Erich Cordoba (ericho) wrote :

This is the process list sorted by RSS in a simplex system.

Erich Cordoba (ericho) wrote :

And /proc/meminfo output:

MemTotal: 97534648 kB
MemFree: 4779328 kB
MemAvailable: 5895388 kB
Buffers: 100964 kB
Cached: 713940 kB
SwapCached: 0 kB
Active: 13862360 kB
Inactive: 442608 kB
Active(anon): 13518000 kB
Inactive(anon): 41120 kB
Active(file): 344360 kB
Inactive(file): 401488 kB
Unevictable: 5424 kB
Mlocked: 5424 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 13494920 kB
Mapped: 175572 kB
Shmem: 64712 kB
Slab: 4014080 kB
SReclaimable: 1091508 kB
SUnreclaim: 2922572 kB
KernelStack: 73040 kB
PageTables: 195752 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 12796252 kB
Committed_AS: 59455132 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1357332 kB
VmallocChunk: 34308239356 kB
HardwareCorrupted: 0 kB
CmaTotal: 16384 kB
CmaFree: 1356 kB
HugePages_Total: 34104
HugePages_Free: 34104
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 25972 kB
DirectMap2M: 2818048 kB
DirectMap1G: 98566144 kB
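One number worth pulling out of this dump: with HugePages_Total at 34104 and a 2048 kB hugepage size, roughly 66.6 GiB is pre-reserved for the hugepage pool and unavailable to ordinary allocations, which would line up with the low MemAvailable despite modest process usage. A minimal sketch that computes this from any meminfo dump:

```shell
# Compute memory reserved by the hugepage pool from a /proc/meminfo dump.
# The pool reduces free memory even while HugePages_Free == HugePages_Total.
hugepage_reserved_gib() {
    awk '/^HugePages_Total:/ {pages = $2}
         /^Hugepagesize:/    {size_kb = $2}
         END {printf "%.1f GiB\n", pages * size_kb / 1024 / 1024}' \
        "${1:-/proc/meminfo}"
}
```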

Cristopher Lemus (cjlemusc) wrote :

Attached docker stats from the simplex configuration. Almost all containers are limited only by the total amount of physical memory on the host (93GB). Is this expected? Almost all of them are actually using less than 1GB of memory. Could this cause Docker to "hold" memory from the physical host without actually using it?
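One way to confirm the observation is to filter a `docker stats --no-stream` listing for containers whose memory limit equals the host total (Docker displays the host figure when no explicit limit is set). A minimal sketch reading the listing on stdin, with 93.02GiB from this report assumed as the default host total:

```shell
# Print names of containers whose MEM LIMIT equals the host total,
# i.e. containers running without an explicit memory limit.
unlimited_containers() {
    awk -v total="${1:-93.02GiB}" \
        'NR > 1 && index($0, "/ " total) {print $2}'
}

# usage: docker stats --no-stream | unlimited_containers 93.02GiB
```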

Frank Miller (sensfan22) wrote :

Marking release gating and critical since All-in-one systems are impacted and the stx-openstack application is failing to come up on some systems.

Changed in starlingx:
status: Confirmed → Triaged
summary: - some pods are failing during sanity execution.
+ some pods are failing during sanity execution due to OOM killer
Changed in starlingx:
assignee: nobody → Cindy Xie (xxie1)
importance: Undecided → Critical
tags: added: stx.2.0 stx.containers
Cristopher Lemus (cjlemusc) wrote :

Hi All,

Some tests were made to find the point where the memory is allocated:

Just after `config_controller` it's using just a handful of GBs:

controller-0:~$ free -h
              total used free shared buff/cache available
Mem: 93G 3.2G 84G 47M 5.5G 88G
Swap: 0B 0B 0B
controller-0:~$

Right after the unlock, when the system passes from "offline" to "intest" status, usage jumps from 5.1GB to 71GB:

Commands running in parallel, system host-list:

[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | disabled | offline |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | disabled | intest |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | disabled | intest |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+

In parallel with free -h:

              total used free shared buff/cache available
Mem: 93G 4.9G 86G 45M 1.7G 87G
Swap: 0B 0B 0B
              total used free shared buff/cache available
Mem: 93G 5.1G 86G 45M 1.9G 86G
Swap: 0B 0B 0B
              total used free shared buff/cache available
Mem: 93G 5.1G 85G 45M 1.9G 86G
Swap: 0B 0B 0B
              total used free shared buff/cache available
Mem: 93G 71G ...
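The parallel watching above can be automated so the exact moment of the jump gets a timestamp. A minimal sketch that condenses each `free -h` reading into one line:

```shell
# Condense "free -h" output (read on stdin) into one timestamped line.
mem_snapshot() {
    awk -v ts="$(date '+%F %T')" \
        '/^Mem:/ {print ts, "used=" $3, "available=" $7}'
}

# usage: while sleep 5; do free -h | mem_snapshot; done | tee /tmp/mem.log
```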


Cindy Xie (xxie1) on 2019-04-27
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Yi Wang (wangyi4)

Reviewed: https://review.opendev.org/656037
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=798b8ff4922fabc9b1bb802257c8f482146064c7
Submitter: Zuul
Branch: master

commit 798b8ff4922fabc9b1bb802257c8f482146064c7
Author: Al Bailey <email address hidden>
Date: Fri Apr 26 17:46:41 2019 -0500

    Set the number of nova schedulers to 1

    When applying the stx-openstack manifest, the default
    nova scheduler processes was equal to number of cores.

    Now it is set to 1.

    This reduces overall memory footprint and the time
    to apply the stx-openstack manifest. This reduces
    out of memory issues on systems that have a very large
    number of cores.

    Change-Id: I7f9f41f13d5ae18da2c58d9968a7ee67cbe5a05e
    Fixes-Bug: 1826308
    Signed-off-by: Al Bailey <email address hidden>

Changed in starlingx:
status: Triaged → Fix Released
Cristopher Lemus (cjlemusc) wrote :

With latest ISO, all baremetal configurations are passing sanity test (Green Status):

Regarding memory usage:

Before system is on "available" status:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | disabled | intest |
+----+--------------+-------------+----------------+-------------+--------------+
              total used free shared buff/cache available
Mem: 93G 5.5G 85G 45M 1.8G 86G
Swap: 0B 0B 0B

Less than 5 seconds after the system is on "available" status:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
              total used free shared buff/cache available
Mem: 93G 72G 19G 46M 1.8G 20G
Swap: 0B 0B 0B

So usage jumps from 5.5GB to 72GB. When we reported the bug, the usage was 71GB, almost the same as today.

I'm assuming that Docker reserves the memory because the pods/containers are not limited. As we can see in docker stats, almost all containers have their limit set to the total amount of physical memory on the system, e.g.:

CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
e24823b3c8aa k8s_POD_testpod_default_4831b628-6a2b-11e9-9db4-3cfdfebedf30_0 0.00% 4.32MiB / 93.02GiB 0.00% 0B / 0B 0B / 0B 1
1afe1bd620c0 k8s_nova-compute-ssh_nova-compute-controller-0-a762cb46-9486m_openstack_8afe533b-6a2a-11e9-9db4-3cfdfebedf30_0 0.00% 12.51MiB / 93.02GiB 0.01% 0B / 0B 0B / 20.5kB 1
18d5e5ddb8f0 k8s_nova-compute_nova-compute-controller-0-a762cb46-9486m_openstack_8afe533b-6a2a-11e9-9db4-3cfdfebedf30_0 0.06% 197.2MiB / 93.02GiB 0.21% 0B / 0B 1.04GB / 135kB 22

Is this behavior expected? Is there a way to properly track down memory usage at the Docker level?

Thanks!

Frank Miller (sensfan22) on 2019-04-29
Changed in starlingx:
assignee: Yi Wang (wangyi4) → Al Bailey (albailey1974)
Yi Wang (wangyi4) wrote :

Cristopher, based on my analysis, I suppose most of the memory is reserved for hugepages. I see a phenomenon very similar to yours: in my deployment, just after the controller-0 unlock, memory usage jumped from 4G to 156G (the system has 192G in total).

By checking /proc/meminfo, I can see it as below. The hugepage size is 2M, so there are 152G reserved in my system.
"
HugePages_Total: 78218
HugePages_Free: 78218
"

I set a smaller number (such as 512) in /proc/sys/vm/nr_hugepages, and then I can get my memory back. Please check the log below.

controller-0:/home/wrsroot# free -h -k
              total used free shared buff/cache available
Mem: 187G 157G 27G 48M 2.6G 28G
Swap: 0B 0B 0B
controller-0:/home/wrsroot# cat /proc/sys/vm/nr_hugepages
78218
controller-0:/home/wrsroot# echo 512 > /proc/sys/vm/nr_hugepages
controller-0:/home/wrsroot# cat /proc/sys/vm/nr_hugepages
512
controller-0:/home/wrsroot# free -h -k
              total used free shared buff/cache available
Mem: 187G 5.5G 179G 48M 2.6G 180G
Swap: 0B 0B 0B
controller-0:/home/wrsroot# echo 78218 > /proc/sys/vm/nr_hugepages
controller-0:/home/wrsroot# cat /proc/sys/vm/nr_hugepages
78218
controller-0:/home/wrsroot# free -h -k
              total used free shared buff/cache available
Mem: 187G 157G 27G 48M 2.6G 28G
Swap: 0B 0B 0B
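For reference, the same adjustment can be made through sysctl and pinned across reboots; a sketch, with the caveat that on StarlingX the hugepage pool is managed by the platform (sysinv), so a manual override may be reverted on the next configuration apply or reboot:

```shell
# Equivalent to echoing into /proc/sys/vm/nr_hugepages (root required):
sysctl -w vm.nr_hugepages=512

# To persist across reboots (on systems not platform-managed):
echo 'vm.nr_hugepages = 512' >> /etc/sysctl.conf
```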

Yi Wang (wangyi4) wrote :

I launched a script to check system memory usage right when the system boots up. Here is the log.
