some pods are failing during sanity execution due to OOM killer

Bug #1826308 reported by Maria Guadalupe Perez Ibara on 2019-04-25
Affects: StarlingX | Importance: Critical | Assigned to: Al Bailey

Bug Description

Brief Description
-----------------
some pods are failing during sanity execution.

Severity
--------
Major

Steps to Reproduce
------------------
kubectl get pods --all-namespaces |egrep -v "Completed|Running"

Expected Behavior
------------------
Pods Running

Actual Behavior
----------------
CrashLoopBackOff

Reproducibility
---------------
100% simplex, duplex - BM

System Configuration
--------------------
Simplex & Duplex

Branch/Pull Time/Commit
-----------------------
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190424T145954Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="78"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-04-24 14:59:54 +0000"

http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/20190424T145954Z/

Timestamp/Logs
--------------
controller-0:~$ export OS_CLOUD=openstack_helm

controller-0:~$ openstack service list
Failed to discover available identity versions when contacting http://keystone.openstack.svc.cluster.local/v3. Attempting to parse version from URL.
Unable to establish connection to http://keystone.openstack.svc.cluster.local/v3/auth/tokens: HTTPConnectionPool(host='keystone.openstack.svc.cluster.local', port=80): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f10b0a90d10>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

controller-0:~$ kubectl get pods --all-namespaces |egrep -v "Completed|Running"
The connection to the server 192.168.206.2:6443 was refused - did you specify the right host or port?

controller-0:~$ less -R /var/log/sysinv.log

controller-0:~$ cat /etc/build.info
###
### StarlingX
### Built from master

controller-0:~$ kubectl get pods --all-namespaces |egrep -v "Completed|Running"
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-jwzbb 0/1 CreateContainerConfigError 8 137m
kube-system coredns-84bb87857f-m5wtr 0/1 Pending 0 137m
kube-system ingress-4qhqh 0/1 CrashLoopBackOff 13 115m
kube-system kube-scheduler-controller-0 0/1 CrashLoopBackOff 9 136m
openstack ingress-6ff8cbb99-2mfkh 0/1 CrashLoopBackOff 14 115m
openstack ingress-error-pages-d48db556-p8dkm 0/1 RunContainerError 9 115m
openstack mariadb-ingress-error-pages-fbfb8b56b-tsfxt 0/1 RunContainerError 11 115m
openstack panko-events-cleaner-1556122200-8b29q 0/1 Init:0/1 0 30m
openstack rbd-provisioner-569784655f-s286v 0/1 CrashLoopBackOff 12 115m

Test Activity
-------------
[Sanity]

Cristopher Lemus (cjlemusc) wrote :

Further information gathered from a troubleshoot zoom with Al.

SIMPLEX

Kube-system pods are continuously restarting:

NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-5t4v4 1/1 Running 49 9h
kube-system coredns-84bb87857f-6psdr 1/1 Running 88 9h
kube-system ingress-error-pages-69d8d88bd4-qz6ld 1/1 Running 68 9h
kube-system ingress-q6fgn 1/1 Running 138 9h
kube-system kube-apiserver-controller-0 1/1 Running 108 9h
kube-system kube-controller-manager-controller-0 1/1 Running 46 9h
kube-system kube-scheduler-controller-0 1/1 Running 46 9h
kube-system tiller-deploy-d87d7bd75-zbtc8 1/1 Running 124 9h
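To spot the worst offenders, the listing above can be sorted by its RESTARTS column (column 5 in default `kubectl get pods --all-namespaces` output). A minimal sketch that reads the listing on stdin:

```shell
# Sort a "kubectl get pods --all-namespaces" listing by RESTARTS
# (5th column), highest first, skipping the header row.
sort_by_restarts() {
    awk 'NR > 1' | sort -k5,5 -rn
}

# usage: kubectl get pods --all-namespaces | sort_by_restarts | head
```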

The cause seems to be memory exhaustion; Out of Memory kills are logged in /var/log/kern.log. Some lines from the log:

2019-04-25T12:22:24.427 controller-0 kernel: err [38511.705026] Out of memory: Kill process 23019 (apache2) score 1002 or sacrifice child
2019-04-25T12:22:42.593 controller-0 kernel: warning [38529.834880] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:22:42.638 controller-0 kernel: err [38529.869458] Out of memory: Kill process 23023 (apache2) score 1002 or sacrifice child
2019-04-25T12:22:47.526 controller-0 kernel: warning [38534.756610] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:22:47.553 controller-0 kernel: err [38534.773152] Out of memory: Kill process 158263 (cinder-api) score 1002 or sacrifice child
2019-04-25T12:23:16.786 controller-0 kernel: warning [38563.947968] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:23:16.796 controller-0 kernel: err [38563.955317] Out of memory: Kill process 157281 (cinder-api) score 1002 or sacrifice child
2019-04-25T12:23:32.242 controller-0 kernel: warning [38579.367722] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:23:32.261 controller-0 kernel: err [38579.379438] Out of memory: Kill process 157281 (cinder-api) score 1002 or sacrifice child
2019-04-25T12:23:35.708 controller-0 kernel: warning [38582.825486] [<ffffffff98188cd3>] out_of_memory+0x4d3/0x510
2019-04-25T12:23:35.777 controller-0 kernel: err [38582.886363] Out of memory: Kill process 157281 (cinder-api) score 1002 or sacrifice child
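To see which processes the OOM killer is hitting hardest, the kill lines can be tallied per process name. A minimal sketch (it defaults to /var/log/kern.log as used in the report, but accepts any file as an argument):

```shell
# Count OOM kills per process name in a kernel log. Lines look like:
# "... Out of memory: Kill process 23019 (apache2) score 1002 or sacrifice child"
oom_kill_counts() {
    grep 'Out of memory: Kill process' "${1:-/var/log/kern.log}" \
        | sed 's/.*Kill process [0-9]* (\([^)]*\)).*/\1/' \
        | sort | uniq -c | sort -rn
}
```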

This system has 93GB of memory:

[wrsroot@controller-0 ~(keystone_admin)]$ free -h
              total used free shared buff/cache available
Mem: 93G 86G 663M 72M 6.2G 2.0G
Swap: 0B 0B 0B

[wrsroot@controller-0 ~(keystone_admin)]$ system host-memory-list controller-0
+-----------+---------+------------+---------+----------------+--------+--------+--------+-------+-------------+----------------+----------------+------------------+----------------+---------...


Al Bailey (albailey1974) on 2019-04-25
Changed in starlingx:
status: New → Confirmed
Cristopher Lemus (cjlemusc) wrote :

Update about DUPLEX:

After the system was stable (all pods properly running) and the sanity test was retried, the same behavior was detected. Pods are restarting:

kube-system ingress-658v7 1/1 Running 19 5h5m
kube-system ingress-error-pages-59766f99fd-nnb5p 1/1 Running 2 5h5m
kube-system ingress-error-pages-59766f99fd-xfb7m 1/1 Running 6 5h5m
kube-system ingress-wt86d 1/1 Running 3 5h5m
kube-system kube-apiserver-controller-0 1/1 Running 7 5h57m
kube-system kube-apiserver-controller-1 1/1 Running 4 5h24m
kube-system kube-controller-manager-controller-0 1/1 Running 16 5h57m
kube-system kube-controller-manager-controller-1 1/1 Running 15 5h24m
kube-system kube-proxy-9255h 1/1 Running 1 5h58m
kube-system kube-proxy-n25lg 1/1 Running 2 5h24m
kube-system kube-scheduler-controller-0 1/1 Running 17 5h57m
kube-system kube-scheduler-controller-1 1/1 Running 15 5h24m
kube-system tiller-deploy-d87d7bd75-p7k85 0/1 CrashLoopBackOff 30 5h54m

Erich Cordoba (ericho) wrote :

This is the process list sorted by RSS in a simplex system.

Erich Cordoba (ericho) wrote :

And /proc/meminfo output:

MemTotal: 97534648 kB
MemFree: 4779328 kB
MemAvailable: 5895388 kB
Buffers: 100964 kB
Cached: 713940 kB
SwapCached: 0 kB
Active: 13862360 kB
Inactive: 442608 kB
Active(anon): 13518000 kB
Inactive(anon): 41120 kB
Active(file): 344360 kB
Inactive(file): 401488 kB
Unevictable: 5424 kB
Mlocked: 5424 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 13494920 kB
Mapped: 175572 kB
Shmem: 64712 kB
Slab: 4014080 kB
SReclaimable: 1091508 kB
SUnreclaim: 2922572 kB
KernelStack: 73040 kB
PageTables: 195752 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 12796252 kB
Committed_AS: 59455132 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1357332 kB
VmallocChunk: 34308239356 kB
HardwareCorrupted: 0 kB
CmaTotal: 16384 kB
CmaFree: 1356 kB
HugePages_Total: 34104
HugePages_Free: 34104
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 25972 kB
DirectMap2M: 2818048 kB
DirectMap1G: 98566144 kB
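One number worth pulling out of this dump: with HugePages_Total at 34104 and a 2048 kB hugepage size, roughly 66.6 GiB is pre-reserved for the hugepage pool and unavailable to ordinary allocations, which would line up with the low MemAvailable despite modest process usage. A minimal sketch that computes this from any meminfo dump:

```shell
# Compute memory reserved by the hugepage pool from a /proc/meminfo dump.
# The pool reduces free memory even while HugePages_Free == HugePages_Total.
hugepage_reserved_gib() {
    awk '/^HugePages_Total:/ {pages = $2}
         /^Hugepagesize:/    {size_kb = $2}
         END {printf "%.1f GiB\n", pages * size_kb / 1024 / 1024}' \
        "${1:-/proc/meminfo}"
}
```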

Cristopher Lemus (cjlemusc) wrote :

Attached docker stats from the simplex configuration. Almost all containers are limited only by the total amount of physical memory on the host (93GB). Is this expected? Almost all of them are actually using less than 1GB of memory. Could this cause Docker to "hold" memory from the physical host without actually using it?
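One way to confirm the observation is to filter a `docker stats --no-stream` listing for containers whose memory limit equals the host total (Docker displays the host figure when no explicit limit is set). A minimal sketch reading the listing on stdin, with 93.02GiB from this report assumed as the default host total:

```shell
# Print names of containers whose MEM LIMIT equals the host total,
# i.e. containers running without an explicit memory limit.
unlimited_containers() {
    awk -v total="${1:-93.02GiB}" \
        'NR > 1 && index($0, "/ " total) {print $2}'
}

# usage: docker stats --no-stream | unlimited_containers 93.02GiB
```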

Frank Miller (sensfan22) wrote :

Marking release gating and critical since All-in-one systems are impacted and the stx-openstack application is failing to come up on some systems.

Changed in starlingx:
status: Confirmed → Triaged
summary: - some pods are failing during sanity execution.
+ some pods are failing during sanity execution due to OOM killer
Changed in starlingx:
assignee: nobody → Cindy Xie (xxie1)
importance: Undecided → Critical
tags: added: stx.2.0 stx.containers
Cristopher Lemus (cjlemusc) wrote :

Hi All,

Some tests were made to find the point where the memory is allocated:

Just after `config_controller` it's using just a handful of GBs:

controller-0:~$ free -h
              total used free shared buff/cache available
Mem: 93G 3.2G 84G 47M 5.5G 88G
Swap: 0B 0B 0B
controller-0:~$

Right after the unlock, when the system passes from "offline" to "intest" status, usage jumps from 5.1GB to 71GB:

Commands running in parallel, system host-list:

[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | disabled | offline |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | disabled | intest |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | disabled | intest |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+

In parallel with free -h:

              total used free shared buff/cache available
Mem: 93G 4.9G 86G 45M 1.7G 87G
Swap: 0B 0B 0B
              total used free shared buff/cache available
Mem: 93G 5.1G 86G 45M 1.9G 86G
Swap: 0B 0B 0B
              total used free shared buff/cache available
Mem: 93G 5.1G 85G 45M 1.9G 86G
Swap: 0B 0B 0B
              total used free shared buff/cache available
Mem: 93G 71G ...
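The parallel watching above can be automated so the exact moment of the jump gets a timestamp. A minimal sketch that condenses each `free -h` reading into one line:

```shell
# Condense "free -h" output (read on stdin) into one timestamped line.
mem_snapshot() {
    awk -v ts="$(date '+%F %T')" \
        '/^Mem:/ {print ts, "used=" $3, "available=" $7}'
}

# usage: while sleep 5; do free -h | mem_snapshot; done | tee /tmp/mem.log
```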


Cindy Xie (xxie1) on 2019-04-27
Changed in starlingx:
assignee: Cindy Xie (xxie1) → Yi Wang (wangyi4)

Reviewed: https://review.opendev.org/656037
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=798b8ff4922fabc9b1bb802257c8f482146064c7
Submitter: Zuul
Branch: master

commit 798b8ff4922fabc9b1bb802257c8f482146064c7
Author: Al Bailey <email address hidden>
Date: Fri Apr 26 17:46:41 2019 -0500

    Set the number of nova schedulers to 1

    When applying the stx-openstack manifest, the default
    nova scheduler processes was equal to number of cores.

    Now it is set to 1.

    This reduces overall memory footprint and the time
    to apply the stx-openstack manifest. This reduces
    out of memory issues on systems that have a very large
    number of cores.

    Change-Id: I7f9f41f13d5ae18da2c58d9968a7ee67cbe5a05e
    Fixes-Bug: 1826308
    Signed-off-by: Al Bailey <email address hidden>

Changed in starlingx:
status: Triaged → Fix Released
Cristopher Lemus (cjlemusc) wrote :

With latest ISO, all baremetal configurations are passing sanity test (Green Status):

Regarding memory usage:

Before system is on "available" status:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | disabled | intest |
+----+--------------+-------------+----------------+-------------+--------------+
              total used free shared buff/cache available
Mem: 93G 5.5G 85G 45M 1.8G 86G
Swap: 0B 0B 0B

Less than 5 seconds after the system is on "available" status:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
              total used free shared buff/cache available
Mem: 93G 72G 19G 46M 1.8G 20G
Swap: 0B 0B 0B

So usage jumps from 5.5GB to 72GB. When we reported the bug, the usage was 71GB, almost the same as today.

I'm assuming that Docker reserves the memory because the pods/containers are not limited. As we can see in docker stats, almost all containers have their limit set to the total amount of physical memory on the system, e.g.:

CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
e24823b3c8aa k8s_POD_testpod_default_4831b628-6a2b-11e9-9db4-3cfdfebedf30_0 0.00% 4.32MiB / 93.02GiB 0.00% 0B / 0B 0B / 0B 1
1afe1bd620c0 k8s_nova-compute-ssh_nova-compute-controller-0-a762cb46-9486m_openstack_8afe533b-6a2a-11e9-9db4-3cfdfebedf30_0 0.00% 12.51MiB / 93.02GiB 0.01% 0B / 0B 0B / 20.5kB 1
18d5e5ddb8f0 k8s_nova-compute_nova-compute-controller-0-a762cb46-9486m_openstack_8afe533b-6a2a-11e9-9db4-3cfdfebedf30_0 0.06% 197.2MiB / 93.02GiB 0.21% 0B / 0B 1.04GB / 135kB 22

Is this behavior expected? Is there a way to properly track down memory usage at the Docker level?

Thanks!

Frank Miller (sensfan22) on 2019-04-29
Changed in starlingx:
assignee: Yi Wang (wangyi4) → Al Bailey (albailey1974)
Yi Wang (wangyi4) wrote :

Cristopher, based on my analysis, I suppose most of the memory is reserved for hugepages. I see a phenomenon very similar to yours: in my deployment, just after the controller-0 unlock, memory usage jumped from 4G to 156G (the system has 192G in total).

By checking /proc/meminfo, I can see it as below. The hugepage size is 2M, so there are 152G reserved in my system.
"
HugePages_Total: 78218
HugePages_Free: 78218
"

I set a smaller number (such as 512) in /proc/sys/vm/nr_hugepages, and then I can get my memory back. Please check the log below.

controller-0:/home/wrsroot# free -h -k
              total used free shared buff/cache available
Mem: 187G 157G 27G 48M 2.6G 28G
Swap: 0B 0B 0B
controller-0:/home/wrsroot# cat /proc/sys/vm/nr_hugepages
78218
controller-0:/home/wrsroot# echo 512 > /proc/sys/vm/nr_hugepages
controller-0:/home/wrsroot# cat /proc/sys/vm/nr_hugepages
512
controller-0:/home/wrsroot# free -h -k
              total used free shared buff/cache available
Mem: 187G 5.5G 179G 48M 2.6G 180G
Swap: 0B 0B 0B
controller-0:/home/wrsroot# echo 78218 > /proc/sys/vm/nr_hugepages
controller-0:/home/wrsroot# cat /proc/sys/vm/nr_hugepages
78218
controller-0:/home/wrsroot# free -h -k
              total used free shared buff/cache available
Mem: 187G 157G 27G 48M 2.6G 28G
Swap: 0B 0B 0B
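For reference, the same adjustment can be made through sysctl and pinned across reboots; a sketch, with the caveat that on StarlingX the hugepage pool is managed by the platform (sysinv), so a manual override may be reverted on the next configuration apply or reboot:

```shell
# Equivalent to echoing into /proc/sys/vm/nr_hugepages (root required):
sysctl -w vm.nr_hugepages=512

# To persist across reboots (on systems not platform-managed):
echo 'vm.nr_hugepages = 512' >> /etc/sysctl.conf
```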

Yi Wang (wangyi4) wrote :

I launched a script to check system memory usage right when the system boots up. Here is the log.
