OOM seen on worker node after fresh install - mem available but out of order 0 pages

Bug #1827258 reported by Maria Yousaf
This bug affects 5 people
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Bin Yang

Bug Description

Brief Description
-----------------
Out-of-memory killer seen on worker node after fresh system install. From the logs, the node appears to have memory available, but it is out of order-0 pages.

Severity
--------
Major

Steps to Reproduce
------------------
1. Install system
2. Observe the worker node console; the following is seen:

compute-1 login: [ 604.915006] Out of memory: Kill process 43143 (nova-compute) score 1003 or sacrifice child
[ 604.928673] Killed process 43143 (nova-compute) total-vm:2300792kB, anon-rss:119128kB, file-rss:0kB, shmem-rss:0kB
[ 1387.186936] Out of memory: Kill process 49810 (nova-compute) score 1003 or sacrifice child
[ 1387.201406] Killed process 49810 (nova-compute) total-vm:2300964kB, anon-rss:119164kB, file-rss:0kB, shmem-rss:0kB
[ 1863.595215] Out of memory: Kill process 54492 (/var/lib/openst) score 1002 or sacrifice child
[ 1863.616508] Killed process 54492 (/var/lib/openst) total-vm:288872kB, anon-rss:105316kB, file-rss:0kB, shmem-rss:0kB
[ 2265.008223] Out of memory: Kill process 64088 (/var/lib/openst) score 1002 or sacrifice child
[ 2265.021867] Killed process 64088 (/var/lib/openst) total-vm:288404kB, anon-rss:104712kB, file-rss:0kB, shmem-rss:0kB
[ 2694.819180] Out of memory: Kill process 59143 (nova-compute) score 1002 or sacrifice child
[ 2694.875767] Killed process 59143 (nova-compute) total-vm:302420kB, anon-rss:101464kB, file-rss:0kB, shmem-rss:0kB

Expected Behavior
------------------
OOM not seen

Actual Behavior
----------------
OOM seen

Reproducibility
---------------
Seen once

System Configuration
--------------------
Storage system

Branch/Pull Time/Commit
-----------------------
master load: 20190501T013000Z

Last Pass
---------
N/A

Timestamp/Logs
--------------
Kernel log:

2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914665] calico-node invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=999
2019-05-01T17:36:25.530 compute-1 kernel: info [ 604.914669] calico-node cpuset=7a16228d33a348cde5189f3dc1fa5a4dd9daaf1100e7f8d5a795d239535110c4 mems_allowed=0
2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914671] CPU: 0 PID: 49567 Comm: calico-node Kdump: loaded Tainted: G O ------------ T 3.10.0-957.1.3.el7.1.tis.x86_64 #1
2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914673] Hardware name: Intel Corporation S2600IP/S2600IP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013
2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914674] Call Trace:
2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914680] [<ffffffffadc07861>] dump_stack+0x19/0x1b
2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914683] [<ffffffffadc02c9e>] dump_header+0x8e/0x23f
2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914687] [<ffffffffad4e5cc2>] ? ktime_get_ts64+0x52/0xf0
2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914691] [<ffffffffad52ff9f>] ? delayacct_end+0x8f/0xb0
2019-05-01T17:36:25.530 compute-1 kernel: warning [ 604.914694] [<ffffffffad58846e>] oom_kill_process+0x24e/0x3d0
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914697] [<ffffffffad489d65>] ? has_ns_capability_noaudit+0x35/0x50
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914698] [<ffffffffad489d97>] ? has_capability_noaudit+0x17/0x20
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914700] [<ffffffffad588cd3>] out_of_memory+0x4d3/0x510
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914702] [<ffffffffad58f046>] __alloc_pages_nodemask+0xa86/0xb80
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914705] [<ffffffffad5d97f8>] alloc_pages_current+0x98/0x110
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914707] [<ffffffffad5841d7>] __page_cache_alloc+0x97/0xb0
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914708] [<ffffffffad586fe8>] filemap_fault+0x278/0x460
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914737] [<ffffffffc237f15e>] __xfs_filemap_fault+0x7e/0x200 [xfs]
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914749] [<ffffffffc237f38c>] xfs_filemap_fault+0x2c/0x30 [xfs]
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914752] [<ffffffffad5b161a>] __do_fault.isra.70+0x8a/0x100
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914754] [<ffffffffad5b1c3c>] do_read_fault.isra.72+0x4c/0x1b0
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914756] [<ffffffffad5b8267>] handle_mm_fault+0x557/0xc30
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914759] [<ffffffffad45fcdf>] __do_page_fault+0x1ef/0x460
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914761] [<ffffffffad45ffc5>] do_page_fault+0x35/0x90
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914763] [<ffffffffadc16638>] page_fault+0x28/0x30
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914765] Mem-Info:
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914769] active_anon:282208 inactive_anon:2098 isolated_anon:0
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914769] active_file:10201 inactive_file:16368 isolated_file:0
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914769] unevictable:109522 dirty:30 writeback:0 unstable:0
2019-05-01T17:36:25.531 compute-1 kernel: warning [ 604.914769] slab_reclaimable:22794 slab_unreclaimable:108933
2019-05-01T17:36:25.532 compute-1 kernel: warning [ 604.914769] mapped:10965 shmem:3176 pagetables:5571 bounce:0
2019-05-01T17:36:25.532 compute-1 kernel: warning [ 604.914769] free:545174 free_pcp:477 free_cma:3135

Memory usage fairly stable:

memtop 0.1 -- selected options: delay = 1.000s, repeat = 1, period = 1.000s, non-strict, unit = MiB
yyyy-mm-dd hh:mm:ss.fff Tot Used Free Ca Buf Slab CAS CLim Dirty WBack Anon Avail 0:Avail 0:HFree 1:Avail 1:HFree
2019-05-01 18:19:56.557 64217.5 61916.4 2039.7 132.1 34.3 537.7 3762.3 2425.7 1.3 0.0 1537.0 2301.1 321.3 28544.0 1979.0 28774.0
done
compute-1:~# memtop
memtop 0.1 -- selected options: delay = 1.000s, repeat = 1, period = 1.000s, non-strict, unit = MiB
yyyy-mm-dd hh:mm:ss.fff Tot Used Free Ca Buf Slab CAS CLim Dirty WBack Anon Avail 0:Avail 0:HFree 1:Avail 1:HFree
2019-05-01 18:20:01.581 64217.5 61925.4 2024.1 138.8 34.3 543.4 3864.9 2425.7 1.3 0.0 1538.0 2292.1 319.3 28544.0 1973.3 28774.0
done
compute-1:~# memtop
memtop 0.1 -- selected options: delay = 1.000s, repeat = 1, period = 1.000s, non-strict, unit = MiB
yyyy-mm-dd hh:mm:ss.fff Tot Used Free Ca Buf Slab CAS CLim Dirty WBack Anon Avail 0:Avail 0:HFree 1:Avail 1:HFree
2019-05-01 18:20:05.181 64217.5 61929.9 2017.7 140.5 34.3 547.5 3964.0 2425.7 1.3 0.0 1538.8 2287.5 318.8 28544.0 1969.3 28774.0
done

Top:

top - 18:20:26 up 54 min, 1 user, load average: 15.41, 15.54, 14.93
Tasks: 414 total, 11 running, 403 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.4 us, 6.2 sy, 0.0 ni, 80.2 id, 3.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65758700 total, 2066240 free, 62952448 used, 740012 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1862636 avail Mem

   PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 33738 root 10 -10 5522860 435328 12244 S 200.0 0.7 103:33.86 ovs-vswitchd
   183 root 20 0 0 0 0 R 100.0 0.0 39:59.85 kswapd0 <--- ??
 39926 root 20 0 280512 98020 0 R 6.2 0.1 0:45.15 /var/lib/openst
 41020 root 20 0 280568 96848 60 R 6.2 0.1 0:42.67 /var/lib/openst
 87231 root 20 0 301500 100624 0 R 6.2 0.2 0:07.66 nova-compute
 96462 root 20 0 80068 4348 0 R 6.2 0.0 0:01.57 python
 97522 root 20 0 11692 276 0 R 6.2 0.0 0:00.08 bash
     1 root 20 0 126852 5084 2292 S 0.0 0.0 0:13.87 systemd

Test Activity
-------------
Install

Revision history for this message
Maria Yousaf (myousaf) wrote :

Some additional info:

compute-1:~$ sudo lsof -nP -a +L1
Password:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
ovsdb-ser 31924 root 7u REG 0,37 159 0 109669 /tmp/tmpfHlfhxc (deleted)

compute-1:~$ sudo systemd-cgtop -m -b -n1 --depth=6

Path Tasks %CPU Memory Input/s Output/s

/ 302 - 2.8G - -
/user.slice - - 1.1G - -
/user.slice/user-0.slice - - 1.1G - -
/k8s-infra - - 971.0M - -
/k8s-infra/kubepods - - 971.0M - -
/k8s-infra/kubepods/besteffort - - 952.6M - -
/system.slice - - 876.8M - -
/system.slice/ovs-vswitchd.service 1 - 416.4M - -
/k8s-infra/kubepods/besteffort/pod7aa85fa3-6c30-11e9-b5e2-001e67680cba - - 219.4M - -
/k8s-infra/kubepods/beste...efcf34189968c7d19744022e6d03780c33a1b0284a5bc7af98 5 - 199.7M - -
/system.slice/docker.service 20 - 162.6M - -
/k8s-infra/kubepods/besteffort/pod7a58b2e8-6c30-11e9-b5e2-001e67680cba - - 157.0M - -
/k8s-infra/kubepods/besteffort/pod7a928ea9-6c30-11e9-b5e2-001e67680cba - - 151.4M - -
/k8s-infra/kubepods/beste...3f681d0ea155a876292f94041a3581f623639b6ca71c464792 2 - 149.3M - -
/k8s-infra/kubepods/besteffort/pod7b223017-6c30-11e9-b5e2-001e67680cba - - 147.7M - -
/k8s-infra/kubepods/beste...27cc220c99d696a85d46f2f1c5b1ea415a3ce2097b1509dd01 11 - 143.7M - -
/k8s-infra/kubepods/beste...ae615c4797728d533eb040a58eeec70c0faf9587742dafd636 1 - 128.9M - -
/k8s-infra/kubepods/besteffort/pod7ad7ef3b-6c30-11e9-b5e2-001e67680cba - - 117.0M - -
/k8s-infra/kubepods/besteffort/pod7a42d449-6c30-11e9-b5e2-001e67680cba - - 110.9M - -
/k8s-infra/kubepods/beste...36ef8b375482506848519026ce48f537c344e43eaa4e2245c6 1 - 109.5M -

compute-1:~$ ipcs

------ Message Queues --------
key msqid owner perms used-bytes messages

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 root 644 80 2
0x00000000 32769 root 644 16384 2 ...

Numan Waheed (nwaheed)
tags: added: stx.retestneeded
Revision history for this message
Maria Yousaf (myousaf) wrote :

I'm seeing this again on a worker node in IP33-36 (worker-0) running load: 20190505T233000Z:

compute-0:~$ [59763.882249] Out of memory: Kill process 635670 (python) score 0 or sacrifice child
[59764.055873] Killed process 635741 (smart) total-vm:17480kB, anon-rss:2328kB, file-rss:716kB, shmem-rss:0kB

Looks like it might be the order=0 memory issue.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

@Maria, was the stx-openstack application running at the time the oom error was reported?

Changed in starlingx:
status: New → Incomplete
Revision history for this message
Cindy Xie (xxie1) wrote :

@Maria, do you think the issue is similar to this one? https://bugs.launchpad.net/starlingx/+bug/1826308
That one was marked as "Fix Released", but we think it is still there.

Revision history for this message
Maria Yousaf (myousaf) wrote :

I'm not sure, Cindy. Someone will need to look at the logs and triage it to see whether it is the same issue or a different one.

Revision history for this message
Maria Yousaf (myousaf) wrote :

@Ghada, I believe application-apply was done when the original issue was seen but I'm not 100% sure. Jim Gauld helped me investigate. He may have some additional information.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as gating for now; will monitor for a recurrence.

Changed in starlingx:
importance: Undecided → Medium
status: Incomplete → Triaged
assignee: nobody → Gerry Kopec (gerry-kopec)
tags: added: stx.2.0 stx.distro.other
Revision history for this message
Bin Yang (byangintel) wrote :

@Maria

from compute-0_20190507.124154/var/log/kern.log
===============================================

    [59763.881672] sm_debug invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
    ...
    [59763.882079] Node 0 DMA free:15848kB min:60kB low:72kB high:88kB
    [59763.882083] lowmem_reserve[]: 0 2800 15838 15838
    [59763.882085] Node 0 DMA32 free:63564kB min:11480kB low:14348kB high:17220kB
    [59763.882088] lowmem_reserve[]: 0 0 13038 13038
    [59763.882090] Node 0 Normal free:53384kB min:53464kB low:66828kB high:80196kB
    [59763.882092] lowmem_reserve[]: 0 0 0 0
    [59763.882094] Node 1 Normal free:66048kB min:66060kB low:82572kB high:99088kB
    [59763.882097] lowmem_reserve[]: 0 0 0 0
    ...
    [59763.882249] Out of memory: Kill process 635670 (python) score 0 or sacrifice child

gfp_mask=0x201da: this means the page allocation is from ZONE_NORMAL.

But:
    Node 0 Normal, free:53384kB < min:53464kB: not enough free space
    Node 1 Normal, free:66048kB < min:66060kB: not enough free space
Since free < min, the kernel starts the OOM killer.

from compute-1_20190507.124154/var/log/kern.log
===============================================
[ 1515.471830] calico-node invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=999
...
[ 1515.471950] Node 0 DMA free:15848kB min:60kB low:72kB high:88kB
[ 1515.471954] lowmem_reserve[]: 0 2800 15838 15838
[ 1515.471956] Node 0 DMA32 free:63620kB min:11480kB low:14348kB high:17220kB
[ 1515.471959] lowmem_reserve[]: 0 0 13038 13038
[ 1515.471961] Node 0 Normal free:53148kB min:53464kB low:66828kB high:80196kB
[ 1515.471964] lowmem_reserve[]: 0 0 0 0
...
[ 1515.472142] Out of memory: Kill process 60356 (kubernetes-entr) score 1000 or sacrifice child

gfp_mask=0x201da: this means the page allocation is from ZONE_NORMAL.

But:
    Node 0 Normal, free:53148kB < min:53464kB: not enough free space
Since free < min, the kernel starts the OOM killer.

Conclusion:
===========
The kernel does not have enough memory for the page allocation; "order=0" means a single 4K page is required.
The actual condition is "free memory below the min watermark", which triggers the OOM killer.
So this is not a case of "mem available but out of order 0 pages".
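
For illustration only (not kernel code): a minimal sketch of the order-0 watermark test described above, applied to the zone values from the kern.log excerpts; the helper name is hypothetical.

    # Sketch of the order-0 watermark test described above; not kernel code.
    def order0_watermark_ok(free_kb, min_kb, lowmem_reserve_kb=0):
        # The allocation is refused if free pages do not exceed the min
        # watermark plus the lowmem reserve for the requesting class.
        return free_kb > min_kb + lowmem_reserve_kb

    # Values from the kern.log excerpts above (lowmem_reserve[] is 0 for these zones):
    print(order0_watermark_ok(53384, 53464))  # Node 0 Normal -> False, OOM killer runs
    print(order0_watermark_ok(66048, 66060))  # Node 1 Normal -> False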

Revision history for this message
Bin Yang (byangintel) wrote :

@Maria

from compute-1_20190507.124154/var/log/kern.log
===============================================
kernel: info [ 1515.776621] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
...
kernel: info [ 1515.776671] [40564] 0 40564 34049530 71961 64718 0 0 ovs-vswitchd

from: include/linux/mm_types.h
==============================
unsigned long total_vm; /* Total pages mapped */

What workload did you run? The number of pages mapped by ovs-vswitchd looks huge.
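
For scale, a quick back-of-the-envelope conversion of that task-dump row (the kernel counts 4 KiB pages):

    # Conversion of the ovs-vswitchd row above from 4 KiB pages.
    total_vm_pages, rss_pages = 34049530, 71961
    print(total_vm_pages * 4 / 1024 / 1024)  # ~129.9 GiB of mapped virtual address space
    print(rss_pages * 4 / 1024)              # ~281 MiB resident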

Revision history for this message
Cindy Xie (xxie1) wrote :

assign to @Bin Yang as he is looking into this bug already.

Changed in starlingx:
assignee: Gerry Kopec (gerry-kopec) → Bin Yang (byangintel)
Revision history for this message
Bin Yang (byangintel) wrote :

Based on the logs, it should not be a bug.

Changed in starlingx:
status: Triaged → Invalid
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Can you elaborate on why this is not an issue? The system ran out of 4k pages. This is not expected under any condition.

Revision history for this message
Bin Yang (byangintel) wrote :

As I mentioned above, when free < min, the kernel starts the OOM killer.

And you can find the kernel code for this logic:
mm/page_alloc.c:
__zone_watermark_ok()
=============================

        /*
         * Check watermarks for an order-0 allocation request. If these
         * are not met, then a high-order request also cannot go ahead
         * even if a suitable page happened to be free.
         */
        if (free_pages <= min + z->lowmem_reserve[classzone_idx])
                return false;

        /* If this is an order-0 request then the watermark is fine */
        if (!order)
                return true;

And here is a related document:
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
min_free_kbytes:
================

This is used to force the Linux VM to keep a minimum number
of kilobytes free. The VM uses this number to compute a
watermark[WMARK_MIN] value for each lowmem zone in the system.
Each lowmem zone gets a number of reserved free pages based
proportionally on its size.

Some minimal amount of memory is needed to satisfy PF_MEMALLOC
allocations; if you set this to lower than 1024KB, your system will
become subtly broken, and prone to deadlock under high loads.

Setting this too high will OOM your machine instantly.

The watermarks are set up based on this kernel parameter:
========================
mm/page_alloc.c
init_per_zone_wmark_min()

The default setting is calculated from the total system memory size. I think a min watermark of 53464kB is reasonable.
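
As a rough illustration of the proportional split described in the vm.txt excerpt above (a sketch only: the zone sizes below are made-up round numbers, and the ~128MB min_free_kbytes is inferred from the sum of the per-zone min values in the log, not read from the system):

    # Sketch only: how min_free_kbytes is split across lowmem zones in
    # proportion to their size, per the vm.txt excerpt above.
    def per_zone_min_watermarks(min_free_kbytes, zone_sizes_kb):
        total = sum(zone_sizes_kb.values())
        return {name: min_free_kbytes * size // total
                for name, size in zone_sizes_kb.items()}

    zones_kb = {                        # hypothetical managed zone sizes (kB)
        "Node0 DMA":    16 * 1024,
        "Node0 DMA32":  3 * 1024 * 1024,
        "Node0 Normal": 13 * 1024 * 1024,
        "Node1 Normal": 16 * 1024 * 1024,
    }
    print(per_zone_min_watermarks(128 * 1024, zones_kb))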

Revision history for this message
Brent Rowsell (brent-rowsell) wrote :

The issue is we ran out of free kernel memory. We should never run out under any circumstances.
This LP needs to determine why we ran out and address the issue.

Changed in starlingx:
status: Invalid → Confirmed
Revision history for this message
Bin Yang (byangintel) wrote :

from compute-1_20190507.124154/var/log/kern.log
    2019-05-06T16:24:12.266 localhost kernel: debug [ 0.000000] On node 0 totalpages: 4174118
    ...
    2019-05-06T17:28:56.749 compute-1 kernel: info [ 1515.471986] Node 0 hugepages_total=1 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
    2019-05-06T17:28:56.749 compute-1 kernel: info [ 1515.471987] Node 0 hugepages_total=6807 hugepages_free=6807 hugepages_surp=0 hugepages_size=2048kB
    ...

from hieradata/192.168.204.77.yaml:
    platform::compute::hugepage::params::vm_2M_pages: '"7024,7172"'
    ...
    platform::compute::params::worker_base_reserved: ("node0:8000MB:1" "node1:2000MB:1")

from puppet.log
    ...
    Exec[Allocate 7024 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages]
    ...
    Exec[Allocate 7172 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages]
    ...

The total memory on node 0 is 16GB.

From the kernel log, the 2M hugepage count is 6807, which is smaller than 7024. It looks like the system could not allocate all 7024 2M pages.
The total expected hugepage size is 7024*2M + 1G = 14.7GB.

That is not reasonable, since we have several memory reservations, as below:
    1. 8GB reserved by worker_reserved.conf:
        WORKER_BASE_RESERVED=("node0:8000MB:1" "node1:2000MB:1")
    2. 10% reserved by below code:
        sysinv host.py: vm_hugepages_nr_2M = int(m.vm_hugepages_possible_2M * 0.9)

vm_hugepages_possible_2M is calculated by the _inode_get_memory_hugepages() function using the logic below (a small sketch follows this breakdown):

    node_total_kb = total_hp_mb * SIZE_KB + free_kb + pss_mb * SIZE_KB
        total_hp_mb is 0 since 2M hugepage is not reserved by kernel command line
        free_kb is from /sys/devices/system/node/node0/meminfo
        pss_mb is collected from /proc/*/smaps

    vm_hugepages_possible_2M: node_total_kb - base_mem_mb - vswitch_mem_kb
        base_mem_mb is 8GB from WORKER_BASE_RESERVED in worker_reserved.conf
        vswitch_mem_kb is 1GB from COMPUTE_VSWITCH_MEMORY in worker_reserved.conf
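
A small sketch of that calculation as described above (illustrative only; the function name, unit handling, and input numbers are assumptions, not the exact sysinv code):

    # Illustrative sketch of the vm_hugepages_possible_2M calculation described
    # above; not the exact sysinv code. SIZE_KB converts MiB to kB.
    SIZE_KB = 1024

    def possible_2m_pages(total_hp_mb, free_kb, pss_mb, base_mem_mb, vswitch_mem_kb):
        node_total_kb = total_hp_mb * SIZE_KB + free_kb + pss_mb * SIZE_KB
        usable_kb = node_total_kb - base_mem_mb * SIZE_KB - vswitch_mem_kb
        return max(usable_kb // 2048, 0)   # one 2M huge page = 2048 kB

    # Hypothetical node 0 inputs: no kernel-reserved huge pages, ~15.5 GiB free,
    # 500 MiB PSS, 8000 MiB platform reserved, 1 GiB vswitch memory.
    print(possible_2m_pages(0, 15_900_000, 500, 8000, 1024 * 1024))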

So far, vm_hugepages_possible_2M has always been correct in the Shanghai bare metal tests.

Could the reporters help provide more info when this bug is triggered?
1. run system host-memory-list <compute node>
2. run cat /proc/sys/vm/overcommit_* #the mode will impact the free_kb calculation
3. run cat /proc/*/smaps 2>/dev/null | awk '/^Pss:/ {a += $2;} END {printf "%d\n", a/1024.0;}' on compute nodes
4. run cat /sys/devices/system/node/node*/meminfo on compute nodes

thanks,
Bin

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/667811

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
Bin Yang (byangintel) wrote :

After code review, the hugepage allocation check code has a problem.

It only checks whether the total requested allocation is bigger than the total
node memory size. If the pending hugepage size falls between the maximum
possible size and the node total size, the check passes.

The hugepage allocation then overflows into memory the platform needs, normal
4K pages run short, and the OOM min watermark is hit.
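
A minimal sketch of that gap, assuming hypothetical helper names and using roughly the node 0 figures mentioned earlier in the thread (16GB total, 8000MB platform reserved); this is an illustration, not the actual sysinv semantic check:

    # Illustrative sketch of the gap described above; hypothetical helpers,
    # not the actual sysinv semantic check. Sizes in MiB.
    def old_style_check(requested_hp_mib, node_total_mib):
        # Only rejects requests larger than the whole node -- too permissive.
        return requested_hp_mib <= node_total_mib

    def tighter_check(requested_hp_mib, node_total_mib, platform_reserved_mib):
        # Also leaves room for the platform reservation (and hence 4K pages).
        return requested_hp_mib <= node_total_mib - platform_reserved_mib

    node_total_mib, reserved_mib = 16 * 1024, 8000   # node 0 figures from above
    requested_mib = 7024 * 2 + 1024                  # 7024 x 2M pages + one 1G page
    print(old_style_check(requested_mib, node_total_mib))              # True  -> slips through
    print(tighter_check(requested_mib, node_total_mib, reserved_mib))  # False -> would be rejected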

Revision history for this message
Bin Yang (byangintel) wrote :

By default, if there is no pending hugepage request, _update_huge_pages() will allocate
(m.vm_hugepages_possible_2M * 0.9) 2M hugepages. However, m.vm_hugepages_nr_2M_pending
takes priority when it is set. If the user manually configures 2M hugepages with a wrong
size, this issue will be triggered (see the sketch below).
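
A minimal sketch of that selection order (illustrative; the function and the numbers are hypothetical, not the actual _update_huge_pages() code):

    # Sketch of the selection order described above; hypothetical helper.
    def select_2m_pages(nr_2m_pending, possible_2m):
        if nr_2m_pending is not None:
            # A manually configured value wins, even if it is too large.
            return nr_2m_pending
        # Default path: 90% of what the agent reported as possible.
        return int(possible_2m * 0.9)

    print(select_2m_pages(None, 3202))  # 2881 -> default path
    print(select_2m_pages(7024, 3202))  # 7024 -> manual value taken as-is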

Could the reporter help double-check the test scripts?

Revision history for this message
Bart Wensley (bartwensley) wrote :

I am seeing the same problem in a 2+2 lab. There are two compute hosts in this lab, with identical hardware and 32G of memory each, yet they are reporting different numbers of available 2M pages:

2019-06-25 17:58:53.971 109784 INFO sysinv.api.controllers.v1.host [-] Updating mem values of host(compute-0) node(5): {'vm_hugepages_nr_4K': 0, 'vm_hugepages_nr_2M': 3330, 'vswitch_hugepages_nr': 1}
2019-06-25 17:58:53.989 109784 INFO sysinv.api.controllers.v1.host [-] Updating mem values of host(compute-0) node(6): {'vm_hugepages_nr_4K': 94976, 'vm_hugepages_nr_2M': 6264, 'vswitch_hugepages_nr': 1}

2019-06-25 17:59:03.741 109782 INFO sysinv.api.controllers.v1.host [-] Updating mem values of host(compute-1) node(3): {'vm_hugepages_nr_4K': 0, 'vm_hugepages_nr_2M': 7025, 'vswitch_hugepages_nr': 1}
2019-06-25 17:59:03.762 109782 INFO sysinv.api.controllers.v1.host [-] Updating mem values of host(compute-1) node(4): {'vm_hugepages_nr_4K': 0, 'vm_hugepages_nr_2M': 7169, 'vswitch_hugepages_nr': 1}

The platform is configuring an incorrect number of 2M pages for compute-1:

platform::compute::hugepage::params::nr_hugepages_1G: ("node0:1048576kB:1" "node1:1048576kB:1")
platform::compute::hugepage::params::vm_2M_pages: '"6783,7169"'
platform::compute::params::worker_base_reserved: ("node0:8000MB:1" "node1:2000MB:1")

This would require 6783*2M + 1G ≈ 14.2G. That leaves almost no memory on node 0 (which has 16G) for the platform - there is supposed to be 8G reserved. The kernel isn't even able to allocate the 6783 2M pages.

I don't understand why puppet thinks that the 2M page allocation was successful - it tries to allocate 7025 pages on node0:

2019-06-25T19:48:09.660 Notice: 2019-06-25 19:48:08 +0000 /Stage[main]/Platform::Compute::Allocate/Allocate_pages[Start node0 2048kB]/Exec[Allocate 7025 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages]/returns: executed successfully
2019-06-25T19:48:29.051 Notice: 2019-06-25 19:48:29 +0000 /Stage[main]/Platform::Compute::Allocate/Allocate_pages[Start node1 2048kB]/Exec[Allocate 7169 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages]/returns: executed successfully

But it appears that only 6758 were allocated:

2019-06-27T02:53:41.333 compute-1 kernel: info [46730.474219] Node 0 hugepages_total=1 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
2019-06-27T02:53:41.333 compute-1 kernel: info [46730.474221] Node 0 hugepages_total=6758 hugepages_free=6758 hugepages_surp=0 hugepages_size=2048kB
2019-06-27T02:53:41.333 compute-1 kernel: info [46730.474222] Node 1 hugepages_total=1 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
2019-06-27T02:53:41.333 compute-1 kernel: info [46730.474223] Node 1 hugepages_total=7169 hugepages_free=7169 hugepages_surp=0 hugepages_size=2048kB

This results in compute-1 being unusable. It essentially has no free memory:
compute-1:~# free -h
              total used free shared buff/cache available
Mem: 31G 30G 238M 12M 365M 7.7M
Swap: 0B 0B 0B
compute-1:~# memtop
memtop 0.1 -- selected options: delay ...


Revision history for this message
Bart Wensley (bartwensley) wrote :

The above comment was for a designer load built on June 25, 2019.

Revision history for this message
Bin Yang (byangintel) wrote :

Some new findings from the database dump:
=================================
node_memtotal_mib  platform_reserved_mib  vm_hugepages_possible_2M  vm_hugepages_nr_2M_pending  vm_hugepages_avail_2M  vm_hugepages_nr_4K
15428              8000                   3202                      \N                          6795                   0
15562              2000                   6269                      \N                          7177                   0
15643              8000                   3309                      \N                          6799                   0
15446              2000                   6211                      \N                          7172                   0

1. "vm_hugepages_possible_2M" < node_memtotal_mib
    It is reasonable. The hugepage possible size calculated in agent/node.py: inode_get_memory_hugepages() is correct

2. "vm_hugepages_avail_2M" > "vm_hugepages_possible_2M"
    It is not reasonable; something is wrong in v1/host.py: _update_huge_pages()

3. "vm_hugepages_nr_4K" == 0
    _update_huge_pages()
            vm_hugepages_4K = \
                (m.node_memtotal_mib - m.platform_reserved_mib)
            vm_hugepages_4K -= \
                (vs_hugepages_nr * m.vswitch_hugepages_size_mib)
            vm_hugepages_4K -= \
                (constants.MIB_2M * vm_hugepages_nr_2M)
            vm_hugepages_4K -= \
                (constants.MIB_1G * vm_hugepages_nr_1G)
            vm_hugepages_4K = \
                (constants.NUM_4K_PER_MiB * vm_hugepages_4K)

            # Clip 4K pages
            min_4K = 32 * constants.Ki / 4
            if vm_hugepages_4K < min_4K:
                vm_hugepages_4K = 0

    node_memtotal_mib, the vswitch hugepages, and the VM hugepages are calculated by agent/node.py: _inode_get_memory_hugepages().
    It should not simply set vm_hugepages_4K = 0 when vm_hugepages_4K < min_4K. If the requested hugepage size is too big, vm_hugepages_4K will be negative; that is the last chance we have to catch and correct the wrong setting (see the sketch below).
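
One possible guard, sketched below for illustration only; this is not the fix that was eventually merged (see the commit later in this thread), it simply shows why clipping a negative value to zero hides an over-allocation:

    # Illustrative guard only; the merged fix took a different approach.
    # Sizes in MiB; the constants mirror those referenced above.
    MIB_2M, MIB_1G, NUM_4K_PER_MiB = 2, 1024, 256
    MIN_4K_PAGES = 32 * 1024 // 4   # 32 MiB expressed as 4K pages

    def vm_4k_pages(memtotal_mib, reserved_mib, vs_nr, vs_size_mib, nr_2m, nr_1g):
        remaining_mib = (memtotal_mib - reserved_mib
                         - vs_nr * vs_size_mib
                         - MIB_2M * nr_2m
                         - MIB_1G * nr_1g)
        pages = NUM_4K_PER_MiB * remaining_mib
        if pages < 0:
            # Huge pages were over-allocated; reject instead of clipping to 0.
            raise ValueError("huge page request exceeds available node memory")
        return 0 if pages < MIN_4K_PAGES else pages

    print(vm_4k_pages(15428, 8000, 1, 1024, 3000, 0))   # positive -> OK
    # vm_4k_pages(15428, 8000, 1, 1024, 7024, 0)        # would raise: over-allocated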

I will add more logs in v1/host.py and try to reproduce it.

Revision history for this message
Bin Yang (byangintel) wrote :

The patch (https://review.opendev.org/#/c/667811/) has been updated based on the discussion with Tao.

So far, I still cannot reproduce this issue. Could the reporter help test this patch?
The issue should be fixed if it is caused by the vm_hugepages_possible_2M calculation.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Raising the priority of this bug to high as it has been seen multiple times recently:
https://bugs.launchpad.net/starlingx/+bug/1836240
https://bugs.launchpad.net/starlingx/+bug/1835941

The above bugs will be marked as duplicates.

Changed in starlingx:
importance: Medium → High
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Updating the tag from stx.distro.other to stx.config as the issue is in the stx hugepage allocation, not with a 3rd party package.

tags: added: stx.config
removed: stx.distro.other
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/667811
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=5e42f69989433b7ac4c0aba6a4dd62900a45f905
Submitter: Zuul
Branch: master

commit 5e42f69989433b7ac4c0aba6a4dd62900a45f905
Author: Bin Yang <email address hidden>
Date: Thu Jun 27 16:36:41 2019 +0800

    fix hugepage allocation overflow

    From the database dump, it shows current "vm_hugepages_nr_2M", which is
    derived from the previous "vm_hugepages_possible_2M" is much bigger
    than current "vm_hugepages_possible_2M".

    "vm_hugepages_possible_2M" reported by sysinv-agent is calculated using
    the platform reserved value from the "worker_reserved.conf", while
    "worker_reserved.conf" is updated by puppet after first unlock.

    This patch calculates the initial huge pages based on the current view of
    the host memory instead of using the vm_hugepages_possible_2M value reported
    from the sysinv-agent, which could be calculated based on the default platform
    reserved value. In addition, it also takes consideration of the platform memory
    reserved changes (by user) when performing huge pages semantic check.

    Change-Id: I686b99728ed2b3572ace39469d479176a6ae55ff
    Closes-Bug: 1827258
    Signed-off-by: Liu, Tao <email address hidden>
    Signed-off-by: Bin Yang <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

closing based on verification in LP-1832647
2019-08-07_20-59-00

tags: removed: stx.retestneeded