Kubernetes: compute hosts run out of memory and reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Jim Gauld |
Bug Description
Brief Description
-----------------
While testing a 2+2+2 Kubernetes configuration, I saw compute-0 reboot spontaneously. Maintenance rebooted the host due to a heartbeat failure, but the logs on compute-0 suggest the underlying cause was that the host ran out of memory and the OOM killer kicked in. The host had been up for less than 13 hours.
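That the OOM killer fired can be confirmed from the kernel logs on the rebooted host; a minimal check, assuming stock CentOS log locations (the dmesg ring buffer is cleared at reboot, so /var/log/messages carries the pre-reboot evidence):
compute-0:~$ dmesg -T | grep -iE 'out of memory|oom-killer'
compute-0:~$ sudo grep -iE 'oom-killer|Killed process' /var/log/messages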
Running memtop on either compute host shows the Avail column dropping by more than 130 MiB every 10 minutes. For example:
compute-0:~$ memtop --delay=30 --repeat 10000
memtop 0.1 -- selected options: delay = 30.000s, repeat = 10000, period = 300000.000s, non-strict, unit = MiB
yyyy-mm-dd hh:mm:ss.fff Tot Used Free Ca Buf Slab CAS CLim Dirty WBack Anon Avail 0:Avail 0:HFree 1:Avail 1:HFree
2019-02-06 21:42:02.213 128726.2 112389.3 14135.2 1386.1 75.6 2620.4 7924.8 11341.1 0.1 0.0 2688.4 16336.9 10711.9 48640.0 5625.0 54844.0
2019-02-06 21:42:32.213 128726.2 112392.8 14130.4 1386.1 75.7 2626.0 7924.5 11341.1 0.1 0.0 2688.5 16333.5 10710.4 48640.0 5623.1 54844.0
2019-02-06 21:43:02.213 128726.2 112400.4 14121.4 1386.1 75.8 2631.6 7927.9 11341.1 0.1 0.0 2690.4 16325.8 10707.2 48640.0 5618.7 54844.0
2019-02-06 21:43:32.214 128726.2 112404.0 14116.4 1386.2 75.9 2637.0 7928.0 11341.1 0.1 0.0 2690.3 16322.2 10706.7 48640.0 5615.5 54844.0
2019-02-06 21:44:02.214 128726.2 112415.0 14104.2 1386.2 76.0 2644.0 7929.5 11341.1 0.1 0.0 2693.8 16311.3 10700.7 48640.0 5610.5 54844.0
2019-02-06 21:44:32.214 128726.2 112420.5 14097.4 1386.3 76.1 2649.5 7929.9 11341.1 0.1 0.0 2693.5 16305.8 10698.9 48640.0 5606.8 54844.0
2019-02-06 21:45:02.215 128726.2 112432.9 14083.5 1386.3 76.2 2655.3 7943.6 11341.1 0.1 0.0 2698.9 16293.3 10691.4 48640.0 5602.0 54844.0
2019-02-06 21:45:32.215 128726.2 112433.5 14081.5 1386.3 76.2 2661.1 7943.4 11341.1 0.1 0.0 2699.7 16292.7 10692.3 48640.0 5600.4 54844.0
2019-02-06 21:46:02.215 128726.2 112443.5 14069.8 1386.4 76.3 2667.7 7944.3 11341.1 0.1 0.0 2700.7 16282.7 10688.5 48640.0 5594.2 54844.0
2019-02-06 21:46:32.216 128726.2 112446.7 14065.4 1386.4 76.4 2672.5 7944.3 11341.1 0.1 0.0 2699.5 16279.5 10687.1 48640.0 5592.4 54844.0
2019-02-06 21:47:02.216 128726.2 112459.8 14050.9 1386.4 76.5 2679.0 7950.0 11341.1 0.1 0.0 2705.3 16266.4 10682.3 48640.0 5584.1 54844.0
2019-02-06 21:47:32.216 128726.2 112464.7 14045.2 1386.5 76.6 2683.5 7949.8 11341.1 0.1 0.0 2706.7 16261.5 10679.3 48640.0 5582.2 54844.0
2019-02-06 21:48:02.217 128726.2 112477.1 14031.0 1386.5 76.7 2690.8 7957.1 11341.1 0.1 0.0 2711.0 16249.1 10670.7 48640.0 5578.4 54844.0
2019-02-06 21:48:32.217 128726.2 112479.1 14027.7 1386.5 76.7 2696.4 8039.1 11341.1 0.1 0.0 2710.6 16247.1 10670.6 48640.0 5577.0 54844.0
2019-02-06 21:49:02.217 128726.2 112486.7 14018.6 1386.6 76.8 2701.0 7962.6 11341.1 0.1 0.0 2711.6 16239.5 10664.3 48640.0 5575.1 54844.0
2019-02-06 21:49:32.218 128726.2 112489.9 14014.2 1386.6 76.9 2706.8 7959.0 11341.1 0.1 0.0 2711.9 16236.3 10664.0 48640.0 5572.3 54844.0
2019-02-06 21:50:02.218 128726.2 112515.3 13987.5 1386.6 77.0 2712.8 7973.9 11341.1 0.1 0.0 2730.5 16210.9 10645.4 48640.0 5565.5 54844.0
2019-02-06 21:50:32.218 128726.2 112517.4 13984.1 1386.7 77.1 2718.1 7973.4 11341.1 0.1 0.0 2730.3 16208.8 10643.8 48640.0 5565.1 54844.0
2019-02-06 21:51:02.218 128726.2 112526.5 13973.7 1386.7 77.2 2724.7 7973.7 11341.1 0.1 0.0 2730.6 16199.7 10639.4 48640.0 5560.3 54844.0
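For reference, the leak rate can be computed directly from the Avail column (field 14) of a saved capture; a rough sketch, assuming the output above was saved to memtop.log (hypothetical filename):
compute-0:~$ awk '/^20/ { if (!n) first = $14; last = $14; n++ }
    END { d = first - last; m = (n - 1) * 30 / 60    # 30 s between samples
          printf "%.1f MiB in %.0f min (~%.0f MiB per 10 min)\n", d, m, d / m * 10
    }' memtop.log
137.2 MiB in 9 min (~152 MiB per 10 min)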
On this compute host, the problem pod appears to be garbd; it was created here:
[kubelet/Calico log excerpt, truncated in the original report: 2019-02-xx entries showing kubernetes.io/secret volumes being mounted and a Calico workload endpoint (…dcb95d7--wjv55-eth0, osh-openstack-…, ID="a994bf8b0f9…") being created for the pod]
It looks like the pod was deleted here:
[kubelet/Calico log excerpt, truncated in the original report: 2019-02-xx entries showing the pod's secret volumes being unmounted, volumes detached on node "compute-0" (DevicePath ""), and the corresponding Calico v3 WorkloadEndpoint object (osh-openstack-…) being deleted]
Ever since then, the following logs have been appearing:
[repeated 2019-02-xx log entries, truncated in the original report, each summarized as "1 errors similar to this. Turn up verbosity to see them."]
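The "Turn up verbosity" wording matches the rate-limited error summary that kubelet prints; if these are indeed kubelet messages on a systemd-managed node, the suppressed errors can be surfaced by raising the log level (the --v flag is standard kubelet; where its arguments live on this host is an assumption):
compute-0:~$ # add --v=4 to the kubelet arguments (file location varies by setup)
compute-0:~$ sudo systemctl restart kubelet
compute-0:~$ journalctl -u kubelet --since '10 minutes ago' | tail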
An upstream bug report that seems to describe this issue (it hasn’t been fixed yet):
https:/
Severity
--------
Major: System/Feature is usable but degraded
Steps to Reproduce
------------------
Not sure what triggered the issue.
Expected Behavior
------------------
Compute hosts should not run out of memory over time.
Actual Behavior
----------------
Compute hosts run out of memory and reboot after approximately 12 hours.
Reproducibility
---------------
Intermittent - not seen in all labs.
System Configuration
--------------------
2+2+2 system
Branch/Pull Time/Commit
-----------------------
###
### StarlingX
### Release 19.01
###
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="f/stein"
JOB="STX_
<email address hidden>"
BUILD_NUMBER="40"
BUILD_HOST=
BUILD_DATE=
Timestamp/Logs
--------------
See above
Changed in starlingx:
assignee: Chris Friesen (cbf123) → Jim Gauld (jgauld)
tags: added: stx.2.0; removed: stx.2019.05
Marking as release gating; issue related to container env.