StarlingX

pods do not get restarted in an AIO-DX system

Bug #1900920 reported by Steven Webster on 2020-10-21

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	StarlingX	Fix Released	Medium	Douglas Henrique Koerich

Bug Description

Brief Description
-----------------

Pods that are in a k8s deployment, daemonset, etc can be labeled as restart-on-reboot="true", which will automatically cause them to be restarted after the worker manifest has completed in an AIO system. This label is primarily used for pods using SR-IOV interfaces, as the pod will start coming up after the controller manifest is completed, but before the SR-IOV devices are bound with an appropriate driver.

In an AIO-DX system however, the reboot can fail to occur if no node selector has been set, as the query for labeled pods depends on a field selector specifying the host the recovery script is running on.

The reboot will fail to occur if the script looks for labeled pods before the pod has been scheduled on the node the script is running on.

Severity
--------
Provide the severity of the defect.
Minor: System/Feature is usable with minor issue

Steps to Reproduce
------------------
- As part of a daemonset, label a pod with restart-on-reboot=true
- Ensure the pod cannot be scheduled on the other AIO-DX node (label, taint, etc)
- Reboot the node the pod is scheduled on and observe the k8s-pod-recovery logs in /var/log/daemon.log
- Observe no log specifying the pod has been recovered

Expected Behavior
------------------
The pod should be recovered by the script

Actual Behavior
----------------
The pod may not be recovered by the script

Reproducibility
---------------
50/50

System Configuration
--------------------
AIO-DX

Branch/Pull Time/Commit
-----------------------
master 2020-10-20

Test Activity
-------------
Developer Testing

Workaround
----------
Use an init container for the pod in question with a few second delay
or
restart the pod manually

Tags:

CVE References

Steven Webster (swebster-wr) on 2020-10-21

tags:

added: stx.networking

Ghada Khalil (gkhalil) on 2020-10-29

Changed in starlingx:
importance:	Undecided → Medium
status:	New → Triaged
tags:	added: stx.5.0
Changed in starlingx:
assignee:	nobody → Cole Walker (cwalops)

Revision history for this message

Steven Webster (swebster-wr) wrote on 2020-11-12:

Ref: https://bugs.launchpad.net/starlingx/+bug/1896631

Ghada Khalil (gkhalil) on 2021-02-25

Changed in starlingx:
assignee:	Cole Walker (cwalops) → Douglas Henrique Koerich (dkoerich-wr)

Douglas Henrique Koerich (dkoerich-wr) on 2021-02-26

Changed in starlingx:
status:	Triaged → In Progress

Revision history for this message

Douglas Henrique Koerich (dkoerich-wr) wrote on 2021-03-11:

By following the steps indicated in the bug description above it was possible to reproduce the issue in an AIO-DX environment, according to the following timeline (at the host the pod(s) was(were) scheduled on):

t=0s: Finished controller manifest
t=8s: Started worker manifest
t=37s: Start of k8s-pod-recovery
t=38s: Finished worker manifest
t=63s: Started created "restart-on-reboot" labeled pod(s)
t=281s: Same labeled pod(s) verified w/o restarting

The restart of the pod(s) is not performed because the query on the labeled pods to be recovered returns an empty set when the k8s-pod-recovery is launched.

By moving the handling of labeled pods to after they are in a stable state, the restart of them is correctly performed:

t=0s: Finished controller manifest
t=9s: Started worker manifest
t=66s: Start of k8s-pod-recovery
t=67s: Finished worker manifest
t=73s: Started created "restart-on-reboot" labeled pod(s)
t=190s: Labeled pod(s) is(are) restarted
t=408s: New labeled pod(s) verified

Revision history for this message

Douglas Henrique Koerich (dkoerich-wr) wrote on 2021-03-11:

Please refer to/review the fix at https://review.opendev.org/c/starlingx/integ/+/780011.

Revision history for this message

Douglas Henrique Koerich (dkoerich-wr) wrote on 2021-03-15:

daemon.log Edit (331.2 KiB, text/plain)

Tests were done with the additional waiting procedure for labeled pods and results can be inspected in the attached daemon.log file.

Revision history for this message

Douglas Henrique Koerich (dkoerich-wr) wrote on 2021-03-15:

test.yaml Edit (623 bytes, text/plain)

Pods used in the test are described by the test.yaml file attached.

Douglas Henrique Koerich (dkoerich-wr) on 2021-03-17

Changed in starlingx:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2021-05-31: Fix proposed to integ (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/integ/+/793754

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2021-06-07: Fix merged to integ (f/centos8)

Download full text (37.0 KiB)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/793754
Committed: https://opendev.org/starlingx/integ/commit/a13966754d4e19423874ca31bf1533f057380c52
Submitter: "Zuul (22348)"
Branch: f/centos8

commit b310077093fd567944c6a46b7d0adcabe1f2b4b9
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 18:19:54 2021 +0300

Fix resize of filesystems in puppet logical_volume

    After system reinstalls there is stale data on the disk
    and puppet fails when resizing, reporting some wrong filesystem
    types. In our case docker-lv was reported as drbd when
    it should have been xfs.

    This problem was solved in some cases e.g:
    when doing a live fs resize we wipe the last 10MB
    at the end of partition:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L146

    Our issue happened here:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L65
    Resize can happen at unlock when a bigger size is detected for the
    filesystem and the 'logical_volume' will resize it.
    To fix this we have to wipe the last 10MB of the partition after the
    'lvextend' cmd in the 'logical_volume' module.

Tested the following scenarios:

B&R on SX with default sizes of filesystems and cgts-vg.

B&R on SX with with docker-lv of size 50G, backup-lv also 50G and
cgts-vg with additional physical volumes:

    - name: cgts-vg
        physicalVolumes:
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 50
        type: partition
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 30
        type: partition
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
        type: disk

B&R on DX system with backup of size 70G and cgts-vg
with additional physical volumes:

    physicalVolumes:
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 50
        type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 30
        type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
        type: disk

    Closes-Bug: 1926591
    Change-Id: I55ae6954d24ba32e40c2e5e276ec17015d9bba44
    Signed-off-by: Mihnea Saracin <email address hidden>

commit 3225570530458956fd642fa06b83360a7e4e2e61
Author: Mihnea Saracin <email address hidden>
Date: Thu May 20 14:33:58 2021 +0300

Execute once the ceph services script on AIO

    The MTC client manages ceph services via ceph.sh which
    is installed on all node types in
    /etc/service.d/{controller,worker,storage}/ceph.sh

    Since the AIO controllers have both controller and worker
    personalities, the MTC client will execute the ceph script
    twice (/etc/service.d/worker/ceph.sh,
    /etc/service.d/controller/ceph.sh).
    This behavior will generate some issues.

We fix this by exiting the ceph script if it is the one from
/etc/services.d/worker on AIO systems.

Closes-Bug: 1928934
Change-Id: I3e4dc313cc3764f870b8f6c640a60338...

Reviewed:  https://review.opendev.org/c/starlingx/integ/+/793754
Committed: https://opendev.org/starlingx/integ/commit/a13966754d4e19423874ca31bf1533f057380c52
Submitter: "Zuul (22348)"
Branch:    f/centos8

commit b310077093fd567944c6a46b7d0adcabe1f2b4b9
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Sat May 22 18:19:54 2021 +0300

Fix resize of filesystems in puppet logical_volume
    
    After system reinstalls there is stale data on the disk
    and puppet fails when resizing, reporting some wrong filesystem
    types. In our case docker-lv was reported as drbd when
    it should have been xfs.
    
    This problem was solved in some cases e.g:
    when doing a live fs resize we wipe the last 10MB
    at the end of partition:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L146
    
    Our issue happened here:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L65
    Resize can happen at unlock when a bigger size is detected for the
    filesystem and the 'logical_volume' will resize it.
    To fix this we have to wipe the last 10MB of the partition after the
    'lvextend' cmd in the 'logical_volume' module.
    
    Tested the following scenarios:
    
    B&R on SX with default sizes of filesystems and cgts-vg.
    
    B&R on SX with with docker-lv of size 50G, backup-lv also 50G and
    cgts-vg with additional physical volumes:
    
    - name: cgts-vg
        physicalVolumes:
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 50
        type: partition
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 30
        type: partition
        - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
        type: disk
    
    B&R on DX system with backup of size 70G and cgts-vg
    with additional physical volumes:
    
    physicalVolumes:
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 50
        type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 30
        type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
        type: disk
    
    Closes-Bug: 1926591
    Change-Id: I55ae6954d24ba32e40c2e5e276ec17015d9bba44
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>

commit 3225570530458956fd642fa06b83360a7e4e2e61
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Thu May 20 14:33:58 2021 +0300

Execute once the ceph services script on AIO
    
    The MTC client manages ceph services via ceph.sh which
    is installed on all node types in
    /etc/service.d/{controller,worker,storage}/ceph.sh
    
    Since the AIO controllers have both controller and worker
    personalities, the MTC client will execute the ceph script
    twice (/etc/service.d/worker/ceph.sh,
    /etc/service.d/controller/ceph.sh).
    This behavior will generate some issues.
    
    We fix this by exiting the ceph script if it is the one from
    /etc/services.d/worker on AIO systems.
    
    Closes-Bug: 1928934
    Change-Id: I3e4dc313cc3764f870b8f6c640a6033822639926
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>

commit b428a5de0070c6df82536b8b5b782810ebd9efda
Author: Cole Walker <cole.walker@windriver.com>
Date:   Wed May 19 16:57:54 2021 +0000

Revert "Remove recover operations to "restart-on-reboot" pods"
    
    This reverts commit 8abcbf6fb1951b25e9964933558b75b9aff88135.
    
    Reason for revert:
    
    After performing a backup and restore on an AIO-SX system, SRIOV pods do
    not return to a running state and are instead stuck in "container
    creating". The workaround for this is to restart SRIOV pods when the
    system unlocks.
    
    Reverting this commit to allow users to label SRIOV pods and have them
    restarted by k8s-pod-recovery. Labelled pods will be restarted by
    k8s-pod-recovery and will be running after backup and restore is
    completed.
    
    This change has been tested by performing backup and restore on an
    AIO-SX system. SRIOV pods now come up correctly when labelled with
    restart-on-reboot=true
    
    Closes-Bug: 1928965
    
    Signed-off-by: Cole Walker <cole.walker@windriver.com>
    Change-Id: I9c520c0a47aabca7b96e50adf0f71742f4199c2f

commit 4e1aa82e96d9b4caeff7e7b31632733c395c6ad0
Author: Robert Church <robert.church@windriver.com>
Date:   Sat May 15 16:24:29 2021 -0400

Update postgres liveness check to support IPv6 addresses
    
    Templating will add square brackets for IPv6 addresses which are
    interpreted as an array vs. a string. Quote this so that it interpreted
    correctly.
    
    Change-Id: I2b705015a74ea2e4e914b7a83cdceed37d49b766
    Related-Bug: #1917308
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit b3540ccfdfa6956fb20c62e5e5bb76af56d2ab63
Author: Robert Church <robert.church@windriver.com>
Date:   Wed May 12 22:36:23 2021 -0400

Update the liveness probe to verify postgres connectivity
    
    Change the tillerLivenessProbeTemplate to test the connectivity to the
    postgres backend. We will override the periodSeconds and
    failureThreshold when installing the helm chart to trigger a restart of
    the tiller pod over a swact when the postgres DB/server moves from one
    controller to the other.
    
    This will help guarantee that the tiller connection is always
    re-established if the connectivity to the postgres backend fails.
    
    Change-Id: I7fbed33a8c821f6c9254f58d5953e2115cf4141a
    Related-Bug: #1917308
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 03665ae745babb4524e2b9b9cc0f768eaf1e8781
Author: Angie Wang <angie.wang@windriver.com>
Date:   Mon May 10 18:54:07 2021 -0400

Add armada namespace in k8s pod recovery
    
    Update k8s pod recovery service to include armada namespace
    so armada pod that stuck in an unknown state after host
    lock/unlock or reboot could be recovered by the service.
    
    Change-Id: Iacd92637a9b4fcaf4c0076e922e1bd739f69a584
    Closes-Bug: 1928018
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 764cac1642a8820d169576da3d8d886449d3cf73
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Tue May 11 17:04:01 2021 +0000

Armada: Fix tiller stuck connecting to postgres database
    
    Tiller may start executing before IPv6 network is fully initialized.
    This will result in tiller not being fully functional.
    The liveness probe will detect that tiller didn't start properly and
    restart it. But this might happen an unlimited number of times in a row.
    
    Wait until ping is succesful to the ip of the postgres database.
    This ensures that networking finished setting up.
    Credits to Cole Walker <cole.walker@windriver.com> for proposing the
    idea.
    
    Depends-On: I177bb628497611eb64472291a04d635856c26590
    Closes-Bug: 1928141
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I9c5be3f30fad2650e6aa53fb80ef44f7798813ed

commit 1974b3f570c0a21ec5e4cfe7d806c58a01a7dd0c
Author: Don Penney <don.penney@windriver.com>
Date:   Fri May 7 09:01:47 2021 -0400

Copy shim.efi to /pxeboot for UEFI pxeboot support
    
    Package a copy of the shim.efi file to /pxeboot to support UEFI secure
    boot. The recent grub2 update for CVE-2020-15705 requires the use of
    shim.efi in order to support kernel signature validation.
    
    Change-Id: If87925e1697b34d7ff1a7a770d9f13619dd9dd52
    Partial-Bug: 1927730
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit b1ac60470315153dc9bc03f7f0bb1bfb221f6c5d
Author: Davlet Panech <davlet.panech@windriver.com>
Date:   Wed May 5 10:42:56 2021 -0400

Pin clearlinux/golang to v1.15.10 in Dockerfiles
    
    Upstream Dockerfiles use clearlinux/golang:latest as the base, which is
    broken as of now. Solution: change it to last known working tag before
    building.
    
    Closes-Bug: 1927153
    Signed-off-by: Davlet Panech <davlet.panech@windriver.com>
    Change-Id: Ic13973c0518eeab74ec86884036d08c2b8a4961f

commit 4850ab86da1cecca239d2ffa6dded4c0946e8a43
Author: Li Zhou <li.zhou@windriver.com>
Date:   Tue Apr 13 04:34:32 2021 -0400

systemd: Upgrade to version 219-78.el7_9.3
    
    This fixes the issue of systemd sending tons of useless
    PropertiesChanged messages when a mount happens as described in:
    https://bugzilla.redhat.com/show_bug.cgi?id=1793527
    
    Depends-On: https://review.opendev.org/c/starlingx/tools/+/786601
    Partial-Bug: #1924691
    Signed-off-by: Li Zhou <li.zhou@windriver.com>
    Change-Id: I3596303d77211a135e8559a05806395328725cde

commit 18010eb1d637c8ea3c9fdd9be5684f1b5ee8b23c
Author: Thiago Brito <thiago.brito@windriver.com>
Date:   Thu Mar 25 14:55:11 2021 -0400

Upversioning armada tarball to 7ef4b86
    
    A fix landed upstream to deal with armada waiting indefinitely for
    evicted pods, which intermittently fails stx-openstack
    application. This commit upversions the tarball version to the
    one containing that change.
    
    Removing patches 0002 and 0003 since the commits are already on
    the armada code at this version.
    
    Story: 2008645
    Task: 41906
    
    Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
    Change-Id: I62caf0a403a054c30b5bbfc1a3c5bc4cf73b60a6

commit ccfeeef59d39e42b2775bb5a216732c4999f6e42
Author: Li Zhou <li.zhou@windriver.com>
Date:   Mon Apr 12 02:15:25 2021 -0400

systemd: Prevent excessive /proc/1/mountinfo reparsing
    
    Backport the patches for this issue:
    https://bugzilla.redhat.com/show_bug.cgi?id=1819868
    
    We met such an issue:
    When testing a large number of pods (> 230), occasionally observed a
    number of issues related to systemd process:
        systemd ran continually 90-100% cpu usage
        systemd memory usage started increasing rapidly (20GB/hour)
        systemctl commands would always timeout (Failed to get properties:
            Connection timed out)
        sm services failed and can't recover: open-ldap,
            registry-token-server, docker-distribution, etcd
        new pods can't start, and got stuck in state ContainerCreating
    
    Those patches work to prevent excessive /proc/1/mountinfo reparsing.
    It has been verified that those patches can improve this performance
    greatly.
    
    16 commits are listed in sequence (from [1] to [16]) at below link
    for the issue:
    https://github.com/systemd-rhel/rhel-8/pull/154/commits
    
    [16](10)core: prevent excessive /proc/self/mountinfo parsing
    [15][Dropped-6]test: add ratelimiting test
    [14](9)sd-event: add ability to ratelimit event sources
    [13](8)sd-event: increase n_enabled_child_sources just once
    [12](7)sd-event: update state at the end in event_source_enable
    [11](6)sd-event: remove earliest_index/latest_index into common part of
    event source objects
    [10][Dropped-5]sd-event: follow coding style with naming return
    parameter
    [9] [Dropped-4]sd-event: ref event loop while in sd_event_prepare() ot
    sd_event_run()
    [8] (5)sd-event: refuse running default event loops in any other thread
    than the one they are default for
    [7] [Dropped-3]sd-event: let's suffix last_run/last_log with "_usec"
    [6] [Dropped-2]sd-event: fix delays assert brain-o (#17790)
    [5] (4)sd-event: split out code to add/remove timer event sources to
    earliest/latest prioq
    [4] (3)sd-event: split clock data allocation out of sd_event_add_time()
    [3] [Dropped-1]sd-event: mention that two debug logged events are
    ignored
    [2] (2)sd-event: split out enable and disable codepaths from
    sd_event_source_set_enabled()
    [1] (1)sd-event: split out helper functions for reshuffling prioqs
    
    I ported 10 of them back (from (1) to (10)) to fix this issue
    and dropped the other 6 (from [Dropped-1] to [Dropped-6]) for those
    reasons:
    [Dropped-1]Only changes error log.
    [Dropped-2]Fixes a bug introduced in a commit which doesn't exist in
    this version.
    [Dropped-3]Only changes vars' names and there is no functional change.
    [Dropped-4]More commits are needed for merging it, while I don't see
    any help on adding the rate-limiting ability.
    [Dropped-5]Change coding style for a function which isn't really used
    by anyone.
    [Dropped-6]Add test cases.
    
    Closes-Bug: #1924686
    Signed-off-by: Li Zhou <li.zhou@windriver.com>
    Change-Id: Ia4c8f162cb1a47b40d1b26cf4d604976b97e92d6

commit e62b1a53b9148738a7c36355b19607d6e6f3d0d7
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Tue Apr 20 17:32:45 2021 -0500

Unmount all targets during drbd stop
    
    When stopping drbd, we need to unmount targets from each device.
    Devices with multiple mountpoints can fail to unmount, leading to
    metadata corruption. Add --all-targets to the umount command.
    
    Closes-Bug: 1920245
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>
    Change-Id: Ic1b4583c72a0dd256724b8672dbb59126273330b

commit de263f633ea359507357d3d4c53e98a71bff5afc
Author: Cole Walker <cole.walker@windriver.com>
Date:   Tue Apr 13 16:47:24 2021 -0400

Add alternative command to disable lldp agent for i40e devices
    
    LLDP information is not available for certain i40e network devices when
    running system host-lldp-neighbor-show.
    
    This is caused by the firmware lldp agent on the devices not getting
    disabled by the i40e-lldp-configure.sh script which is invoked by lldpd.
    
    The command used to disable the firmware lldp agent in the script works
    for some firmware versions found on devices, but not others. This change
    adds an ethtool command to disable the lldp agent which works for these
    other firmware versions.
    
    From testing, the ethtool method is used for firmware versions 5.05 and
    8.10. The sysfs method is used for firmware version 7.10. In all cases,
    the driver version is 2.14.13
    
    Closes-Bug: 1923665
    
    Signed-off-by: Cole Walker <cole.walker@windriver.com>
    Change-Id: Ifac34091599bd4020bf55cc1b8ba3119edccb297

commit 3924cfe7ae390678ae4df9b544acf8b373440183
Author: Marcus Secato <marcus.viniciuscarvalhosecato@windriver.com>
Date:   Thu Apr 15 17:52:58 2021 -0400

Set proper user ID for armada-api container
    
    Since armada application moved to Kubernetes cluster, processes and
    commands are not executed with the 'armada' user in armada-api
    container. Previously when armada was a separated container user was
    enforced through 'docker exec'.
    
    Closes-Bug: 1924579
    
    Signed-off-by: Marcus Secato <marcus.viniciuscarvalhosecato@windriver.com>
    Change-Id: I5600974c0b9c3ade73a58dae300e8f3b18c6aefd

commit 8abcbf6fb1951b25e9964933558b75b9aff88135
Author: Bin Qian <bin.qian@windriver.com>
Date:   Thu Apr 8 12:58:44 2021 -0400

Remove recover operations to "restart-on-reboot" pods
    
    The pods being labeled as "restart-on-reboot" is to workaround
    kubernetes restart on worker manifest. As the AIO running a
    single manifest to start kubernetes only once, the operation
    is no longer needed.
    
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/785736
    Change-Id: I0d6c549199559b2bc19d8edff52f64ea0b08b50d
    Closes-Bug: 1918139
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit 859e8eb7309f6c26f1ebbc0898e87e82d56af97b
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Tue Apr 14 20:16:48 2020 -0400

add isolcpus device plugin for kubernetes
    
    In order to minimize latency as much as possible, we want to allow
    kubernetes containers to make use of CPUs which have been specified
    as "isolated" via the kernel boot args.
    
    This commit creates an isolcpus device plugin, which detects the isolated
    CPUs and exports them to kubelet via the device plugin API.
    
    See kubernetes/plugins/isolcpus-device-plugin/files/README.md for
    more information on the behaviour and design choices for this commit.
    
    When we move to a newer version of the Intel device plugin manager we
    may be able to simplify some of this.  See the above README.md file
    for details.
    
    Change-Id: I3bfe04ab6e7fbafefa63f6dc43cb2ed79a52579f
    Story: 2008760
    Task: 42165
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit 777b7d88630bae55bf130e240212a2abf288bbd3
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Mon Oct 26 17:30:00 2020 -0400

enable support for kubernetes to ignore isolcpus
    
    The normal mechanisms for allocating isolated CPUs do not allow
    a mix of isolated and exclusive CPUs in the same container.  In
    order to allow this in *very* limited cases where the pod spec
    is known in advance we will add the ability to disable the normal
    isolcpus behaviour.
    
    If the file "/etc/kubernetes/ignore_isolcpus" exists, then kubelet
    will basically forget everything it knows about isolcpus and just
    treat them like regular CPUs.
    
    The admin user can then rely on the fact that CPU allocation is
    deterministic to ensure that the isolcpus they configure end up being
    allocated to the correct pods.
    
    Story: 2008760
    Task: 42164
    Change-Id: Ie38c81209ee407ac98b4882f2581fc14622b3af1
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit 4150b7a6b61365525c6201ad04eb678c96a578d5
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Mon Aug 31 11:06:59 2020 -0400

kubeadm: create platform pods with zero CPU resources
    
    We want to specify zero CPU resources when creating the manifests
    for the static platform pods, as a workaround for the lack of
    separate resource tracking for platform resources.
    
    We also specify zero CPU resources for the coredns deployment.
    manifests.go appears to be the main file for this, not sure if the
    others are used by I changed them just in case.
    
    Story: 2008760
    Task: 42163
    Change-Id: I6410b8af556d5167d1813e7545fad8baa27b1100
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit 71876817aa2a0a0109d7dfe6bf34c1344b3d5f06
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Fri Apr 24 02:25:22 2020 -0400

fix exclusive CPU alloc being deleted at container restart
    
    The expectation is that exclusive CPU allocations happen at pod
    creation time. When a container restarts, it should not have its
    exclusive CPU allocations removed, and it should not need to
    re-allocate CPUs.
    
    There are a few places in the current code that look for containers
    that have exited and call CpuManager.RemoveContainer() to clean up
    the container.  This will end up deleting any exclusive CPU
    allocations for that container, and if the container restarts within
    the same pod it will end up using the default cpuset rather than
    what should be exclusive CPUs.
    
    Removing those calls and adding resource cleanup at allocation
    time should get rid of the problem.
    
    This should eventually go into upstream 1.18.1, at which point
    we can just revert this commit.
    
    Story: 2008760
    Task: 42160
    Change-Id: I61d3670805ef805e21b9c54daf0677d4c7e1bc74
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit b88df951face924d0c29fa76ee03424c4546afd2
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Fri Apr 24 02:18:00 2020 -0400

Add kubernetes support for isolated cpus
    
    This introduces the concept of "isolated CPUs", which are CPUs that
    have been isolated at the kernel level via the "isolcpus" kernel boot
    parameter.
    
    When starting the kubelet process, the set of reserved CPUs (including
    both platform and isolated CPUs) will be specified via
    '--reserved-cpus'.  The isolated CPUs will be identified by looking at
    "/sys/devices/system/cpu/isolated" and treated separately from the
    platform CPUs (which are used to run infrastructure pods).
    
    A plugin (outside the scope of this commit) exposes the isolated
    CPUs to kubelet via the device plugin API.
    
    If a pod specifies some number of "isolcpus" resources, the device manager
    will allocate them.  In this code we check whether such resources have
    been allocated, and if so we set the container cpuset to the isolated
    CPUs.  This does mean that it really only makes sense to specify "isolcpus"
    resources for best-effort or burstable pods, not for guaranteed ones since
    that would throw off the accounting code.  In order to ensure the accounting
    still works as designed, if "isolcpus" are specified for guaranteed pods,
    the affinity will be set to the non-isolated CPUs.
    
    Story: 2008760
    Task: 42161
    Change-Id: I7bd2eabb4c82faea63e3ad129ef735b9d1223e11
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit a436a17d711db209c2bc8802360b9f31df16c237
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Fri Apr 24 02:12:35 2020 -0400

kubelet cpumanager patches for low-latency
    
    In order to minimize latency as much as possible, kubernetes containers
    require isolation of platform CPUs, isolcpus CPUs, and shared CPUs. For
    Guaranteed pods, we also need to disable CFS quota throttling.
    
    Infrastructure pods are allowed to run on platform CPUs since they're
    basically doing platform work.  This frees up some resources on
    application CPUs for "normal" kubernetes containers.
    
    Story: 2008760
    Task: 42159
    Change-Id: I28c99565ed8081496f6d8be4aa68144a1d3578ed
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit 9d60767e32ab937be2e7d1beaf86c8651dd2ac5a
Author: Li Zhou <li.zhou@windriver.com>
Date:   Wed Mar 31 23:32:19 2021 -0400

ntp: fix CVE-2020-13817
    
    Update ntp source package to:
    ntp-4.2.6p5-29.el7.centos.2.src.rpm
    In fact it is version ntp-4.2.6p5-29.el7_8.2.
    (Refer to https://git.centos.org/rpms/ntp/c/
    e9ba41e9edf8efad8f090aad24845b8f4db0668d?branch=c7)
    
    Story: 2008532
    Task: 41691
    Signed-off-by: Li Zhou <li.zhou@windriver.com>
    Change-Id: If5db6b15b9c01a20a614bb160bba575c6b578d3e

commit 7badc1dad154bd28a8d299d748854dad53606c82
Author: Babak Sarashki <babak.sarashki@windriver.com>
Date:   Wed Mar 3 12:15:52 2021 +0000

integ: add nvidia gpu-operator helm charts
    
    This commit adds nvidia gpu-operator helm charts use case for
    custom container runtime feature. To load nvidia-gpu-operator
    on starlingx:
    
    system service-parameter-add platform container_runtime \
    custom_container_runtime=\
    nvidia:/usr/local/nvidia/toolkit/nvidia-container-runtime
    
    And define  runtimeClass for nvidia gpu  pods:
    
    kind: RuntimeClass
    apiVersion: node.k8s.io/v1beta1
    metadata:
      name: nvidia
    handler: nvidia
    
    The above will direct all containerd creations of pods with nvidia
    runtimeClass to nvidia-container-runtime -- where the nvidia-conta
    iner-runtime is installed by the operator onto a hostMount.
    
    Story: 2008434
    Task: 41978
    
    Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
    Change-Id: Ifea8cdf6eb89a159f446c53566279e72fcf0e45e

commit 9c8d4bbcfb0d2b84bfc17276f4afa906b8e97686
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Wed Jul 15 19:45:24 2020 -0400

fix net/http caching of broken persistent connections
    
    The net/http transport code is currently broken, it keeps broken
    persistent connections in the cache if a write error happens during
    h2 handshake.
    
    This is documented in the upstream bug at:
    https://github.com/golang/go/issues/40213
    
    The problem occurs because in the "go" compiler the http2 code is
    imported into http as a bundle, with an additional "http2" prefix
    applied.  This messes up the erringRoundTripper handling because
    the name doesn't match.
    
    The solution is to have the "go" compiler look for an interface
    instead, so we add a new dummy function that doesn't actually do
    anything and then the "go" compiler can check whether the specified
    RoundTripper implements the dummy function.
    
    Specifically for Kubernetes we need to update the http2 code in the
    "vendor" subdirectory.  A separate change is being made in the "go"
    compiler.
    
    Partial-Bug: 1887438
    Depends-On: https://review.opendev.org/c/starlingx/compile/+/780669
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
    Change-Id: I95dcbda879973524cd23b2a374537a675ce9435f

commit f161f7f18e6a12592f8e807b15576d6609d5946e
Author: Jim Gauld <James.Gauld@windriver.com>
Date:   Mon Mar 29 12:31:25 2021 +0000

Revert "integ: gpu-operator helm charts"
    
    This reverts commit 41bdf53f65684b54abaa3098a5fe3acf568cdf2a.
    
    Reason for revert: gpu operator patch is breaking stx-master build.
    
    e.g.,
    08:06:44 Failed to build packages:  gpu-operator-1.6.0-0.tis.1.src.rpm; problem with:
    Patch #2 (enablement-support-on-starlingx-cloud-platform.patch):
    . .
    Skipping patch.
    1 out of 1 hunk ignored -- saving rejects to file deployments/gpu-operator/templates/operator.yaml.rej
    patching file deployments/gpu-operator/values.yaml
    error: Bad exit status from /var/tmp/rpm-tmp.VQuqLh (%prep)
    
    Change-Id: Id7a05987586582c940d605874d1e0f813333f2c3

commit 41bdf53f65684b54abaa3098a5fe3acf568cdf2a
Author: Babak Sarashki <babak.sarashki@windriver.com>
Date:   Wed Mar 3 12:15:52 2021 +0000

integ: gpu-operator helm charts
    
    This commit adds nvidia gpu-operator helm charts use case for
    custom container runtime feature. To load nvidia-gpu-operator
    on starlingx:
    
    system service-parameter-add platform container_runtime \
    custom_container_runtime=\
    nvidia:/usr/local/nvidia/toolkit/nvidia-container-runtime
    
    And define  runtimeClass for nvidia gpu  pods:
    
    kind: RuntimeClass
    apiVersion: node.k8s.io/v1beta1
    metadata:
      name: nvidia
    handler: nvidia
    
    The above will direct all containerd creations of pods with nvidia
    runtimeClass to nvidia-container-runtime -- where the nvidia-conta
    iner-runtime is installed by the operator onto a hostMount.
    
    Story: 2008434
    Task: 41978
    
    Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
    Change-Id: I999804d4697349bc0966d0a6e653d7bce15e18fc

commit 3832fabeff1493d424593ec502d261508d9e6e75
Author: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
Date:   Mon Mar 22 15:06:41 2021 -0400

Upgrade pf-bb-config to version 21.3
    
    Upgrade of pf-bb-config package to v21.3 is required by Intel in order
    to have better support to ACC100 (Mount Bryce) device.
    
    Story: 2008440
    Task: 42090
    Change-Id: I2af1ca9fc43ae78f41f30f4bde255afaacb56c46
    Signed-off-by: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>

commit 852ec5ed538f5091ee7e6aa604be68295c09d21b
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Thu Mar 4 17:36:54 2021 +0200

Add custom apps in the k8s-pod-recovery service
    
    At startup, there might be pods that are left in unknown states.
    The k8s-pod-recovery service takes care of
    recovering these unknown pods in specific namespaces.
    To fix this for custom apps that are not part of starlingx,
    we modify the service to look into the /etc/k8s-post-recovery.d
    directory for conf files. Any app that needs to be recovered by this
    service will have to create a conf file e.g the app-1 will create
    /etc/k8s-post-recovery.d/APP_1.conf which will contain the following:
    namespace=app-1-namespace
    
    Closes-Bug: 1917781
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
    Change-Id: I8febdb685d506cff3c34946163612cafdab3e3a8

commit 6169cc5d81f809a1237ba341f7cb87d09fdd811e
Author: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
Date:   Thu Mar 11 09:12:49 2021 -0500

Handle labeled pods after stabilized
    
    Pods that are in a k8s deployment, daemonset, etc can be labeled as
    restart-on-reboot="true", which will automatically cause them to be
    restarted after the worker manifest has completed in an AIO system.
    It may happen, however, that k8s-pod-recovery service is started
    before the pods are scheduled and created at the node the script is
    running on, causing them to be not restarted. The proposed solution is
    to wait for stabilization of labeled pods before restarting them.
    
    Closes-Bug: 1900920
    Signed-off-by: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
    Change-Id: I5c73bd838ab2be070bd40bea9e315dcf3852e47f

commit cb85cff32ba0afc48fbe16ab94dd36edc979fbb4
Author: Zhixiong Chi <zhixiong.chi@windriver.com>
Date:   Wed Jan 20 21:41:20 2021 -0500

dhcp: fix CVE-2019-6470
    
    Upgrade dhcp pkg to dhcp-4.2.5-82.el7.centos.src.rpm
    
    Adjust the context of the patch to match to apply the new version.
    At the same time as the new version depends on the bind-export
    pacakges, so we also add the dependence package in tools repo.
     bind-export-libs-9.11.4-26.P2.el7.x86_64.rpm
     bind-export-devel-9.11.4-26.P2.el7.x86_64.rpm
    
    In addition, since the patch dhcp-dhclient_ipv6_prefix.patch set the
    default prefixlen to 128, which is usually the specifications call
    for host address and it doesn't include any on-link information.
    By contrast, 64 indicates that's subnet area, and this vaule is used
    frequently as usual. So we still use the previous value 64.
    As a result we don't need to modify the relevant place where every
    application code needed for the compatibility any more.
    
    Depends-On: https://review.opendev.org/c/starlingx/tools/+/772241
    
    Story: 2008532
    Task: 41638
    Change-Id: I0305711790d8e3fb1adfa69e1077468456b65d84
    Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>

commit 7d7fe3dc61864e38c6f183aa6ae7583844b44183
Author: Joe Slater <joe.slater@windriver.com>
Date:   Wed Feb 24 17:39:05 2021 -0500

sudo: fix CVE-2021-3156
    
    Advance to sudo-1.8.23-10.el7_9.1.src.rpm.
    
    Closes-Bug: 1916946
    Change-Id: Ibb90439c77d6f5b1badcadb37080ff9e330787d5
    Signed-off-by: Joe Slater <joe.slater@windriver.com>

commit eccff3b0e661592084d9114a9a41816761e1f9b5
Author: Steven Webster <steven.webster@windriver.com>
Date:   Wed Feb 17 12:39:52 2021 -0500

Uprev SR-IOV CNI image
    
    This commit uprevs the SR-IOV CNI image to pick up a few bug
    fixes.  Specifically, this commit will allow rate-limiting
    configuration on a VF to be retained after the VF has been
    used by a pod (and pod subsequently deleted).
    
    Testing:
    
    NICs:
    Ethernet Controller X710 for 10GbE SFP+
    Mellanox MT27700 Family [ConnectX-4]
    
    Functional:
    Connectivity testing (kernel + DPDK)
    Devices allocated appropriately to pod
    Rate-limiting information retained after pod deletion
    
    Partial-Bug: #1915951
    
    Signed-off-by: Steven Webster <steven.webster@windriver.com>
    Change-Id: I32395c4805164401519cde8bc503f040c4187250

commit 8a33372bee65b517850245b55f771e3cd6bba0ff
Author: Babak Sarashki <zbsarashki@gmail.com>
Date:   Thu Feb 4 23:21:48 2021 +0000

Add: PF Baseband Device config application for ACC100
    
    This introduces PF BBDEV (baseband device) Configuration Application
    "pf_bb_config" and inih. PF BBDEV program accesses the configuration
    space and sets the various parameters through memory-mapped IO
    read/writes. This is needed for Intel ACC100 (Mt Bryce) configuration
    and QMGR related settings.
    
    PF BBDEV requires inih for parsing .INI configuration file. This
    commit adds the inih for static linkage with PF BBDEV.
    
    Story: 2008440
    Task: 41472
    Signed-off-by: Babak Sarashki <zbsarashki@gmail.com>
    Change-Id: Idaebcac5d0021d5c11c7ab27e13176139ba66c3b

commit 7b5b3aeabfdb47b51fce5f1591d82fd3ca5d9672
Author: Zhixiong Chi <zhixiong.chi@windriver.com>
Date:   Wed Feb 10 21:00:04 2021 -0500

Revert "dhcp: fix CVE-2019-6470"
    
    This reverts commit 613fbf258f72042f912a1fde5608168b1068db36.
    
    Since this upversioned package updates the prefixlen to 128, and it
    will occur all hosts offline after booting off the controller-0.
    At the same time this issue will block the use of recent loads for
    both development and test activities. So we revert the patch firstly,
    and investigate deeply then send the new review and request of the
    upgraded patch with the appropriate offline fix.
    
    Closes-Bug: #1915050
    
    Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>
    Change-Id: I02ecaa1bda463efb38d9c32a47f2221d0de7f99d

commit 29dd2fd42acf3e37e53715ca4755b8b089c743cb
Author: Li Zhou <li.zhou@windriver.com>
Date:   Tue Jan 26 07:50:09 2021 +0000

openssh: fix CVE-2018-15473 from source build
    
    Upgrade to openssh-7.4p1-21 for fixing CVE.
    
    Story: 2008532
    Task: 41668
    Signed-off-by: Li Zhou <li.zhou@windriver.com>
    Change-Id: Ic3e10b3455587bba16585fe8e235c4c0655f1e3e

commit d053c675546e944d0a08fb6f8d2b831647f70663
Author: Li Zhou <li.zhou@windriver.com>
Date:   Tue Jan 26 07:21:41 2021 +0000

sudo: fix CVE-2019-18634
    
    Upgrade to sudo-1.8.23-10 for fixing CVE.
    
    Story: 2008532
    Task: 41689
    Signed-off-by: Li Zhou <li.zhou@windriver.com>
    Change-Id: I863e66ee887de40d75db7951f4ba408ad022c131

commit a0b2acecaac080345c1cd42c3ad7fc05d75ac96a
Author: Zhixiong Chi <zhixiong.chi@windriver.com>
Date:   Mon Jan 25 03:49:38 2021 -0500

grub2: fix CVE-2020-15707
    
    Avoid to the heap-based buffer overflow.
    
    Upgrade to the below package to fix the CVE issue:
     grub2-2.02-0.86.el7.centos.src.rpm
    
    At the same time adjust the context and drop
    0004-grub2-remove-32b-requirements.patch since it already had been
    included in the new version.
    
    Story: 2008532
    Task: 41664
    Change-Id: I7943127323ee28457ffe0a4ece54764633f86d9f
    Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>

commit 613fbf258f72042f912a1fde5608168b1068db36
Author: Zhixiong Chi <zhixiong.chi@windriver.com>
Date:   Wed Jan 20 21:41:20 2021 -0500

dhcp: fix CVE-2019-6470
    
    Upgrade dhcp pkg to dhcp-4.2.5-82.el7.centos.src.rpm
    
    At the same time since the new version depends on the bind-export
    pacakge, so we also add the dependence package in tools repo.
    
    Depends-On: https://review.opendev.org/c/starlingx/tools/+/771744
    
    Story: 2008532
    Task: 41638
    Change-Id: Ic25b4404475a6f914e5a524db7d60d7e9dcffc85
    Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>

commit 8ec4e97b34e49d2ad212bc16f7863fd83eff6f8e
Author: Don Penney <don.penney@windriver.com>
Date:   Thu Dec 17 13:26:44 2020 -0500

Add auto-version for remaining stx/integ packages
    
    Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to
    use PKG_GITREVCOUNT where possible, with offsets as needed to ensure
    the version is incremented above the hardcoded version.
    
    Change-Id: I9b40cd7e41c0cd713b73741ac3c8cab41d358642
    Story: 2008455
    Task: 41461
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit 46d8d8fdf1ec75d74df6e2f20dff5f31732b9dc7
Author: Robert Church <robert.church@windriver.com>
Date:   Sat Dec 12 01:08:54 2020 -0500

Add conditions to when RBD devices are unmounted
    
    ceph-preshutdown.sh is called as a post operation when docker is
    stopped/restarted. Based on current service dependencies, when docker is
    restarted this will also trigger a restart of containerd.
    
    Puppet manifests will restart containerd and docker for various
    operations both on system boot and during runtime operations when their
    configuration has changed.
    
    This update adds conditions to ensure that the RBD devices are only
    unmounted when the system is shutting down. This avoids the RBD backed
    persistent volumes from being forcibly removed from running pods and
    being remounted read-only during these restart scenarios.
    
    Change-Id: I7adfddf135debcc8bcaa1f93866e1a276b554c88
    Closes-Bug: #1901449
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit d815cfe2f2003f4a64a5e0b5348b0e9daabb58df
Author: Nicolas Alvarez <nicolas.alvarez@windriver.com>
Date:   Thu Dec 3 17:42:31 2020 -0300

Uninstall SNMP RPM Host-Based from STX.
    
    Uninstall SNMP RPM Host-Based from starlingx/integ repo because it
    will be containerized.
    Also disable snmp from networking/lldpd/centos/lldpd.spec file.
    
    Story: 2008132
    Task: 41322
    Depends-On: https://review.opendev.org/761792
    Signed-off-by: Nicolas Alvarez <nicolas.alvarez@windriver.com>
    
    Change-Id: Ifda06a5eb3bd0ec9683823b643e6d9cc0e7c97e2

commit 39f6f92cc888b3893b4d4717fba1599056382997
Author: Angie Wang <angie.wang@windriver.com>
Date:   Mon Sep 28 11:24:12 2020 -0400

Armada: add configurations for helm sql storage backend
    
    Configmap is the default helmv2 storage backend to store
    release information but its 1MB resource limit prevents
    scaling up stx openstack workers, so we want to use sql
    as helm storage backend.
    
    Update armada chart to support sql storage backend
    configuration for helm/tiller.
    
    Upstream review: https://review.opendev.org/#/c/759899/
    
    Partial-Bug: 1887677
    Change-Id: Ifcb7f28e99413be5a0dbfddf684ca064866860f5
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

tags:

added: in-f-centos8