Bug #1927515 “ETCD poor latency performance and failure under lo...” : Bugs : StarlingX

Jim Gauld (jgauld) on 2021-05-06

Changed in starlingx:
assignee:	nobody → Jim Gauld (jgauld)
status:	New → Confirmed
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2021-05-06: Fix proposed to utilities (master)

#1

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/utilities/+/790094

OpenStack Infra (hudson-openstack) wrote on 2021-05-06: Fix proposed to config-files (master)

#2

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config-files/+/790098

OpenStack Infra (hudson-openstack) wrote on 2021-05-06: Fix proposed to stx-puppet (master)

#3

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/790132

OpenStack Infra (hudson-openstack) wrote on 2021-05-07: Fix merged to utilities (master)

#4

Reviewed: https://review.opendev.org/c/starlingx/utilities/+/790094
Committed: https://opendev.org/starlingx/utilities/commit/6045b1b8a0d8ed6a94d06cdfc994bf1a5fa9dbb5
Submitter: "Zuul (22348)"
Branch: master

commit 6045b1b8a0d8ed6a94d06cdfc994bf1a5fa9dbb5
Author: Jim Gauld <email address hidden>
Date: Thu May 6 11:58:34 2021 -0400

Provide utility script is-rootdisk-device.sh

    This provides a utility script to determine which disk contains the root
    filesystem. This can also be used as a helper function for io-scheduler
    udev rules that require specific configuration for root disk.

    Example usage:
    /usr/local/bin/is-rootdisk-device.sh
    ROOTDISK_DEVICE=sda

    /usr/local/bin/is-rootdisk-device.sh /dev/sda
    ROOTDISK_DEVICE=sda

    /usr/local/bin/is-rootdisk-device.sh /dev/sdb
    (i.e., no output)

    Partial-Bug: 1927515
    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: Ib0d4a161a407b08d294c5ff9aa0b7590961e18c9
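
The usage above can be mimicked with a minimal sketch. This is not the actual is-rootdisk-device.sh; the function names parent_disk and report_rootdisk are hypothetical, and the name handling only covers sd/vd/hd/xvd, NVMe and mmcblk style devices (no LVM or device-mapper roots).

```shell
# Hypothetical sketch, not the script from the commit.
# Strip a leading /dev/ and any partition suffix from a device name:
#   sda3 -> sda, nvme0n1p2 -> nvme0n1, mmcblk0p1 -> mmcblk0
parent_disk() {
    echo "${1#/dev/}" | sed -E \
        -e 's/^(nvme[0-9]+n[0-9]+|mmcblk[0-9]+)p[0-9]+$/\1/' \
        -e 's/^((s|v|h|xv)d[a-z]+)[0-9]+$/\1/'
}

# $1: device holding the root filesystem, $2: optional candidate device.
# Print ROOTDISK_DEVICE=<disk> when no candidate is given or when the
# candidate is on the root disk; print nothing otherwise.
report_rootdisk() {
    root_disk=$(parent_disk "$1")
    if [ -z "${2:-}" ] || [ "$(parent_disk "$2")" = "$root_disk" ]; then
        echo "ROOTDISK_DEVICE=${root_disk}"
    fi
}
```

On a live system the first argument could come from `findmnt -n -o SOURCE /`, for example `report_rootdisk "$(findmnt -n -o SOURCE /)" /dev/sdb`.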

OpenStack Infra (hudson-openstack) wrote on 2021-05-07: Fix merged to config-files (master)

#5

Reviewed: https://review.opendev.org/c/starlingx/config-files/+/790098
Committed: https://opendev.org/starlingx/config-files/commit/e82d1b9e70dd50fbec76db7cfc51e433c5b6bf9e
Submitter: "Zuul (22348)"
Branch: master

commit e82d1b9e70dd50fbec76db7cfc51e433c5b6bf9e
Author: Jim Gauld <email address hidden>
Date: Thu May 6 12:14:39 2021 -0400

Configure io-scheduler udev rules for ETCD and HW-RAID

    This configures io-scheduler udev rules for etcd and HW-RAID
    performance.

    This sets the io-scheduler to 'cfq' with tuned parameters on the root
    filesystem disk for the 'controller' nodetype.

    This sets the io-scheduler to 'noop' for HW-RAID on the Dell PowerEdge
    R720; this was a missing commit from pre-StarlingX.

    Partial-Bug: 1927515
    Depends-On: https://review.opendev.org/c/starlingx/utilities/+/790094
    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: Iaf1de8d962d1e8d253c72e680370666a2aed8c8e
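
To check which io-scheduler a disk ended up with, the sysfs file /sys/block/<device>/queue/scheduler lists the available schedulers and marks the active one with square brackets. A minimal sketch for reading it follows; the function name active_scheduler is hypothetical and this is not part of the udev rules in the commit.

```shell
# Hypothetical helper: read a scheduler line such as
#   noop deadline [cfq]
# on stdin and print the bracketed (active) scheduler name.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

For example, `active_scheduler < /sys/block/sda/queue/scheduler` would be expected to print cfq on a controller root disk after the rules are applied, assuming sda is the root disk.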

Changed in starlingx:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2021-05-07: Fix merged to stx-puppet (master)

#6

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/790132
Committed: https://opendev.org/starlingx/stx-puppet/commit/0b429c7cb0c16e34755c1b1e146ebb8b006d44dc
Submitter: "Zuul (22348)"
Branch: master

commit 0b429c7cb0c16e34755c1b1e146ebb8b006d44dc
Author: Jim Gauld <email address hidden>
Date: Thu May 6 12:33:24 2021 -0400

Configure etcd service critical process nice and ionice

    The etcd server is a critical "interactive" process that requires
    low latency. This process has many etcd threads; each worker does
    minimal work and wakes up frequently. The threads do a small amount
    of writes to commit.

    The etcd server will start exceeding the heartbeat interval of 100ms
    and the election timeout of 1000ms under load and independent disk
    stress if not properly tuned as a critical process. This cascades into
    many failures.

    This requires io-scheduler 'cfq' to take advantage of io-nice policy
    and priority. This bumps etcd up to best-effort/0 from best-effort/4.

    This sets nice -19 from nice 0. This helps tremendously with
    interactive processes under the Linux CFS (Completely Fair Scheduler).

    With tuned settings, under application load and additional disk stress,
    we see a dramatic reduction of 'blocked_max' and no more kern.log
    etcdserver-related errors for exceeding the timeouts.
    We see a dramatic improvement in system responsiveness for kubectl and
    kube-apiserver. This prevents pods from failing when their clients
    cannot renew a lease.

    Note that the 'blocked_max' scheduler stat for this process represents
    involuntary waits for disk-related delay, scheduling delay, etc.

    Testing coverage:
    - various root disk HW: RAID, NVMe, SSD, VBox
    - sanity on multiple labs: R730_1 with RAID, WFP13_14

    Configuration change used in testing:
    - baseline: deadline, best-effort/4, nice 0
    - system under test: cfq, best-effort/0, nice -19
    - dd stress was single writer to root disk:
      while true; do
        dd if=/dev/zero of=./test.dd bs=200K count=20000 conv=fsync
      done

    Compare results and observe system behaviour:
    - watch kern.log for etcdserver 'took too long', and 'wal: sync'
    - watch fm alarms
    - watch kubectl pod status
    - observe performance with: iotop, schedtop, iostat
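
The kern.log check in the list above can be scripted. This is a minimal sketch, assuming the two quoted strings appear verbatim in the log; the function name count_etcd_warnings is hypothetical, and the exact etcd log line format is not taken from the commit.

```shell
# Hypothetical helper: count log lines that contain either of the two
# strings named in the commit message.
# $1: path to a log file, for example a copy of kern.log.
count_etcd_warnings() {
    grep -c -e 'took too long' -e 'wal: sync' "$1"
}
```

Comparing the count from a baseline run with the count from a tuned run under the same dd stress gives a rough before/after indicator.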

    Tests performed:
    - DRBD resync with and without dd writer stress
    - swact with and without dd stress
    - large application apply + dd writer stress
    - launch large number of pods (eg, scale nginx with 80 pods),
      watch systemctl status commands using strace to check for hang
    - copy very large files, create big tarballs, write mkisofs iso
    - host install

    Closes-Bug: 1927515
    Depends-On: https://review.opendev.org/c/starlingx/config-files/+/790098
    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: Ieeeba5c1375d8d99401f839c7409a9de356fda87
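
For experimenting by hand, the same priorities can be applied to a running process with ionice and renice. The sketch below only prints the commands; the function name etcd_priority_cmds is hypothetical, and it is not how the stx-puppet change applies the settings to the etcd service.

```shell
# Hypothetical dry-run helper: print the commands that would move a
# process to I/O class 2 (best-effort) at priority 0 and to nice -19.
# With util-linux renice, the value after -n is an absolute priority.
# $1: process ID.
etcd_priority_cmds() {
    pid="$1"
    echo "ionice -c 2 -n 0 -p ${pid}"
    echo "renice -n -19 -p ${pid}"
}
```

Running the printed commands requires root. Afterwards `ionice -p <pid>` reports the current I/O class and priority of the process.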

Ghada Khalil (gkhalil) on 2021-05-07

tags:	added: stx.containers
Changed in starlingx:
importance:	Undecided → Medium
tags:	added: stx.6.0

OpenStack Infra (hudson-openstack) wrote on 2021-05-18: Fix proposed to stx-puppet (f/centos8)

#7

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792009

OpenStack Infra (hudson-openstack) wrote on 2021-05-18: Change abandoned on stx-puppet (f/centos8)

#8

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792009

OpenStack Infra (hudson-openstack) wrote on 2021-05-18: Fix proposed to stx-puppet (f/centos8)

#9

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792013

OpenStack Infra (hudson-openstack) wrote on 2021-05-18: Change abandoned on stx-puppet (f/centos8)

#10

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792013

OpenStack Infra (hudson-openstack) wrote on 2021-05-18: Fix proposed to stx-puppet (f/centos8)

#11

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792018

OpenStack Infra (hudson-openstack) wrote on 2021-05-18: Change abandoned on stx-puppet (f/centos8)

#12

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792018

OpenStack Infra (hudson-openstack) wrote on 2021-05-18: Fix proposed to stx-puppet (f/centos8)

#13

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/792029

OpenStack Infra (hudson-openstack) wrote on 2021-05-19: Fix proposed to utilities (f/centos8)

#14

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/utilities/+/792213

OpenStack Infra (hudson-openstack) wrote on 2021-05-28: Fix merged to utilities (f/centos8)

#15

Reviewed:  https://review.opendev.org/c/starlingx/utilities/+/792213
Committed: https://opendev.org/starlingx/utilities/commit/c4d042615e6fe8944a4628fa1a29e86e012a9bf5
Submitter: "Zuul (22348)"
Branch:    f/centos8

commit 557cada006fd5a3bd81ad5af387c37657801f8c5
Author: Fernando Theirs <Fernando.Theirs@windriver.com>
Date:   Thu May 13 16:21:47 2021 -0300

Collect is missing etcdctl output
    
    When the collect tool is run, it does not include the contents
    of the etcd database. Fixes have been made for this to dump the
    contents in "etcd_database.dump" file.
    
    Verify if etcd access is secured. In that case, certificates
    will be used.
    
    Closes-Bug: 1911935
    
    Signed-off-by: Fernando Theirs <Fernando.Theirs@windriver.com>
    Change-Id: Idbc60edffa978a7a6bead939a4eb54f4abae29a6

commit 6045b1b8a0d8ed6a94d06cdfc994bf1a5fa9dbb5
Author: Jim Gauld <james.gauld@windriver.com>
Date:   Thu May 6 11:58:34 2021 -0400

Provide utility script is-rootdisk-device.sh
    
    This provides a utility script to determine which disk contains the root
    filesystem. This can also be used as a helper function for io-scheduler
    udev rules that require specific configuration for root disk.
    
    Example usage:
    /usr/local/bin/is-rootdisk-device.sh
    ROOTDISK_DEVICE=sda
    
    /usr/local/bin/is-rootdisk-device.sh /dev/sda
    ROOTDISK_DEVICE=sda
    
    /usr/local/bin/is-rootdisk-device.sh /dev/sdb
    (i.e., no output)
    
    Partial-Bug: 1927515
    Signed-off-by: Jim Gauld <james.gauld@windriver.com>
    Change-Id: Ib0d4a161a407b08d294c5ff9aa0b7590961e18c9

commit 88a678f142cfe86c58b6405aae6babbc08de0e8f
Author: Chen, Haochuan Z <haochuan.z.chen@intel.com>
Date:   Fri Mar 26 09:09:41 2021 +0800

Add packages to stx-ceph-manager image
    
    This update installs ceph-mgr, ceph-mon, ceph-osd packages as part
    of stx-ceph-manager image.
    
    Partial-Bug: 1920882
    
    Change-Id: I4afde8b1476e14453fac8561f1edde7360b8ee96
    Signed-off-by: Chen, Haochuan Z <haochuan.z.chen@intel.com>

commit 09b3542fcc6cc0300a9cae0d302225e6977780f3
Author: Scott Little <scott.little@windriver.com>
Date:   Thu Mar 25 11:49:49 2021 -0400

Set SW_VERSION 21.05
    
    Prep for the StarlingX 5.0 release.
    SW_VERSION, also known as PLATFORM_RELEASE, uses YY.MM format.
    
    Story: 2008055
    Task: 42115
    Signed-off-by: Scott Little <scott.little@windriver.com>
    Change-Id: If7c91a2b523358269ae4850961cf4189ffcd7a75

commit ae4cefd0e2a0001476782c31e1003810da2b4838
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Thu Mar 4 18:04:12 2021 -0500

add dcmanager-audit-worker to patch restart script
    
    Need to add the new process to the patch restart script.
    
    Story: 2007267
    Task: 41999
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
    Change-Id: If5faa806bd0d52ddbf1343b064959f4207cf975a

commit 27fce5a52321f3014fa8ae9181d344bc774289da
Author: Enzo Candotti <enzo.candotti@windriver.com>
Date:   Mon Feb 1 12:47:38 2021 -0300

Add resource CPU and memory info in collect
    
    This adds commands to collect more data to debug
    resource allocations and usage for the system
    and containerization.
    
    The following command outputs were added:
    system host-cpu-list <hostname>
    ctr -n k8s.io images list
    kubectl describe nodes | grep -e Capacity: -B1
      -A40 | grep -e 'System Info:' -B13 | grep
      -v 'System info:'
    lscpu
    lscpu -e
    cat /sys/devices/system/cpu/isolated
    
    Partial-Bug: 1886868
    
    Signed-off-by: Enzo Candotti <enzo.candotti@windriver.com>
    Change-Id: Ib26b4bb22d787451cc96820b3a413b410f79d49d

commit a4365bf1b2124c9587d78eff82594ce798fb7e06
Author: Enzo Candotti <enzo.candotti@windriver.com>
Date:   Mon Jan 25 13:16:55 2021 -0300

Fix clear passwords presented in collected log files
    
    The --relative option used by rsync resulted in changes
    to the paths of some collected files in the tarball,
    which in turn resulted in failures to mask the passwords
    in those files (as they were no longer in the expected
    location).
    
    This update removes the --relative option, restoring
    the location to that expected by the masking.
    
    Closes-Bug: 1906524
    
    Signed-off-by: Enzo Candotti <enzo.candotti@windriver.com>
    Change-Id: I001d3e2b1d2ec0ff88a129fe7b31bee21928686b

commit 5c8c8434389d67a24758ea36b9439ef946c46dfd
Author: Jose Infanzon <jose.infanzon@windriver.com>
Date:   Wed Dec 16 17:25:18 2020 -0300

Add new logs to the inventory info files
    
    Now collect_tool implements a new param
    '-i' when inventory info is required to
    be retrieved. By default it will not be
    executed.
    
    The following command outputs were added:
    system show
    system host-show <hostname>
    system host-port-list <hostname>
    system host-if-list <hostname>
    system interface-network-list <hostname>
    system network-list
    system host-memory-list <hostname>
    system host-label-list <hostname>
    system host-disk-list <hostname>
    system host-stor-list <hostname>
    system host-lvg-list <hostname>
    system host-pv-list <hostname>
    
    the execution of:
    
    -system host-show
    -system host-port-list
    -system host-if-list
    -system interface-network-list
    -system host-ethernet-port-list
    -system host-memory-list
    -system host-label-list
    -system host-disk-list
    -system host-stor-list
    -system host-lvg-list
    -system host-pv-list
    
    on a simplex lab, took 12 seconds to complete
    
    Story: 2008452
    Task: 41428
    
    Signed-off-by: Jose Infanzon <jose.infanzon@windriver.com>
    Change-Id: I223a3ef239a00a1e9dddb86d04874f13c33163e9

commit 538d47cb81317976a7ed31024a05d2fc6bf373bb
Author: Jim Gauld <james.gauld@windriver.com>
Date:   Fri Jan 15 16:06:16 2021 -0500

collect is missing helmv2 output
    
    This adds helm v2 outputs that we used to have before helm v3 upversion.
    This gives 'helm status' and 'helm get values' per application using
    helmv2-cli.
    
    This changes helm v3 commands to run as user sysadmin since helm is
    configured for sysadmin and not root. Otherwise there is missing
    environment and helm files when run via collect sudo.
    
    This corrects one complex expression that does not properly evaluate
    since it requires being run through command line first.
    
    Expressions like CMD="command1 | command2" need to be run as:
     eval ${CMD}
    instead of just:
     ${CMD}
    e.g.,
     eval "cat filename | python -m json.tool"
    
    Closes-Bug: 1911933
    
    Signed-off-by: Jim Gauld <james.gauld@windriver.com>
    Change-Id: Ie13c78ae325ce0c2f506b320db3e9b51df3f74f2
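
The difference between ${CMD} and eval ${CMD} described above can be seen in a small sketch. The command string is a made-up example, not one taken from the collect scripts.

```shell
# A command string that contains a pipe (illustrative only).
CMD="echo hello world | tr a-z A-Z"

# Plain expansion: the shell splits the string into words but does not
# reparse it, so '|', 'tr', 'a-z' and 'A-Z' become arguments to echo.
plain_out=$(${CMD})

# eval: the string is parsed again as a command line, so the pipe works.
eval_out=$(eval "${CMD}")

echo "plain: ${plain_out}"
echo "eval:  ${eval_out}"
```

The plain form prints the pipe and the tr arguments literally, while the eval form prints the upper-cased text.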

commit cb00a00c15896b8ab493354e14287cfec29bf2d8
Author: Nicolas Alvarez <nicolas.alvarez@windriver.com>
Date:   Fri Dec 18 18:40:49 2020 -0300

Add new log to the device and interface info files.
    
    collector.spec: modify to add the new scripts with permissions.
    collector_disk: new bash script with smartctl command.
    collector_interfaces: new bash script with ethtool command.
    
    Story: 2008452
    Task: 41434
    
    Signed-off-by: Nicolas Alvarez <nicolas.alvarez@windriver.com>
    Change-Id: I4c7b6e6e3d3fe990750c1fb40b1ab555a63edf83

commit a0736f3dc3f35c70e21eaf62fdb96675ac36ba5d
Author: Don Penney <don.penney@windriver.com>
Date:   Wed Jan 6 14:28:30 2021 -0500

Remove empty package from stx-extensions
    
    Packages defined in a spec with no files do not result in an RPM
    produced by the build. On a rebuild, the build tools scan the spec,
    see the package defined but do not find a corresponding RPM, and so
    flag the package for a rebuild as a result.
    
    This commit removes the empty package definition from the spec.
    
    Change-Id: I48e725bee837cd27352d38732d6f7155827cfe32
    Partial-Bug: 1910439
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit 676df72f5bbd66e9e8d02cf23ce6721ca169e9f7
Author: pablo bovina <pablo.bovina@windriver.com>
Date:   Fri Dec 18 17:43:57 2020 -0300

Add parameters for list open files command.
    
    This is for performance improvements and usability
    issues for debugging and testing.
    
    Closes-Bug: 1906537
    
    Change-Id: I2e703882219fb986c16fc32e01865fa774094543
    Signed-off-by: Pablo Bovina <pablo.bovina@windriver.com>

commit 312c87a86fd84534c5acd4447efebfda99e16016
Author: Pablo Bovina <pablo.bovina@windriver.com>
Date:   Fri Dec 18 17:55:14 2020 +0000

Revert "Collect tool times out due to long running lsof"
    
    This reverts commit bdd8c02d84de416ab407cb2cbd39e70b216dc524.
    
    Reason for revert:
    <
    The following  command:
    
    "grep awk '($3 !~ /^[0-9]+$/ && /\/mnt\/huge/) || NR==1 {print $0;}'"
    
    after the command "lsof -lwX"
    
    doesn't add value for debugging and testing purposes.
    >
    
    Story: 2008452
    Task: 41483
    
    Change-Id: I2f81ff6ca4daa6956ad6d0dd210262df2f9e00e8
    Signed-off-by: Pablo Bovina <pablo.bovina@windriver.com>

commit dbd5dbd8e0b2153ffa04f24649d4bc946ef226d0
Author: albailey <Al.Bailey@windriver.com>
Date:   Sun Dec 13 19:09:00 2020 -0600

Turn off legacy resolver workaround in pip
    
    The legacy resolver was enabled when the new pip was released.
    The requirements have been cleaned up, and sysinv has been
    updated.
    
    pkgconfig is required to be able to install libvirt from pip.
    
    A bindep file has been added to ensure the components required
    to build libvirt-python are available on the zuul host.
    
    The upper constraints have been removed from tox/zuul since
    the headers for components installed by bindep are newer than
    what will work when compiling the older libvirt
    
    Depends-On: https://review.opendev.org/c/starlingx/config/+/766645
    Partial-Bug: #1907678
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: I5905f6cd0bb9bbe379835008ccf3f70102cfc17f

commit bdd8c02d84de416ab407cb2cbd39e70b216dc524
Author: pablo bovina <pablo.bovina@windriver.com>
Date:   Thu Dec 17 15:56:04 2020 -0300

Collect tool times out due to long running lsof
    
    lsof without these options:
    
    -l : inhibit conversion of user ID numbers to login names
    -w : suppress warning messages
    -X : skip the reporting of information on all open TCP,
          UDP and UDPLITE IPv4 and IPv6 files.
    
    can lead to timeout failure.
    
    Story: 2008452
    Task: 41465
    
    Change-Id: I688e024c39d8a56ad30e0e944aeb3e3a16aad2fc
    Signed-off-by: Pablo Bovina <pablo.bovina@windriver.com>

commit 467851e8d5128df6060399207843b78a9325c951
Author: Don Penney <don.penney@windriver.com>
Date:   Thu Dec 17 13:20:49 2020 -0500

Add auto-version for remaining stx/utilities packages
    
    Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to
    use PKG_GITREVCOUNT where possible, with offsets as needed to ensure
    the version is incremented above the hardcoded version.
    
    Change-Id: Ia1ade77e12b948040f06806a5417d005488e3bb9
    Story: 2008455
    Task: 41460
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit dd97a2c3bb0f4c4dcc8f283b9e20cfedcc181737
Author: Lu Yao Chen <luyao.chen@windriver.com>
Date:   Wed Dec 16 15:05:08 2020 -0500

Fixing collect tool issue
    
    Fixing error found in patch 763859, missing \
    character in line 104.
    
    Review:
    https://review.opendev.org/c/starlingx/utilities/+/763859
    
    Closes-Bug: 1896116
    
    Signed-off-by: Lu Yao Chen <luyao.chen@windriver.com>
    Change-Id: I80ee960a2b8dab19443a45f8539a5cb602af4efb

commit e7e0ecfab90e1f8843a04dd6dc00e7cf00c2adb1
Author: pablo bovina <pablo.bovina@windriver.com>
Date:   Mon Dec 14 11:12:39 2020 -0300

Add new logs to the alarm info file
    
    Add new log to the alarm.info generated
    by the output of:
            fm event-list --nopaging
    
    Story: 2008452
    Task: 41435
    
    Change-Id: Ic7ea6c33e79b846058723dc1023797526ceeb7af
    Signed-off-by: Pablo Bovina <pablo.bovina@windriver.com>

commit b9b8ef2dc99009c18966c91f05bbe81a78b575f7
Author: pablo bovina <pablo.bovina@windriver.com>
Date:   Mon Dec 14 15:51:42 2020 -0300

Add new logs to the crash info file
    
    Add new log to the crash.info generated
    by the output of: 
            ls -lrtd /var/crash/*
            md5sum /var/crash/*
    
    Story: 2008452
    Task: 41433
    
    Change-Id: I19a532c7cead5b972a9c143860478ea128bfab3d
    Signed-off-by: Pablo Bovina <pablo.bovina@windriver.com>

commit 8c44a5a1065edc24e528c2d44b93c5bb7b5dd858
Author: albailey <Al.Bailey@windriver.com>
Date:   Thu Dec 10 08:53:23 2020 -0600

Enable legacy resolver for pip until requirements are updated
    
    pylint zuul jobs are failing due to incompatible dependencies that
    cause the new version of pip to abort.
    
    Enabling the legacy resolver (for now) so zuul can pass, while we
    fix all the requirements across the different repos.
    
    Related-Bug: 1907125
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: I59e29aadb291574307e8278c9351f961bfc0277f

commit 17c62bd5aa24459bde2894a876d170803f52db3e
Author: Andy Ning <andy.ning@windriver.com>
Date:   Thu Dec 3 09:57:12 2020 -0500

Remove secure hieradata files from collect
    
    Supporting controller puppet manifest apply following DOR introduces
    cached hieradata, which would be included in log collect.
    
    This change updates collect to remove the secure hieradata files in the
    cache, as they contain clear-text passwords.
    
    Change-Id: I17542c9fd778107f065531d02c53c59581fc179e
    Partial-Bug: 1904739
    Depends-On: https://review.opendev.org/c/starlingx/config/+/765373
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit e3f78a9c0ad948b01fdd73a77604601e46289181
Author: Jackie Huang <jackie.huang@windriver.com>
Date:   Tue Oct 27 10:17:31 2020 -0600

cpumap_functions.sh: fix perl experimental feature issue
    
    An experimental feature added in Perl 5.14 allowed each, keys, push,
    pop, shift, splice, unshift, and values to be called with a scalar
    argument. This experiment is considered unsuccessful, and has been
    removed in 5.23 and later releases. So don't use this feature to
    avoid failure:
    localhost:~# platform_expanded_cpu_list
    Experimental keys on scalar is now forbidden at -e line 13.
    
    Closes-Bug: 1901642
    Change-Id: I5898145151e25538c745572da51af26b9251c285
    Signed-off-by: Jackie Huang <jackie.huang@windriver.com>

commit b58d9d20d3d1c5307f12706900645e1ce969b2e4
Author: Martin, Chen <haochuan.z.chen@intel.com>
Date:   Fri Oct 30 14:30:00 2020 +0800

Build image for ceph-manager
    
    For a host-based ceph cluster, service manager launches the
    mgr-restful-plugin service, which launches ceph-mgr and the
    ceph-manager daemon. The ceph-manager daemon polls ceph cluster status
    through the ceph-mgr restful module and raises or clears alarms in the
    fault manager.
    
    For rook, build an image named stx-ceph-manager and use this image to
    create a deployment, which performs the same task: polling the
    containerized ceph cluster status and raising or clearing alarms.
    
    Story: 2005527
    Task: 41338
    
    Change-Id: Iaaedfc0c7198e102eb4b8c94ab759e9b209e6bfd
    Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>

commit 43cd10d392e64abd810b8032a01bcbf4a29524b6
Author: Lu Yao Chen <luyao.chen@windriver.com>
Date:   Mon Nov 23 13:58:11 2020 -0500

Masking passwords with collect script
    
    Use the collect script to mask cleartext password incidents
    in /var/log/user.log. This is done by grepping for the -password and
    password: prefixes and headers and redacting the password
    with the xxxxxx string. Tested using a user.log containing cleartext
    passwords.
    
    Partial-Bug: 1896116
    
    Signed-off-by: Lu Yao Chen <luyao.chen@windriver.com>
    Change-Id: I3a3c02b61994d53589d673b2335d0eb023adfac6
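
The redaction described above can be illustrated with a minimal sed sketch. The function name mask_passwords and the regular expression are assumptions for illustration; the patterns actually used by collect are not shown in this log.

```shell
# Hypothetical helper: replace the token that follows "-password " (or
# "-password=") or "password:" with the string xxxxxx.
# Reads log text on stdin and writes the masked text to stdout.
mask_passwords() {
    sed -E 's/(-password[ =]|password: *)[^ ]+/\1xxxxxx/g'
}
```

For example, `mask_passwords < user.log > user.log.masked` would write a masked copy and leave the original file untouched.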

commit ddd40f08f2086690e8439c9ce8183ead7d41761b
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Thu Oct 22 23:05:08 2020 -0400

Fix shared libraries file permissions
    
    Updated shared library files permission from
    /usr/lib/systemd/system/ to be non group-writable,
    to fix openscap security violation.
    Verified installation is successful in AIO-SX and
    Standard 2+2 system configurations.
    Ran successfully "taskset" command to check current
    affinity to platforms CPUs.
    
    Story: 2008037
    Task: 40694
    
    Change-Id: If8d7d3becba073ee827e988f1e651a9c8d31d773
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit 0f0bcf58446cc854397e67c0f6577288f98decf6
Author: Eric MacDonald <eric.macdonald@windriver.com>
Date:   Fri Oct 16 21:44:10 2020 -0400

Exclude /var/log/crash from collect
    
    This update adds support to exclude
    /var/log/crash content from a collect
    operation.
    
    Additional /var/log content exclusions can be
    added to the new /etc/collect/varlog.exclude
    file added by this update.
    
    Change-Id: I657ce2552e36cc3ac296f11ecd7ed1331ec0f13e
    Partial-Fix: 1898602
    Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>

commit db2156eaddeafec0523aeccda45447b3069eb059
Author: Jim Gauld <james.gauld@windriver.com>
Date:   Fri Oct 16 15:41:34 2020 -0400

Affine kswapd* kernel threads to platform cores
    
    The kswapd* kernel tasks are per NUMA node and have floating
    cpu affinity masks spanning those nodes.
    
    On AIO low-latency systems, this affines the kswapd* kernel tasks to
    platform cores. This is a performance improvement for low-latency
    sensitive applications.
    
    Partial-Bug: 1900174
    Change-Id: I20db19978362997b23a69bf591b8e7c23096f492
    Signed-off-by: Jim Gauld <james.gauld@windriver.com>

commit dd5002831818c1a43b90cc3ca704648e6d3e889d
Author: Scott Little <scott.little@windriver.com>
Date:   Thu Sep 17 16:10:31 2020 -0400

Set SW_VERSION/PLATFORM_RELEASE to 20.12
    
    Story: 2008055
    Task: 40919
    Change-Id: If065e2967b3fbe0a5fce41b6cef3a99e5ffd17f3
    Signed-off-by: Scott Little <scott.little@windriver.com>

commit 5bc7a33773e947eef8caeb3f371afa816fbcb774
Author: Don Penney <don.penney@windriver.com>
Date:   Wed Sep 9 17:12:45 2020 -0400

Use newer flake8 to run on ubuntu-focal Zuul machines
    
    flake8 2.5.5 fails on ubuntu-focal zuul machines running python3.8
    with the following error:
    AttributeError: 'FlakesChecker' object has no attribute 'CONSTANT'
    
    The update removes the hacking constraint to use newer flake8. This
    also ignores new warnings/errors, which should be addressed in a
    future update to remove the ignores.
    
    Change-Id: Ib24639adeea4da3063fb403a8e8484937f9e1a9f
    Partial-Bug: 1895054
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit dd902cf3865f2147559b8c8baeecff70ffebe3cb
Author: Jim Gauld <james.gauld@windriver.com>
Date:   Tue Sep 8 11:13:24 2020 -0400

Add mariadb-cli, kubectl and openstack commands to collect
    
    This adds containerized related commands to collect:
    - containerized wrapper script mariadb-cli to access MariaDB mysql
    - various kubectl commands to understand resource usage
    - various containerized openstack commands to understand resource usage
    
    mariadb-cli is used to dump contents of all mariadb databases.
    
    Change-Id: I70a08e38adbba247152509a22a8b9beac9128ff9
    Closes-Bug: 1889678
    Closes-Bug: 1894103
    Signed-off-by: Jim Gauld <james.gauld@windriver.com>

commit 6573fbe80c04e7cf85c46238c508aff1f2a04454
Author: Don Penney <don.penney@windriver.com>
Date:   Wed Sep 9 15:27:51 2020 -0400

Fix missing log_error in update-iso.sh
    
    The update-iso.sh utility references an undefined log_error function.
    As a result, error messages are not properly reported, instead
    resulting in messages such as:
    stx-iso-utils.sh: line 53: log_error: command not found
    
    This error was introduced by code restructuring in a previous update:
    https://review.opendev.org/707673
    
    This update defines a log_error function for update-iso.sh that writes
    the message to stderr, as this is a user-run utility, so no need to
    log to syslog.
    
    Change-Id: Ifdc2a25c24cd86e61ccc12601a0bcbe6888ac349
    Closes-Bug: 1895042
    Signed-off-by: Don Penney <don.penney@windriver.com>
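
As an illustration of the fix described in the commit message above, a stderr-only error helper could look like the following. This is a minimal sketch, not the code from update-iso.sh; the "ERROR:" prefix is an assumption made for the example.

```shell
# Hypothetical sketch: report an error on stderr only (no syslog),
# which suits a utility that is run interactively by a user.
log_error() {
    echo "ERROR: $*" >&2
}
```

Because the message goes to stderr, callers can still capture or pipe the utility's normal stdout output without error text mixed in.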

commit f43e37b4ea5ed58a55a660cab0b76a3ce0b6532e
Author: Don Penney <don.penney@windriver.com>
Date:   Wed Sep 2 22:42:41 2020 -0400

Fix gen-bootloader-iso.sh error on unpatched controller
    
    On a system that has never been patched, the updates repository may
    not yet have a Packages directory. This update fixes the patch setup
    in gen-bootloader-iso.sh to check for this case and handle it without
    failing.
    
    This update also improves the platform-kickstarts-pxeboot check to
    look for a pxeboot kickstart, to ensure there isn't a false-positive
    match against an extracted pxe-network-installer patch, which would
    also have a /pxeboot directory.
    
    Change-Id: I8e24a3ff123f4c2649be52b1a0ce77d830e342f3
    Closes-Bug: 1893990
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit c0ab7c29c1e44bf448fcd89dbc6a8fa9e814818b
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Wed Sep 2 10:29:08 2020 -0400

Adding dcmanager-orchestrator process to patch restart script
    
    Process dcmanager-orchestrator has been added to the restart script
    to simplify no-reboot patches that update that process.
    
    Story: 2007267
    Task: 40813
    Change-Id: I5d634f769e86ac4c79170bfd8476bd65a1ac2280
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>

commit fb70729a9e13f6b12d0a5376c637723701578721
Author: Don Penney <don.penney@windriver.com>
Date:   Mon Aug 31 14:37:59 2020 -0400

Improve DNF cache handling in gen-bootloader-iso.sh
    
    The "dnf repoquery" command is using and updating the cache for ad-hoc
    repos created when using the repofrompath option. This can result in
    gen-bootloader-iso.sh using stale cached information when looking for
    specific RPMs in the patch repo when generating bootloader ISOs for
    subcloud installs.
    
    In order to avoid such issues, this update asks DNF to mark the cache
    as expired after copying the patch repo for generating the new ISOs,
    as well as on deletion. This ensures the repoquery uses up-to-date
    information.
    
    Additionally, this update removes a duplicated patch repo setup step,
    which was not cleaned up during dev testing on the previous commit.
    
    Change-Id: I664edb346692d79cd13b23772686ea88f0aaaf9f
    Story: 2007994
    Task: 40798
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit 584b23bd1527bb2666952c6427013ce4e7728934
Author: Michel Thebeau <Michel.Thebeau@windriver.com>
Date:   Wed Aug 26 16:06:12 2020 -0400

stx-extensions: use the host's PID for coredump
    
    systemd-coredump running as root on the host wants to get process
    information using the process' PID, but %p refers to "the PID namespace
    in which the process resides" (the container).  systemd-coredump
    coredumps.  Dmesg shows "Failed to get EXE" and the segfault (sic
    'systemd-coredum').
    
    This segfault of systemd-coredump is intermittent, and so on occasion a
    core file may actually be dumped for the containerized process. This
    happens when the %p does not match a process from the host's
    perspective.  Dmesg shows "Failed to get COMM..." and "Failed to get
    EXE".  This becomes more likely as the PIDs in container's namespace
    become larger - the host's PIDs are more sparse as the numbers increase.
    
    Use %P instead, "as seen in the initial PID namespace" (host).
    
    Convert the package to use PKG_GITREVCOUNT for release increment.
    
    Closes-Bug: 1892951
    Change-Id: Ifa5017d5997d12891893fc97fac4487ddfbbbbb8
    Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
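
The difference between the two core_pattern specifiers discussed in the commit message above can be checked with a small helper. This is a sketch only: the function name and the sample pattern strings are hypothetical, while the meaning of %p (PID in the process's own namespace) and %P (PID in the initial namespace) follows core(5).

```shell
# Hypothetical helper: report which PID view a core_pattern string passes
# to the dump handler. %P is the PID as seen in the initial (host) PID
# namespace; %p is the PID as seen inside the process's own namespace.
pid_specifier_scope() {
    case "$1" in
        *%P*) echo "host" ;;
        *%p*) echo "container" ;;
        *)    echo "none" ;;
    esac
}
```

On a live system the current value could be inspected with `pid_specifier_scope "$(cat /proc/sys/kernel/core_pattern)"`.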

commit a69c15f4065017bcacd1d8cefba40bd1e4c956ba
Author: Don Penney <don.penney@windriver.com>
Date:   Fri Aug 7 10:56:25 2020 -0400

Use host patches for subcloud install setup
    
    This update enhances gen-bootloader-iso.sh to use patches applied on
    the host controller when setting up the subcloud installation
    bootloader. This checks the release version from the ISO to look for
    patches for that release only, copying applied and committed patches
    to the installation setup.
    
    This update also adds cleanup of partial setup on failure.
    
    Change-Id: I4eeb0141b75e48ce53c132b89e19e07c68cc49d2
    Story: 2007994
    Task: 40631
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit 6fec9e9ec55af9c62ca03558766806710d4da0b6
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Thu Jul 16 15:08:50 2020 -0400

Verify-license plugin support
    
    The current StarlingX implementation does not support
    proper license validation.
    This update uses the Stevedore plugin pattern to allow the
    verify-license implementation to be overridden, so that
    enhanced validation rules can be added if necessary.
    
    Story: 2007403
    Task: 39644
    
    Change-Id: I6fa6626feabff06b832dd41a2778804a28956131
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit 71314e4af1af0c9590908656882c7a4c68172399
Author: albailey <Al.Bailey@windriver.com>
Date:   Thu Jul 9 14:01:52 2020 -0500

Adding missing distributed cloud processes to patch restart script
    
    In order to simplify no-reboot patches that update any of these DC
    processes, they have been added to the restart script.
    
     - dcdbsync-api
     - dcorch-identity-api-proxy
     - dcmanager-audit
    
    Closes-Bug: 1887002
    Change-Id: Ida2cee5b3ca873888820e3de485e9877b64166b9
    Signed-off-by: albailey <Al.Bailey@windriver.com>

commit 7c076a390f99cb72623da7168ae64e2947a25080
Author: Eric MacDonald <eric.macdonald@windriver.com>
Date:   Mon Jul 6 11:24:10 2020 -0400

Exclude temporary subcloud install iso files from collect
    
    The collect tool is unnecessarily including temp subcloud
    install iso files and related other large tmp files.
    
    This update changes the raw 'cp ...' to 'rsync ...'
    with the --exclude option to filter out iso files from
    the collect_dc and collect_sysinv collect scripts.
    
    Change-Id: I3575c3193a24f376dcd006c3e5015c551023c69a
    Closes-Bug: 1885778
    Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>

commit 7ed511a6a0b861573bd9224a8a2a2ff6a031f463
Author: Don Penney <don.penney@windriver.com>
Date:   Tue Jun 30 17:19:14 2020 -0400

Protect against parallel exec of gen-bootloader-iso.sh
    
    The bulk of the work done by gen-bootloader-iso.sh is not safe to be
    executed by two callers in parallel. If the initial execution is
    setting up shared files when a subsequent call starts, the subsequent
    call can end up with an incomplete set of symlinks to shared files.
    This, in turn, can result in installation failures.
    
    To protect against this, a file lock is added to gen-bootloader-iso.sh
    to prevent subsequent calls from accessing shared files before they're
    completely set up.
    
    Change-Id: Id6def4527b226b8746b2b8ff74059c0d3937e130
    Closes-Bug: 1885779
    Signed-off-by: Don Penney <don.penney@windriver.com>
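
The file-lock approach described in the commit message above can be sketched with flock(1) from util-linux. This is an assumption-laden illustration, not the code in gen-bootloader-iso.sh: the lock path, the function name, and the choice of file descriptor 9 are all hypothetical.

```shell
# Hypothetical lock path; the real script may use a different file.
LOCK_FILE="${LOCK_FILE:-/tmp/gen-bootloader-iso.lock}"

# Run a command while holding an exclusive lock, so that a second caller
# blocks until the first has finished setting up the shared files.
run_locked() {
    (
        flock -x 9 || exit 1   # wait for the exclusive lock on fd 9
        "$@"                   # run the protected command
    ) 9>"$LOCK_FILE"           # fd 9 is closed (lock released) on exit
}
```

Holding the lock on a file descriptor owned by the subshell means the lock is released automatically even if the protected command fails part-way through.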

commit a10381203d920c55732109eb07aae2cf6d3345e8
Author: Jim Gauld <james.gauld@windriver.com>
Date:   Wed Jun 17 09:20:47 2020 -0400

Add kube-cpusets to collect
    
    This adds kube-cpusets tool to collect_hosts.
    This displays cpuset and numa node information per kubernetes pod
    on a given host.
    
    Change-Id: I1bd862cc0b1cef3b7997c032108c02b09f2885a1
    Story: 2006999
    Task: 40106
    Depends-On: https://review.opendev.org/723808
    Signed-off-by: Jim Gauld <james.gauld@windriver.com>

commit 12b7504a0a01fc3c74aa9f74f788422d8ecf45ee
Author: Andy Ning <andy.ning@windriver.com>
Date:   Tue May 26 10:03:32 2020 -0400

Install set keystone user option scripts
    
    This commit packages set_keystone_user_option.sh in the platform-util
    rpm and installs it in the /usr/local/bin directory. The script can be
    used to set keystone user options such as
    "ignore_lockout_failure_attempts". It is currently used by the keystone
    openstack::user::option puppet class to set the admin user's
    "ignore_lockout_failure_attempts" to true, exempting it from the
    auth fail lockout rule.
    
    Change-Id: I479cf2abb71fc57707d618d1cd110caf58d43394
    Closes-Bug: 1877179
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit 9064dd13a006b23395c292f31230c7d8c0bb0701
Author: Scott Little <scott.little@windriver.com>
Date:   Thu May 21 15:19:49 2020 -0400

Set PLATFORM_RELEASE (aka SW_VERSION) to 20.06.
    
    Planned release of STX 4.0 is June 2020.
    The convention for PLATFORM_RELEASE is YY.MM .
    
    Change-Id: I863a4f1d615d0476dd51a29cca6b0eb65f50d36d
    Signed-off-by: Scott Little <scott.little@windriver.com>

commit 7552f4a0f58e73a74e4506cb86674047b2d83c18
Author: Ran An <ran1.an@intel.com>
Date:   Fri May 15 01:34:37 2020 +0000

Update logmgmt to use python3
    
    This reverts commit 115332ef6b0016393e6af9a65eadc1911be2be33.
    
    Change python to python3 in the script and spec, and
    fix log runtime errors in the python3 env.
    1. layer build OK
    2. logmgmt.service running and stop ok
    3. log rotate and force rotate ok
    4. logmgmt manage and delete monitored or unmonitored log ok
    5. logmgmt.log back up ok
    
    Story: 2007106
    Task: 39288
    Depends-on: https://review.opendev.org/#/c/728323/
    Depends-on: https://review.opendev.org/#/c/728325/
    Signed-off-by: Long Li <lilong-neu@neusoft.com>
    Signed-off-by: SidneyAn <ran1.an@intel.com>
    Change-Id: I08d6396922cea2d8d4d485f3cc66f522f6579f2e

tags:

added: in-f-centos8

OpenStack Infra (hudson-openstack) wrote on 2021-05-28: Fix proposed to config-files (f/centos8)

#16

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config-files/+/793634

OpenStack Infra (hudson-openstack) wrote on 2021-06-02: Fix merged to config-files (f/centos8)

#17

Reviewed:  https://review.opendev.org/c/starlingx/config-files/+/793634
Committed: https://opendev.org/starlingx/config-files/commit/03c3f68b2a1477da3dbc7d351e8bf9e2cff2acf1
Submitter: "Zuul (22348)"
Branch:    f/centos8

commit e82d1b9e70dd50fbec76db7cfc51e433c5b6bf9e
Author: Jim Gauld <james.gauld@windriver.com>
Date:   Thu May 6 12:14:39 2021 -0400

Configure io-scheduler udev rules for ETCD and HW-RAID
    
    This configures io-scheduler udev rules for etcd and hw-raid
    performance.
    
    This sets the io-scheduler to 'cfq' with tuned parameters for the
    'controller' nodetype with root file-system disk.
    
    This sets the io-scheduler to 'noop' for HW-RAID Dell PowerEdge R720;
    this was a missing commit from pre-starlingx.
    
    Partial-Bug: 1927515
    Depends-On: https://review.opendev.org/c/starlingx/utilities/+/790094
    Signed-off-by: Jim Gauld <james.gauld@windriver.com>
    Change-Id: Iaf1de8d962d1e8d253c72e680370666a2aed8c8e

commit efb718e03171580c43702a01f7c103e590832ab7
Author: Li Zhou <li.zhou@windriver.com>
Date:   Tue Apr 13 04:48:46 2021 -0400

systemd: Upgrade to version 219-78.el7_9.3
    
    Change the BuildRequires to refer to the new systemd version.
    
    Depends-On: https://review.opendev.org/c/starlingx/tools/+/786601
    Closes-Bug: #1924691
    Signed-off-by: Li Zhou <li.zhou@windriver.com>
    Change-Id: I76169b7fd85069e26cfb37de8889cea006c57238

commit 7877dbc6baec4e3214a12ac0ae44db5491a22e9d
Author: Andy Ning <andy.ning@windriver.com>
Date:   Fri Apr 16 10:46:13 2021 -0400

Enforce "cannot reuse the last 2 passwords" password rule
    
    Currently the "remember" attribute in pam_pwhistory configuration
    is set to "2", which enforces "cannot reuse the last 1 passwords"
    in history instead of "cannot reuse the last 2 passwords" stated
    in security document.
    
    This update changed "remember" attribute to "3" so that the rule
    complies with the document.
    
    Closes-Bug: 1924772
    Signed-off-by: Andy Ning <andy.ning@windriver.com>
    Change-Id: I340152f8b8a572bc1e86f1eb4a14eb8e392f6334

commit e87383f6c328efeab2a9407daa33076a85739b96
Author: Eric MacDonald <eric.macdonald@windriver.com>
Date:   Tue Apr 6 08:44:26 2021 -0400

Comment out 'dateext' setting in logrotate.conf file
    
    This update comments out the 'dateext' setting to avoid
    log files being rotated with date as a default.
    
    Test Plan:
    
    PASS: Verify log rotation config files that don't
          specifically set dateext option are rotated
          by number rather than date.
    PASS: Verify system install
    
    Partial-Bug: 1918979
    Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
    Change-Id: Ib68d86d1ec3f15abedce4c4059c3a8ec34b7d196

commit 35160afbdada2efe0ff567dd94ca1419903c87ad
Author: Nicolas Alvarez <nicolas.alvarez@windriver.com>
Date:   Tue Dec 8 17:26:18 2020 -0300

Disable SNMP Host-Based from config-files repo.
    
    Because SNMP is going to be containerized, we disable
    it in the starlingx/config-files repo.
    
    Story: 2008132
    Task: 41381
    Depends-On: https://review.opendev.org/765381
    Signed-off-by: Nicolas Alvarez <nicolas.alvarez@windriver.com>
    
    Change-Id: I9da693f3389e16acff8f87866ab291376bf76f6c

commit 826a2b36b7902e2d92fdd75a165780f1a97bc2cb
Author: Don Penney <don.penney@windriver.com>
Date:   Thu Dec 17 13:27:31 2020 -0500

Add auto-version for remaining stx/config-files packages
    
    Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to
    use PKG_GITREVCOUNT where possible, with offsets as needed to ensure
    the version is incremented above the hardcoded version.
    
    Change-Id: I70d0ee9dcff9dc3f6365a63d4926bc2a1b6f628b
    Story: 2008455
    Task: 41449
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit fb3d9b3dafb0d2a5b543a575406717ad68a457ca
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Mon Dec 14 15:41:48 2020 -0500

Fix IP Config security violations - phase 2
    
    Updated IPv4 and IPv6 config settings in "/etc/sysctl.conf"
    to resolve openscap security violations as part of phase 2
    fixes.
    The fixes have been validated by successful installs in
    IPv4 AIO-SX and IPv6 Standard configurations.
    
    Story: 2008037
    Task: 41431
    
    Change-Id: I9c4f01c5a5ba135c38d0e3c61618055cdfad18f9
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

OpenStack Infra (hudson-openstack) wrote on 2021-06-03: Fix merged to stx-puppet (f/centos8)

#18

Reviewed:  https://review.opendev.org/c/starlingx/stx-puppet/+/792029
Committed: https://opendev.org/starlingx/stx-puppet/commit/2b026190a3cb6d561b6ec4a46dfb3add67f1fa69
Submitter: "Zuul (22348)"
Branch:    f/centos8

commit 3e3940824dfb830ebd39fd93265b983c6a22fc51
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Thu May 13 18:03:45 2021 +0300

Enable kubelet support for pod pid limit
    
    Enable limiting the number of pids inside of pods.
    
    Add a default value to protect against a missing value.
    Default to 750 pids limit to align with service parameter default
    value for most resource consuming StarlingX optional app (openstack).
    In fact any value above service parameter minimum value is good for the
    default.
    
    Closes-Bug: 1928353
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I10c1684fe3145e0a46b011f8e87f7a23557ddd4a

commit 0c16d288fbc483103b7ba5dad7782e97f59f4e17
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Tue May 11 10:21:57 2021 -0400

Safe restart of the etcd SM service in etcd upgrade runtime class
    
    While upgrading the central cloud of a DC system, activation failed
    because there was an unexpected SWACT to controller-1. This was due
    to the etcd upgrade script. Part of this script runs the etcd
    manifest. This triggers a reload/restart of the etcd service. As this
    is done outside of the sm, sm saw the process failure and triggered
    the SWACT.
    
    This commit modifies platform::etcd::upgrade::runtime puppet class
    to do a safe restart of the etcd SM service and thus, solve the
    issue.
    
    Change-Id: I3381b6976114c77ee96028d7d96a00302ad865ec
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>
    Closes-Bug: 1928135

commit eec3008f600aeeb69a42338ed44332228a862d11
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Mon May 10 13:09:52 2021 +0300

Serialize updates to global_filter in the AIO manifest
    
    Right now, looking at the aio manifest:
    https://review.opendev.org/c/starlingx/stx-puppet/+/780600/15/puppet-manifests/src/manifests/aio.pp
    there are 3 classes that update
    in parallel the lvm global_filter:
    - include ::platform::lvm::controller
    - include ::platform::worker::storage
    - include ::platform::lvm::compute
    And this generates some errors.
    
    We fix this by adding dependencies between the above classes
    in order to update the global_filter in a serial mode.
    
    Closes-Bug: 1927762
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
    Change-Id: If6971e520454cdef41138b2f29998c036d8307ff

commit 97371409b9b2ae3f0db6a6a0acaeabd74927160e
Author: Steven Webster <steven.webster@windriver.com>
Date:   Fri May 7 15:33:43 2021 -0400

Add SR-IOV rate-limit dependency
    
    Currently, the binding of an SR-IOV virtual function (VF) to a
    driver has a dependency on platform::networking.  This is needed
    to ensure that SR-IOV is enabled (VFs created) before actually
    doing the bind.
    
    This dependency does not exist for configuring the VF rate-limits
    however.  There is a chance that the VF rate-limiting configuration
    happens before the VFs are actually created.
    
    This commit fixes the issue by creating a dependency on
    platform::networking from the sriov::config class, which ensures
    the VFs are created before both driver binding and rate
    limiting configuration occurs.
    
    Closes-Bug: #1927758
    Signed-off-by: Steven Webster <steven.webster@windriver.com>
    Change-Id: Ic452247eb8c980e1b18bdc54832eb635d7a9fc54

commit 0b429c7cb0c16e34755c1b1e146ebb8b006d44dc
Author: Jim Gauld <james.gauld@windriver.com>
Date:   Thu May 6 12:33:24 2021 -0400

Configure etcd service critical process nice and ionice
    
    The etcd server is a critical "interactive" process that requires
    low latency. This process has many etcd threads; each worker does
    minimal work and wakes up frequently. The threads do a small amount
    of writes to commit.
    
    If not properly tuned as a critical process, the etcd server will
    start exceeding the heartbeat interval of 100ms and the election
    timeout of 1000ms under load and independent disk stress. This
    cascades into many failures.
    
    This requires io-scheduler 'cfq' to take advantage of io-nice policy
    and priority. This bumps up to best-effort/0 from best-effort/4.
    
    This sets nice -19 from nice 0. This helps tremendously with
    interactive processes for linux CFS (completely-fair-scheduler).
    
    With tuned settings, under application load and additional disk stress,
    we see a dramatic reduction of 'blocked_max' and no more kern.log
    etcdserver related errors for exceeding the timeouts.
    We see dramatic improvement to system responsiveness for kubectl and
    kube-apiserver. This prevents pods from failing when clients cannot
    renew their lease.
    
    Note that 'blocked_max' scheduler stats for this process represents
    involuntary wait for disk related delay, scheduling delay, etc.
    
    Testing coverage:
    - various root disk HW: RAID, NVMe, SSD, VBox
    - sanity on multiple labs: R730_1 with RAID, WFP13_14
    
    Configuration change used in testing:
    - baseline: deadline, best-effort/4,
    - system under test: cfq, best-effort/0, nice -19
    - dd stress was single writer to root disk:
      while true; do
        dd if=/dev/zero of=./test.dd bs=200K count=20000 conv=fsync
      done
    
    Compared results and observed system behaviour:
    - watch kern.log for etcdserver 'took too long', and 'wal: sync'
    - watch fm alarms
    - watch kubectl pod status
    - observe performance with: iotop, schedtop, iostat
    
    Tests performed:
    - DRBD resync with and without dd writer stress
    - swact with and without dd stress
    - large application apply + dd writer stress
    - launch large number of pods (eg, scale nginx with 80 pods),
      watch systemctl status commands using strace to check for hang
    - copy very large files, create big tarballs, write mkisofs iso
    - host install
    
    Closes-Bug: 1927515
    Depends-On: https://review.opendev.org/c/starlingx/config-files/+/790098
    Signed-off-by: Jim Gauld <james.gauld@windriver.com>
    Change-Id: Ieeeba5c1375d8d99401f839c7409a9de356fda87
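
The tuning described in the commit message above (cfq scheduler, best-effort/0 I/O priority, nice -19) can be approximated by hand for experimentation. The sketch below is not the stx-puppet change itself: both function names are hypothetical, and it assumes the util-linux `ionice` and `renice` tools plus the usual sysfs convention that the active scheduler in /sys/block/<dev>/queue/scheduler is shown in square brackets.

```shell
# Print the active I/O scheduler from a sysfs scheduler line such as
# "noop deadline [cfq]" (the active entry is the bracketed one).
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Apply the tuned priorities to a running PID (hypothetical helper).
# Requires root. The process is assumed to start at nice 0.
tune_critical_process() {
    pid="$1"
    ionice -c 2 -n 0 -p "$pid" &&   # best-effort class, priority 0
    renice -n -19 -p "$pid"         # nice -19
}
```

For example, `active_scheduler "$(cat /sys/block/sda/queue/scheduler)"` would report whether 'cfq' is in effect for sda before the ionice settings are relied upon.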

commit 9782bb104c07b4aed0876d88d1743d4816a34515
Author: Don Penney <don.penney@windriver.com>
Date:   Fri May 7 08:51:19 2021 -0400

Update dnsmasq.conf for UEFI pxeboot
    
    Due to recent grub2 update for CVE-2020-15705, pxeboot must use the
    shim.efi file for secure boot, rather than grubx64.efi directly.
    
    Change-Id: I864ff46f449e92dfd5f1667379bc56aaaf6dfe2c
    Closes-Bug: 1927730
    Depends-On: https://review.opendev.org/c/starlingx/metal/+/790253
    Depends-On: https://review.opendev.org/c/starlingx/integ/+/790254
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit c120fb798091db9fb756e51b895dccfa8d80a947
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Wed May 5 17:30:19 2021 -0400

AIO-SX reboots after change OAM ip address
    
    On HW tests, it was detected that the openstack-endpoints restart was
    happening at the same time as the service-manager restart, creating
    a conflict that prevented SM services from reaching enabled-active.
    This was provoking the reboot.
    
    The correction creates the openstack::keystone::endpoint::runtime::post
    class, which is executed in the post stage and not in the main stage,
    to avoid conflict with service-manager.
    
    It also marks platform::network::runtime to be run in the pre stage
    to avoid some apply errors that were encountered when haproxy bringup
    was delayed because the IP address was only configured on the
    interface later. This way the other restarted services will have
    the address on the interface when the restart happens.
    
    Tested on AIO-SX by monitoring the manifest apply and validating that
    no reboot happens.
    
    Closes-Bug: 1927275
    
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
    Change-Id: Ia70a3395753e43b3c1e2c037818c8c23e4ec0fd6

commit cb7858c65982c250f07a5022719d4f2b6d547d64
Author: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
Date:   Wed May 5 11:11:27 2021 -0300

Fix for failure during AIO-SX to AIO-DX migration on standalone system
    
    Fix drbd-cephmon mount error by manually remounting monitor DRBD after
    DRBD::Resource creation. Removed patching of Kubernetes Persistent
    Volumes from puppet manifest since Kubelet and kube-api are no longer
    available during puppet run.
    
    Partial-Bug: 1927224
    Signed-off-by: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
    Change-Id: Id5565ac734499b617b470499cfc2aa1ae2972da3

commit 5695a29e6a5ed8ee5d211e937496384027d7fd4e
Author: Bin Qian <bin.qian@windriver.com>
Date:   Thu Apr 29 13:35:38 2021 -0400

Fix missing kubelet service enable for worker nodes
    
    In the previous commit:
      https://review.opendev.org/c/starlingx/stx-puppet/+/780600/
    kubelet enable was skipped for the worker nodes.
    
    Change-Id: I7769aebb4a9e38404af0c883640e1a27bb1e9e84
    Closes-Bug: 1918139
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit 94ec35ff2d5363d3816f6d267a77a4efba6c6aa8
Author: Zhixiong Chi <zhixiong.chi@windriver.com>
Date:   Wed Apr 14 23:28:03 2021 -0400

Increase min_free_kbytes to 256M for storage to avoid OOM issue
    
    Helps prevent the OOM issue where memory allocation fails with the
    error message 'page allocation failure: order:2, mode:0x104020'
    
    As the Linux documentation for min_free_kbytes states:
    This is used to force the Linux VM to keep a minimum number
    of kilobytes free.  The VM uses this number to compute a
    watermark[WMARK_MIN] value for each lowmem zone in the system.
    Each lowmem zone gets a number of reserved free pages based
    proportionally on its size.
    
    Keeping more memory free in those zones means that the os itself is
    less likely to run out of memory during high memory pressure and high
    allocation events.
    
    Because the issue has so far occurred only on storage nodes, we only
    update the value for storage nodes.
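
    As a small worked example of the unit conversion, assuming "256M"
    means 256 MiB: min_free_kbytes is expressed in kilobytes, so the
    value works out as below. The helper names are hypothetical and this
    is only a sketch, not the puppet change itself.

```python
def mib_to_kbytes(mib):
    """Convert MiB to the kilobyte unit used by vm.min_free_kbytes."""
    return mib * 1024


def sysctl_line(mib):
    # vm.min_free_kbytes is the standard Linux sysctl key for this setting.
    return "vm.min_free_kbytes = %d" % mib_to_kbytes(mib)


if __name__ == "__main__":
    # 256 MiB * 1024 = 262144 kB
    print(sysctl_line(256))
```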
    
    Closes-Bug: #1924209
    
    Change-Id: Iae2e5a0787f69c62ba5da53663371fd2be148e15
    Signed-off-by: Zhixiong Chi <zhixiong.chi@windriver.com>

commit 736199af4106378b86b4cdca784105fe2cd8ed05
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Wed Apr 28 14:50:21 2021 -0400

On runtime, kube-sriov-device-plugin needs to be restarted
    
    The previous correction for bug 1918139 removed the sriov plugin
    restart that is necessary at runtime when an SR-IOV interface is
    assigned to a datanetwork (allowed on an unlocked AIO-SX). Without
    it, pod creation is not able to use a datanetwork created at
    runtime.
    
    The correction brings back the platform::kubernetes::worker::sriovdp
    class, to be used only at runtime.
    
    Closes-Bug: 1918139
    
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
    Change-Id: Ied19bf3138b58b279b350d067ae0c1080e220f31

commit 69b9809465b5e7a837917cce7d0a731ddf257f0d
Author: Steven Webster <steven.webster@windriver.com>
Date:   Tue Apr 27 17:54:24 2021 -0400

Fix interface (re)configuration for single-nic system
    
    Currently, the apply-network-config manifest step launches a script
    that detects differences between puppet's view of what the
    ifcfg-* network scripts should be and what the value
    of the ifcfg files actually are in the /etc/sysconfig/network-scripts/
    directory.
    
    If there are differences, the puppet representation of the interface
    configuration is copied to the system network-scripts directory and
    the interface is brought down and up to apply the config.
    If there are no changes between the puppet view and the system view,
    the interface is left alone.
    
    An issue can occur in a single-nic system comprising a physical
    lower ethernet interface configured for SR-IOV with upper vlan
    interfaces (oam, mgmt, etc).  If the lower interface is
    re-configured, it is subsequently brought down/up to apply
    the changes.  This causes the upper vlan interfaces to also
    be brought down by the kernel.  In the case of an IPv6 system,
    the interfaces will lose their addresses as well as any configured
    default route.  In the case of an IPv4 system, the default route
    will be wiped out, which could cause issues in a distributed cloud
    environment.
    
    This commit addresses the issue by detecting whether any lower
    interface associated with a vlan interface has been marked for
    re-configuration.  If this is the case, the vlan interface is
    also added to the up/down list to cause it to re-apply the
    existing static configuration (if it is not already in the list).
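
    The detection step described above could be sketched as follows.
    This is an illustration in Python only; the real logic lives in the
    script launched by the apply-network-config step, and the function
    name, argument names and interface names here are hypothetical.

```python
def extend_updown_list(updown, vlan_lower):
    """Return the up/down list with dependent vlan interfaces appended.

    updown: interfaces already marked for re-configuration (ordered).
    vlan_lower: hypothetical mapping of vlan interface -> lower interface.
    """
    result = list(updown)
    for vlan, lower in vlan_lower.items():
        # A vlan must be cycled when its lower interface is being cycled,
        # unless it is already in the list.
        if lower in updown and vlan not in result:
            result.append(vlan)
    return result


if __name__ == "__main__":
    vlans = {"vlan10": "enp0s3", "vlan20": "enp0s3", "vlan30": "enp0s8"}
    # vlan10 is already listed, vlan20 is added, vlan30 is left alone.
    print(extend_updown_list(["enp0s3", "vlan10"], vlans))
```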
    
    Closes-Bug: 1926366
    Signed-off-by: Steven Webster <steven.webster@windriver.com>
    Change-Id: I40177900ef58a9619fecb34ceffc412f31d1a965

commit 139ba4aa6c143e495b8b7136b359254ceb3ba296
Author: Bin Qian <bin.qian@windriver.com>
Date:   Mon Apr 26 14:59:51 2021 -0400

Reset N3000 fpgas only when it exists
    
    Remove the call that reset the N3000 FPGA before detecting whether the
    hardware exists.
    
    Closes-Bug: 1918139
    Change-Id: I81b7fbc9500fac7e86424537551c1e9aac7492ec
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit e6b1ae7d222f83625110d80a576b95f88f5ed04a
Author: Charles Short <charles.short@windriver.com>
Date:   Mon Apr 26 11:16:00 2021 -0400

Fix zuul errors due to changes in dependencies
    
    Pin hacking to < 4.0.1 to fix zuul gate issues.
    
    Test:
    Ran tox -e pep8 command to validate the pep8 job and result.
    
    Related-Bug: 1926172
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: Ia85b584d7ff4e5e7cb19a820d6f6323aa672f52e

commit 16f0b0cc66b23a9e74005a9cd9379de6a2d78234
Author: Yuxing Jiang <yuxing.jiang@windriver.com>
Date:   Fri Apr 23 10:02:35 2021 -0400

Rename the dnsmasq runtime class
    
    As the platform::dns::runtime class only references the dnsmasq
    resource, this commit renames it to platform::dns::dnsmasq::runtime in
    order to indicate its function clearly.
    
    Story: 2008774
    Task: 42365
    
    Change-Id: I79dd23bf64abfd63906daa59ec59c4496dedda31
    Signed-off-by: Yuxing Jiang <yuxing.jiang@windriver.com>

commit 70971df9f35886f5ece04c82bfccee105d3d0861
Author: Bin Qian <bin.qian@windriver.com>
Date:   Tue Mar 30 15:58:15 2021 -0400

AIO manifest to start kubernetes once
    
    This change avoids restarting kubernetes.
    It also calls sysinv-reset-n3000-fpgas to reset N3000 FPGAs
    on host startup.
    
    Depends-On: https://review.opendev.org/c/starlingx/config/+/785683
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/780600
    Change-Id: I4a27840820fd45ad86cef4dfce6ea0389e583f68
    Partial-Bug: 1918139
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit f4694f8a30f1e5cbe0f7d354f95949a1601eb1e1
Author: Bin Qian <bin.qian@windriver.com>
Date:   Mon Feb 8 13:00:38 2021 -0500

Single puppet for AIO controllers
    
    This change includes:
    1. create aio.pp for AIO controller nodes
    2. execute aio.pp for nodes with subfunctions of 'controller,worker'
    3. remove sriov device plugin restart code as now kubelet starts
       after related config are applied.
    
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/784761
    Change-Id: I54b90a76454c6c545bf2891b81225bbf2ba15b03
    Partial-Bug: 1918139
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit accc39cefe9f54efa656b99bb3fad949ba030367
Author: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
Date:   Sun Mar 7 19:38:23 2021 -0300

DRBD replication, rebuilding monitor and PVCs during migration to AIO-DX
    
    When the system capability "simplex_to_duplex_migration" is present,
    indicating that the system is migrating from AIO-SX to AIO-DX, this
    commit will, during the unlock process: create a DRBD replicated
    filesystem for the floating monitor, rebuild the monitor store.db
    from the existing Ceph OSDs on the system, recover the previously
    existing cephfs filesystems, update the ceph crushmap, and update
    the Ceph monitor IP on existing PersistentVolume resources.
    
    Story: 2008587
    Task: 42078
    
    Signed-off-by: Pedro Linhares <PedroHenriqueLinhares.Silva@windriver.com>
    Change-Id: Iba6ec8bf812c9623724c357455a370d79ffd7b60
    Signed-off-by: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>

commit 569b457592d3f3c95aba72f5f52108316842b6fe
Author: Bin Qian <bin.qian@windriver.com>
Date:   Wed Apr 14 14:54:40 2021 -0400

Generate admin ep cert on subcloud controllers in puppet
    
    Enable the admin endpoint cert to be generated in the manifest directly
    from k8s secret data (via secure hieradata). This operation is
    consistent with the system controller as well as with admin endpoint
    cert renewal.
    
    Partial-Bug: 1923510
    
    Change-Id: I442f3c2c97cf83588aefa8b4fe808834a31fdcc5
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit ffddc103ca66f87fb96ae02e9cfbb656d39f38ab
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Thu Apr 15 09:59:55 2021 -0400

OAM IP change needs double lock/unlock controllers for IPV6 system
    
    Added IPv6 address fields to the list used by apply_network_config.sh
    to detect whether the interface has changed. Without them, the script
    was only copying the interface config file from
    /var/run/network-scripts.puppet/ to /etc/sysconfig/network-scripts/,
    which explains why it was working on the second reboot.
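
    A minimal sketch of this kind of field-based comparison, written in
    Python for illustration (apply_network_config.sh itself is a shell
    script). The key list is an assumption built from common ifcfg keys;
    the actual list used by the script is not shown in this commit
    message.

```python
# Assumed key list: common ifcfg fields, including IPv6 ones.
COMPARE_KEYS = ("IPADDR", "NETMASK", "GATEWAY", "IPV6ADDR", "IPV6_DEFAULTGW")


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"')
    return values


def interface_changed(puppet_cfg, system_cfg, keys=COMPARE_KEYS):
    """Return True if any compared field differs between the two files."""
    new, old = parse_ifcfg(puppet_cfg), parse_ifcfg(system_cfg)
    return any(new.get(k) != old.get(k) for k in keys)


if __name__ == "__main__":
    old = "IPADDR=10.0.0.2\nIPV6ADDR=fd00::2/64\n"
    new = "IPADDR=10.0.0.2\nIPV6ADDR=fd00::3/64\n"
    # With IPv6 keys compared, the change is detected; with only
    # IPv4 keys compared, it is missed.
    print(interface_changed(new, old))
    print(interface_changed(new, old, keys=("IPADDR", "NETMASK")))
```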
    
    Tested on:
    AIO-DX
    AIO-SX
    
    Closes-Bug: 1895555
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
    Change-Id: I25e60a04b4aec38c254ff3e3a7b2f0d80ce5daaf

commit f46c154188b5d90bdd19ba2a5952b4f8c565d5d3
Author: Jim Somerville <Jim.Somerville@windriver.com>
Date:   Wed Apr 14 17:13:59 2021 -0400

kdump config remove intel eth drivers from ramdisk
    
    Problem:
    On a kernel crash, such as the watchdog timer firing, kexec
    tries booting the crash recovery kernel in order to capture
    a vmcore so that the issue can be debugged. This normally
    succeeds unless the platform has ice network hardware. Why?
    Because the crash recovery kernel has only a small amount of
    memory set aside for it, and the ice driver allocates enough
    memory to cause memory exhaustion.  This causes the crash
    recovery kernel's startup to fail, leading to complete platform
    hang.  In order to break out of the hang, one needs to manually
    do a hardware reset or power cycle.
    
    Solution:
    Change kdump.conf to leave the ice driver module out of the
    initramfs that is used by the crash recovery kernel.  In
    fact, leave all of the intel ethernet drivers out since they
    are not needed and increase the risk of memory exhaustion.
    Upon changing kdump.conf, the kdump service is restarted to
    regenerate the initramfs.
    
    Verification:
    Install, check the kdump.conf file and unpack the initramfs file
    making sure that those modules are gone.  Check controller,
    worker, and storage node types.  Reboot node, make sure things
    behave as expected ie. no extra kdump.conf mangling and no
    unexpected kdump service restarts.
    Also crash a node with intel ethernet hardware on it and make
    sure it comes back up with a vmcore left in /var/log/crash.
    
    Change-Id: I9112f722cee8e199d94393bca887d3bb9bb89b39
    Closes-Bug: 1923879
    Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>

commit f21842a2c46c656086234b9b006c224a41485acb
Author: Yuxing Jiang <yuxing.jiang@windriver.com>
Date:   Mon Apr 12 17:06:33 2021 -0400

Creates a LDAP client runtime class
    
    This commit creates a wrapper class platform::ldap::client::runtime to
    update the LDAP client in runtime.
    
    Tested by applying this class at runtime to update the LDAP server URI.
    
    Change-Id: Ia3e40617c9e628deeca839734bd3a3b41431f336
    Story: 2008774
    Task: 42248
    Signed-off-by: Yuxing Jiang <yuxing.jiang@windriver.com>

commit 2a80652598f399995edfc434f1aa0154f1b8299c
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Tue Mar 30 13:46:11 2021 -0400

Applying SRIOV VF configuration at runtime
    
    During runtime, if a user converts a non pci-sriov classed interface
    to a pci-sriov classed interface with type 'ethernet', or creates an
    SR-IOV interface of type 'VF', logic is implemented to enable and
    configure the interface.
    
    Story: 2008531
    Task: 42203
    Change-Id: I0edb4abf2cea6dc29b9485fa09d1fecab4b76c65
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>

commit 28ef813cda9fd0191d8cee9c1f2bd80d64175f6f
Author: Don Penney <don.penney@windriver.com>
Date:   Tue Mar 30 18:10:02 2021 -0400

Add aggregate to DX service group reprovisioning
    
    The following prior update added service group reprovisioning on DX
    nodes, but was missing the aggregate option necessary to ensure
    certain groups were active on the same controller:
    https://review.opendev.org/c/starlingx/stx-puppet/+/773277
    
    As a result, failures during swact could lead to these groups being
    assigned to different nodes, causing other failures in the system.
    
    This update adds the missing aggregate option.
    
    Partial-Bug: 1893669
    Signed-off-by: Don Penney <don.penney@windriver.com>
    Change-Id: I063d1549aa456bd4bb68c4c69c50dbc078ae7be0

commit 6a4907694c386aa6e85b0c51ac1963903f9092c8
Author: Robert Church <robert.church@windriver.com>
Date:   Sat Oct 24 03:30:47 2020 -0400

Add support for setting optional k8s cpu configuration flag
    
    If the host-label 'kube-ignore-isol-cpus=enabled' is added to a host,
    then file '/etc/kubernetes/ignore_isolcpus' will be created for kubelet
    to consume so that it can determine how to handle application-isolated
    CPUs.
    
    Story: 2008760
    Task: 42166
    Signed-off-by: Robert Church <robert.church@windriver.com>
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
    Change-Id: Ifbcc245d0e2716b7abb7726d38d3662e7b53d770

commit 5927be3eed92dee4192bf76af04b171c9758bfd5
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Thu Mar 11 05:59:50 2021 -0500

Added classes to restart service manager and vim-webserver
    
    For service manager, created a class to stop it, modify the OAM IP, and
    restart it. For vim-webserver, created a new class that only restarts
    the service during runtime.
    
    Story: 2008531
    Task: 42061
    Change-Id: I7846c5ab3f1f8d0adb741356164a20932f9ed25f
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>

commit e1552be5bcd4f32ae5d9c30a4158ca98368005a6
Author: Robert Church <robert.church@windriver.com>
Date:   Thu Mar 11 01:30:24 2021 -0500

Enabling Ceph MDS as part of adding Ceph at runtime
    
    A metadata server is assigned to every node that has a monitor.
    
    Restructure the metadataserver class to ensure that the metadata server:
     - is started after the Ceph monitor and the Ceph manager on controllers
     - is started after the Ceph monitor on a worker assigned a monitor
    
    If the metadata server is started prior to the monitor, it will not
    start properly.
    
    Future optimization may be to create a MDS SM service on the
    controllers, but based on current testing, it seems unnecessary.
    
    Tested:
     - Adding Ceph pre-controller-0 unlock
       - AIO-SX, AIO-DX, Standard 2+2, Storage 2+2+2
     - Adding Ceph at runtime after installed nodes are fully provisioned.
       - AIO-SX, AIO-DX, Standard 2+2
     - For all the above configs also added storage tiers and confirmed
       proper functionality
     - NOTE: No Ceph runtime option for labs with storage node
       configuration.
    
    Change-Id: I27b53b55738d0aec70db6a9e4004c920029869fa
    Closes-Bug: #1919276
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 1aef5b8968d8ce28c4fbc42a5160f77a8ebff642
Author: Robert Church <robert.church@windriver.com>
Date:   Thu Mar 11 01:29:04 2021 -0500

Re-enable adding bare-metal Ceph storage backend at runtime
    
    Adding the bare-metal Ceph storage backend at runtime fails as
    $::platform::rook::params::service_enabled can not be resolved.
    
    This update explicitly includes the class to allow resolution and enable
    the Ceph storage backend to be added.
    
    Change-Id: I1bd12910784387c2a2d37a29d2f299e3cebb8cd2
    Closes-Bug: #1919274
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit a79bc74c31350fc05cf16d89b1c2cdf35af5ef5f
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Fri Mar 12 13:47:19 2021 -0500

Cleanup config.toml.erb of MARK_* comments
    
    The comments that contain the strings "MARK_BEGIN" and "MARK_END"
    are no longer used in ansible bootstrap, so they need
    to be cleaned up from the config.toml.erb template.
    
    Change-Id: Id53bc58d2624581b6e50ead1c77a5cd424631ae5
    Closes-Bug: 1892768
    Depends-On: https://review.opendev.org/c/starlingx/ansible-playbooks/+/779047
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit 13ba2d4a7e6f9337eda22d98df872cb40ec983ac
Author: John Kung <john.kung@windriver.com>
Date:   Sun Feb 28 10:38:54 2021 -0600

puppet manifest apply check hieradata rsync
    
    Update the puppet manifest apply to check whether the hieradata
    has been rsynced successfully.  Check the return value and, in
    certain cases, reattempt before continuing.  This is needed because
    the hieradata is actually generated by the controller node,
    and this script may be running on another host.
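
    The check-and-reattempt behaviour could look roughly like the sketch
    below. It is in Python for illustration only (the manifest apply
    script is not reproduced here), the function name is hypothetical,
    and the set of retryable return codes is an assumption: 10, 12 and
    30 are rsync's documented codes for socket I/O errors, protocol
    data stream errors and data send/receive timeouts.

```python
import time


def run_with_retry(action, attempts=3, delay=1.0, retryable=(10, 12, 30)):
    """Call action() until it returns 0; retry only on retryable codes.

    action: callable returning a process-style return code.
    Returns the last return code observed.
    """
    rc = None
    for attempt in range(1, attempts + 1):
        rc = action()
        if rc == 0:
            return 0
        # Give up on non-transient errors or when attempts are exhausted.
        if rc not in retryable or attempt == attempts:
            break
        time.sleep(delay)
    return rc


if __name__ == "__main__":
    codes = iter([12, 30, 0])  # two transient failures, then success
    print(run_with_retry(lambda: next(codes), delay=0))
```

    In real use, `action` would wrap the rsync invocation (for example
    via `subprocess.call`) and the caller would abort the manifest apply
    on a non-zero result.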
    
    It has been observed that there are instances on worker hosts
    where some of the hieradata is missing (e.g. missing
    system.yaml openstack_host in puppet.log).
    
    Verified:
        install, deployment and sanity on multinode and AIO
        backup and restore with Ceph
        platform upgrade
    
    Change-Id: I9e7a0a02dd28c06d914fafe8234f4fee5e05247c
    Closes-Bug: 1917229
    Signed-off-by: John Kung <john.kung@windriver.com>

commit fab4ea75c03d96ece44f441725f7f385202e737c
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Fri Feb 26 16:23:35 2021 -0500

Increase haproxy timeout for patching
    
    Some patching operations can take a significant amount of time.
    Thus, in this commit the haproxy timeouts for
    patching-restapi-admin-internal and patching-restapi-internal
    are updated to be 600s.
    
    Change-Id: I1b73793c2963be2d1e40634ed6f85d747c6d6985
    Story: 2007267
    Task: 41944
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>

commit 8300408337a5051aba0c7106add4f0068ba7d461
Author: Babak Sarashki <babak.sarashki@windriver.com>
Date:   Wed Feb 17 15:56:40 2021 +0000

Add container runtime interface (CRI) placeholder to config.toml
    
    This commit extends the containerd config.toml template file to include
    a placeholder for custom CRI entries. The custom CRI entries can be
    specified via the service-parameter method (Change-Id: Icc5fd16 stx/config).
    
    Story: 2008434
    Task: 41389
    
    Depends-On: https://review.opendev.org/c/starlingx/config/+/776220
    
    Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
    Change-Id: Ib1dd5bd2fbb5e386cf06ab4161226c3bf6f107ac

commit a8cf39d9d37d51869503cfa4d239faa4ced7e67f
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Thu Feb 11 13:23:23 2021 -0500

Device image repository
    
    The device images are stored in the drbd filesystem
    (/opt/platform/device_images) in the active controller.
    In order to allow the other worker hosts to retrieve the device images
    from the active controller over lighttpd, the directory
    /www/pages/device_images is created as a bind mount of the drbd
    directory. This mount resource is managed by SM.
    The 'device_images' is added to the lighttpd static content list.
    
    Tests performed on the following systems:
    AIO-DX, AIO-DX plus compute, Standard 2+1
    DC with AIO-DX plus subcloud
    DC with Standard subcloud
    
    Story: 2007875
    Task: 41877
    Depends-On: https://review.opendev.org/c/starlingx/ha/+/776489
    
    Change-Id: I4e7686ece49546d7ef84f5724370167afaf21375
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit 2e92d0ec7e3e39b69ac4838d54f5ec8e0ad752bc
Author: Litao Gao <litao.gao@windriver.com>
Date:   Thu Feb 18 07:59:17 2021 -0500

Add retry to tolerate 'ip link set' failure
    
    If 'ip link set' is executed too quickly on an X710, some invocations
    may fail with 'Resource temporarily unavailable'.
    Add a retry to the puppet Exec resource to tolerate this failure case.
    
    Story: 2008470
    Task: 41936
    
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: Ib80ea77d36a0b0f63d3db2015dadb3911c56d1e9

commit 598427b294ee25fa817fb2de5e56bb18825c984e
Author: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
Date:   Thu Feb 25 07:49:26 2021 -0500

Increase timeout for sriovdp deletion
    
    In an AIO-SX, pods get launched by kubelet soon after puppet is done
    with the controller's manifest, while it is still working on the
    worker's manifest. When there are many pods, the resulting contention
    may prevent the SRIOV device plugin from being deleted (then restarted)
    in the expected time frame.
    The final solution for this problem is on the way: refactoring the
    current AIO to orchestrate between pod bring-up and worker setup. In
    the meantime, the workaround in this change is to increase the
    original timeout for deletion of the SRIOV device plugin.
    
    Closes-Bug: 1916620
    Implements: increased timeout value to manage sriovdp in kubernetes.pp
    Signed-off-by: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
    Change-Id: I0f6fb20a0ed5086fc80794b35715eea8d3d74cb8

commit 98601d637cb8f421a0fdccb2acb63339309d0dbe
Author: albailey <Al.Bailey@windriver.com>
Date:   Fri Feb 12 11:33:36 2021 -0600

Use kubelet.conf instead of admin.conf on worker nodes during upgrade
    
    Specifying a config file that does not exist causes kubelet upgrades
    to fail on worker nodes when some of the commands return errors.
    
    admin.conf does not exist on worker nodes, but exists on controllers.
    The code has been updated to use kubelet.conf during worker kubelet
    upgrade actions.
    
    The worker init code has also been changed when pulling the pause
    image so that it does not try to contact k8s.gcr.io.
    The kubernetes-version needed to be passed in when querying for the
    pause image.
    
    Story: 2008137
    Task: 41828
    Change-Id: I6565132bd587927bd26c845c2ea56a995ac6da1c
    Signed-off-by: albailey <Al.Bailey@windriver.com>

commit 20f211cbef89be56ad7dd26e93cb720d81a93172
Author: albailey <Al.Bailey@windriver.com>
Date:   Fri Feb 19 12:14:38 2021 -0600

Add bindep target to tox
    
    bindep is a helpful tox target to assist in determining
    what components a test environment needs to have installed.
    
    For stx-puppet, puppet-lint needs ruby headers otherwise
    the tox linters target will fail.
    
    Partial-Bug: #1907678
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: Iaccd8d8f3af292ef29028cde59f0d344b94f1d72

commit c67bd455f8b8bd06ab9611d1b5d6ce7f2f948337
Author: albailey <Al.Bailey@windriver.com>
Date:   Fri Feb 19 10:01:19 2021 -0600

Fix running tox linters in a python2 env
    
    The bandit target is python3, and the package
    fails to be installed in a python2 env.
    
    Partial-Bug: #1907678
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: I9d683c99274dc3120995e0376ace53644dc2a050

commit 54be537f9edea23df45dc3221d9be41d83f13778
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Fri Feb 12 17:45:58 2021 -0600

Add support for dcmanager-audit-worker service
    
    We're moving the bulk of the dcmanager subcloud audits to separate
    worker processes, so we need to add a service for the main worker
    processes (which will then spawn additional workers).
    
    Story: 2007267
    Task: 41869
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
    Depends-On: https://review.opendev.org/c/starlingx/ha/+/775457
    Change-Id: I119d24ae67ec4a40c360ac721582b45388231cbf

commit 3c2f1530c9ee8ccd2c27cb757655a9c851b926ae
Author: Babak Sarashki <zbsarashki@gmail.com>
Date:   Thu Feb 4 23:52:37 2021 +0000

platform puppet: Config ACC100 bbdev with QMGR val
    
    The ACC100 PF and VF configuration takes the same puppet
    config code path as the N3000 except that the ACC100 does
    not require a reset, but requires bbdev config.
    
    This patch adds platform::devices::acc100::fec class to
    exec pf-bb-config to configure QMGR on the Intel ACC100
    (Mt. Bryce) with number of 5G UL/DL qgroups and configures
    the device with the number of VF's.
    
    Story: 2008440
    Task: 41530
    
    Depends-On: https://review.opendev.org/c/starlingx/integ/+/775252
    
    Signed-off-by: Babak Sarashki <zbsarashki@gmail.com>
    Change-Id: I7d42852009fedba5136d9d726092f273ef41c7fd

commit 5a555ad98eb4fb978c7b553d463dfedf4d9b3a25
Author: Eric MacDonald <eric.macdonald@windriver.com>
Date:   Wed Feb 3 12:11:16 2021 -0500

Change collectd plugin search path
    
    This update changes the collectd's plugin
    search path from /etc/collectd.d to
    /etc/collectd.d/starlingx to avoid loading
    the collectd default plugins.
    
    Partial-Fix: 1905581
    Depends-On: https://review.opendev.org/c/starlingx/monitoring/+/772516
    Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
    Change-Id: I1999f25244465430d9c1385a2fc3c002d0e108c9

commit fbb4cdef07c52acb66cfcaf91bc2d029ffb00ff1
Author: Don Penney <don.penney@windriver.com>
Date:   Sun Jan 31 22:02:43 2021 -0500

Reprovision SM services on duplex
    
    Update SM provisioning for duplex to reprovision services
    if needed. The default configuration in SM is duplex services,
    and a simplex node will reprovision these to be simplex. In
    order to support SX to DX migration, these services will also
    be reprovisioned on duplex to ensure the configuration
    is correct.
    
    Story: 2008587
    Task: 41743
    
    Change-Id: Ifb61a6046c680d0dee7c76660397c6fe8c2cbe73
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit b2e37caaeb90e5931cb1522d8d23e6258d506fdb
Author: Jerry Sun <jerry.sun@windriver.com>
Date:   Wed Feb 3 09:26:47 2021 -0500

Etcd parameters lost when changing kube-apiserver parameters
    
    Etcd parameters are getting lost when changing kube-apiserver
    parameters. This is due to no default values being present. The
    missing etcd parameters causes kube-apiserver to fail to start up.
    This commit makes the script for changing kube-apiserver parameters
    keep any existing etcd parameters in the previous config.
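
    The "keep existing etcd parameters" behaviour amounts to a merge
    like the following sketch. It is a Python illustration with a
    hypothetical function name; the real script and its data format are
    not shown here, and the example values are made up. The flag names
    etcd-servers and etcd-cafile are real kube-apiserver options.

```python
def merge_apiserver_args(existing, updates, keep_prefix="etcd-"):
    """Apply updates while carrying over existing etcd-* parameters."""
    merged = dict(updates)
    for key, value in existing.items():
        # Preserve etcd settings that the update did not explicitly set.
        if key.startswith(keep_prefix) and key not in merged:
            merged[key] = value
    return merged


if __name__ == "__main__":
    existing = {
        "etcd-servers": "https://192.0.2.1:2379",   # example value
        "etcd-cafile": "/path/to/etcd-ca.crt",      # example value
        "oidc-issuer-url": "https://old.example.com",
    }
    updates = {"oidc-issuer-url": "https://new.example.com"}
    print(merge_apiserver_args(existing, updates))
```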
    
    Change-Id: I83eb5426ba72a36a5eed3ecbddcbbacdf38803c5
    Closes-bug: 1914291
    Signed-off-by: Jerry Sun <jerry.sun@windriver.com>

commit a4decc6fbc0f796e03f25119b635fad51962fbdd
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Wed Jan 27 14:29:59 2021 -0500

Added class to handle pci runtime config on kubernetes
    
    The new class adds a handler on puppet to trigger
    configuration on runtime when an SR-IOV interface is
    assigned to a data network on an unlocked host
    
    Story: 2008531
    Task: 41707
    Depends-On: https://review.opendev.org/c/starlingx/config/+/772759
    Change-Id: Iddbc272eb6b3321c987c2700e63734ee57244cf9
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>

commit a34cd954e92f37994d40a31ccc2777249598622d
Author: Eric MacDonald <eric.macdonald@windriver.com>
Date:   Mon Jan 25 10:01:29 2021 -0500

Modify collectd manifest to not start collectd
    
    The collectd puppet manifest auto-starts the
    collectd process before a node's configuration
    is complete. This has been seen to lead to a
    collectd process core dump in collectd's
    network plugin due to being started before
    networking is set up or fully operational.
    
    Collectd has a service file that has been
    modified by the Depends-On update to start
    collectd after config is complete.
    
    Partial-Bug: 1872979
    Depends-On: https://review.opendev.org/772349
    Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
    Change-Id: I70ded9b745b7dadb7c50b1d5f9ba8bdcb5ffa2da

commit 389f37582a1568bc34089956dc52b6fe5c274b83
Author: John Kung <john.kung@windriver.com>
Date:   Thu Jan 21 09:41:41 2021 -0600

Adjust dcorch database pool size for dcorch scaling
    
    The dcorch database pool sizes are updated following the delivery
    of the feature that adds multiple dcorch engine workers.
    
    As the workload is allocated amongst 5 workers,
    the values for the dcorch database pools can be lowered
    from the defaults previously set for the single process case.
    
    The max theoretical database connections allowable per worker
    is based on 100 audit and 100 sync threads.
    
    Furthermore, the dcorch scaling feature audits based on
    audit timestamp so the peak loads are also likely more balanced.
    
    Testcases:
    In a multiple-subcloud environment, monitor each
    dcorch engine worker's database connection usage:
    subcloud add
    subcloud initial manage
    subcloud resource sync
    subcloud manage and unmanage
    
    Change-Id: Id3386df6289d42080a90b9d97cc0834054160805
    Story: 2007267
    Task: 41650
    Signed-off-by: John Kung <john.kung@windriver.com>

commit b112fce71e8ce69b7065fa3bf8f4da896cd637a3
Author: Litao Gao <litao.gao@windriver.com>
Date:   Tue Jan 12 10:18:08 2021 -0500

VF rate limiting support
    
    This commit implements the puppet-side logic to apply the
    VF max_tx_rate setting according to the configuration.
    
    Story: 2008470
    Task:  41508
    
    Depends-On: https://review.opendev.org/c/starlingx/config/+/770135
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: Ic599f9ac70430529f31d57a74d0809f7077b98e5

commit ce66ffd30cb33f2b770587d7731732e34593e8f1
Author: Martin, Chen <haochuan.z.chen@intel.com>
Date:   Sun Jul 5 20:42:55 2020 +0800

Add puppet class for rook
    
    Create a new drbd device for rook. On a duplex system, the device is
    mounted at /var/lib/ceph/mon-a to sync mon data across the two
    controllers.
    
    Story: 2005527
    Task: 40281
    
    Change-Id: Ic5edca16e2dce905aeb582b0359446bd222e5ad3
    Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>

commit 2680e463198cd75b01a8f04140e2d4f72e4844c9
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Tue Jan 19 10:47:17 2021 -0600

Migrate etcd after both controllers are upgraded
    
    The flag to trigger the data migration is now set by the conductor on
    controller-1 and the migration will be performed on controller-0. The
    flag is now set in a drbd synced filesystem so it is accessible to both
    controllers.
    
    Depends-On: https://review.opendev.org/c/starlingx/config/+/771668
    Story: 2008055
    Task: 41631
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>
    Change-Id: I761740f4de24f33f2d314ec1bc8fbc5941607900

commit 978dea28f21592ad4aa79e99821b70a1b07ab438
Author: Takamasa Takenaka <takamasa.takenaka@windriver.com>
Date:   Wed Jan 20 09:47:34 2021 -0300

Remove trap destination from fm.conf
    
    With the host-based SNMP removal,
    remove trap_destination entry from fm.conf
    
    Story: 2008132
    Task: 41350
    Change-Id: I3f0298233beedc3370fa8c4c2dbc65fe678b14a6
    Depends-On: https://review.opendev.org/765381
    Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>

commit 0f7418e761fa49b0f5a5edc9593ff9f6c0921206
Author: Angie Wang <angie.wang@windriver.com>
Date:   Mon Sep 28 11:39:17 2020 -0400

Configure SQL as helm storage backend
    
    Configmap is the default helmv2 storage backend to store
    release information but its 1MB resource limit prevents
    scaling up stx openstack worker nodes, so we want to use
    SQL as helm storage backend.
    
    Add class in helm puppet manifest to setup helm database
    during ansible bootstrap.
    
    This commit also fixes the IP address in postgres pg_hba.conf.
    
    Currently, we have the following rules for both IPv4 and
    IPv6 systems:
    Rule Name: allow access to all users with encrypted password
    from all IPv4 addresses.
    host  all  all         0.0.0.0/0   md5
    Rule Name: deny access to postgresql user.
    host  all  postgres    0.0.0.0/32 reject
    
    For the IPv6 system, the address of pods is IPv6. The CIDR
    address in the rule should be changed to corresponding
    IPv6 address (::0/0) to allow tiller running in container
    to access helm database.
    
    Depends-On: https://review.opendev.org/#/c/761645/
    Change-Id: Ifd072000e0680a59d5be0f2f1ef2ce1cbabc1e4f
    Partial-Bug: 1887677
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 4b97414655f5126ce65acf9b15be635483955c74
Author: Takamasa Takenaka <takamasa.takenaka@windriver.com>
Date:   Thu Jan 7 11:12:11 2021 -0300

Support trap_server_port configurable
    
    Add a trap_server_port parameter so that users can
    configure the snmp trap server port number through a
    user helm override.
    
    Story: 2008132
    Task: 41548
    Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>
    Change-Id: Iac44d813447881591efd7b4a088185f2d59986be

commit 777d5d0de78c97fdc223e56662f7d3db6def2768
Author: Zhipeng Liu <zhipengs.liu@intel.com>
Date:   Sat Oct 31 01:15:33 2020 +0800

Enable etcd with security setting.
    
    Update etcd puppet to support security settings.
    
    Partial-Bug: 1894870
    
    Change-Id: Ifb5bb2506a260186bf4e8caa487bbeaae04df80b
    Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>

commit 6182d3f94990ea282004245e2d821eae5ac573ea
Author: Don Penney <don.penney@windriver.com>
Date:   Thu Dec 17 13:21:50 2020 -0500

Add auto-version for remaining stx/stx-puppet packages
    
    Update remaining StarlingX packages with hardcoded TIS_PATCH_VER to
    use PKG_GITREVCOUNT where possible, with offsets as needed to ensure
    the version is incremented above the hardcoded version.
    
    Change-Id: I110ef3a10c3164f8edb706b9257f33178b4a2517
    Story: 2008455
    Task: 41456
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit f8397fe71bae28a4126bbdf38da0731ba529b4c0
Author: Nicolas Alvarez <nicolas.alvarez@windriver.com>
Date:   Thu Nov 26 16:51:32 2020 -0300

Delete SNMP Host-Based entries.
    
    Delete entries related with SNMP Host-Based.
    
    Story: 2008132
    Task: 41323
    Signed-off-by: Nicolas Alvarez <nicolas.alvarez@windriver.com>
    Depends-On: https://review.opendev.org/766094
    
    Change-Id: I2c4a89fd7c4bac9895311787663a6d693600b090

commit b1997248da4bcb1d3ec0ce15d423eb42d2219a3e
Author: Daniel Safta <daniel.safta@windriver.com>
Date:   Mon Oct 5 10:33:36 2020 +0000

Add mds support in puppet for CephFS.
    
    Mds configuration needs to be present on every node that
    has a ceph monitor in order for CephFS to be available.
    
    Change-Id: Ic4270e401b2c3e5123aecfab21af1e874b733830
    Story: 2008162
    Task: 40908
    Signed-off-by: Daniel Safta <daniel.safta@windriver.com>

commit 6f881cc84e3d3c922423441304e7157effc505e7
Author: Andy Ning <andy.ning@windriver.com>
Date:   Thu Dec 3 09:41:08 2020 -0500

Skip platform ceph osds puppet manifest following DOR
    
    ceph::osd puppet manifest will fail during controller puppet
    manifests apply following DOR, because as both controllers are
    booting up, there is no ceph monitor cluster so puppet is unable
    to validate or invalidate the existing configuration.
    
    This change updated platform::ceph::controller class to skip
    platform ceph osds in the case of DOR.
    
    Change-Id: I0254ce28869bc87c5e939ea8984d175244ebb65f
    Partial-Bug: 1904739
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit 8ba9e81db4e238c69edebdfec4738063aad7eb14
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Tue Dec 1 22:22:57 2020 -0500

Fix directory permissions for /var/log/rabbitmq
    
    Updated /var/log/rabbitmq directory permissions to 750 from 755
    to disallow world access to rabbitmq log files but at the same
    time to allow group access.
    The changes are made to comply as much as possible with
    openscap rules security requirements.
    Verified that installation is successful for AIO-SX
    and Standard 2+2 system configurations.
    
    Story: 2008037
    Task: 40694
    
    Change-Id: I1c0112575033c04983c56298e2131882911333de
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit 3b7c55174aafffd8f35545ad8e20d928322de2f9
Author: Lu Yao Chen <luyao.chen@windriver.com>
Date:   Wed Nov 25 14:28:48 2020 -0500

Retain more puppet log files
    
    Increased max log directories to retain more
    debugging info from puppet.log.
    
    Tested by looping system host-cpu-modify commands;
    /var/log/puppet now caps at 50 log directories
    instead of 20.
    
    Closes-Bug: 1903994
    
    Signed-off-by: Lu Yao Chen <luyao.chen@windriver.com>
    Change-Id: Ia8458396867f988d5061d3aa49fa2a21ee6ebac2

commit 77d3382d2c63dba2e04cb92333a37b0370992cd5
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Mon Nov 23 18:22:12 2020 -0500

Fix permission of puppet saved logs tar file
    
    Changed the permissions of puppet saved logs tar file from
    644 to 600 to comply with openscap rules security requirements.
    Verified that installation is successful for AIO-SX
    and Standard 2+2 system configurations.
    
    Story: 2008037
    Task: 40694
    
    Change-Id: I1fe365e808a085999667e898788afacf61fd6612
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit e5ff48c2ca6931eadff3566de33519a3496beeab
Author: Andy Ning <andy.ning@windriver.com>
Date:   Mon Nov 23 14:07:35 2020 -0500

Remove comments in keystone::upgrade class
    
    The TODO comments in keystone::upgrade class no longer apply.
    This update removes them.
    
    Change-Id: Id9f7b39c15db1f73428d4f23d93ef3e3b4ad50f5
    Partial-Bug: 1886064
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit c10b5897b9d972555228f6510803c48981050e5f
Author: Jerry Sun <jerry.sun@windriver.com>
Date:   Thu Nov 19 11:05:12 2020 -0500

Update dnsmasq config for slow DNS servers
    
    When a configured DNS server is taking a long time to respond to
    unknown domains or hosts, registry interactions like push, pull,
    and querying for images through system commands will fail due to
    hostname resolution for registry.local. This is because it attempts
    to resolve registry.local using the A record first, which times out
    since it is hitting the configured external DNS server. This
    prevents the process from looking up the AAAA record which would
    resolve to the dnsmasq CNAME record. This commit updates the dnsmasq
    config to prevent forwarding the local domain to upstream servers.
    
    Change-Id: Ic3cf6aae87f8f2d5c61a24db00a4cb814c20aac6
    Closes-Bug: 1904885
    Signed-off-by: Jerry Sun <jerry.sun@windriver.com>

commit 3ca2387ddbb455a081689be72632b408988c5d39
Author: Takamasa Takenaka <takamasa.takenaka@windriver.com>
Date:   Tue Nov 3 15:35:29 2020 -0300

Add variables for snmp in fm.conf
    
    Snmp trap client needs the following three variables
    to connect to snmp trap server.
    - trap_server_ip
    - trap_server_port
    - snmp_enabled
    Modify puppet to add these variables. trap_server_ip
    and trap_server_port are fixed. snmp_enabled is True
    or False depending on whether the snmp armada app is
    applied (True when applied).
    
    Change-Id: Ibedaf772153f49c6dfefe644044da07b5d32bb20
    Story: 2008132
    Task: 41207
    Depends-On: https://review.opendev.org/761213
    Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>
