StarlingX

cert-manager and platform-integ-apps alarm 750.006 after controller-0 unlock

Bug #1923587 reported by Andrei Grosu on 2021-04-13

This bug affects 2 people

Affects     Status        Importance  Assigned to   Milestone
StarlingX   Fix Released  Low         Andrei Grosu

Bug Description

Brief Description
-----------------

Applying applications intermittently fails because the postgres db cannot be reached.

Severity
--------
Minor

Expected Behavior
------------------

The apply should succeed, and the logic should check (and wait) for the database service to be up, running, and accepting connections.

Reproducibility
---------------
Intermittent; very low reproducibility.

System Configuration
--------------------

2 controllers, 2 storage nodes, 1 worker node.

Logs
----

The Armada apply for cert-manager fails at 18:19:16:

sysinv 2021-03-20 18:19:16.729 728680 INFO sysinv.conductor.kube_app [-] Armada apply command: 'armada apply --debug --enable-chart-cleanup /tmp/manifests/cert-manager/1.0-13/cert-manager-certmanager-manifest.yaml --values /tmp/overrides/cert-manager/1.0-13/cert-manager-cert-manager.yaml --values /tmp/overrides/cert-manager/1.0-13/cert-manager-psp-rolebinding.yaml '
sysinv 2021-03-20 18:19:16.881 728680 INFO sysinv.conductor.kube_app [-] Starting progress monitoring thread for app cert-manager
sysinv 2021-03-20 18:19:18.679 728680 ERROR sysinv.conductor.kube_app [-] Failed to apply application manifest /manifests/cert-manager/1.0-13/cert-manager-certmanager-manifest.yaml with exit code 1. See /var/log/armada/cert-manager-apply_2021-03-20-18-19-15.log for details.

Armada logs

get_results /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:215
2021-03-20 18:19:18.581 69 INFO armada.handlers.lock [-] Releasing lock
2021-03-20 18:19:18.587 69 ERROR armada.cli [-] Caught unexpected exception: grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "write tcp [abcd:206::a4ce:fec1:5423:e306]:37896->[abcd:204::1]:5432: write: connection timed out"
debug_error_string = "{"created":"@1616264357.608286155","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"write tcp [abcd:206::a4ce:fec1:5423:e306]:37896->[abcd:204::1]:5432: write: connection timed out","grpc_status":2}"
>
2021-03-20 18:19:18.587 69 ERROR armada.cli Traceback (most recent call last):
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/__init__.py", line 38, in safe_invoke
2021-03-20 18:19:18.587 69 ERROR armada.cli self.invoke()
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/apply.py", line 213, in invoke
2021-03-20 18:19:18.587 69 ERROR armada.cli resp = self.handle(documents, tiller)
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/lock.py", line 81, in func_wrapper
2021-03-20 18:19:18.587 69 ERROR armada.cli return future.result()
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
2021-03-20 18:19:18.587 69 ERROR armada.cli return self.__get_result()
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2021-03-20 18:19:18.587 69 ERROR armada.cli raise self._exception
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
2021-03-20 18:19:18.587 69 ERROR armada.cli result = self.fn(*self.args, **self.kwargs)
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/apply.py", line 256, in handle
2021-03-20 18:19:18.587 69 ERROR armada.cli return armada.sync()
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 189, in sync
2021-03-20 18:19:18.587 69 ERROR armada.cli known_releases = self.tiller.list_releases()
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 252, in list_releases
2021-03-20 18:19:18.587 69 ERROR armada.cli releases = get_results()
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 220, in get_results
2021-03-20 18:19:18.587 69 ERROR armada.cli for message in response:
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 364, in __next__
2021-03-20 18:19:18.587 69 ERROR armada.cli return self._next()
2021-03-20 18:19:18.587 69 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 358, in _next
2021-03-20 18:19:18.587 69 ERROR armada.cli raise self
2021-03-20 18:19:18.587 69 ERROR armada.cli grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
2021-03-20 18:19:18.587 69 ERROR armada.cli status = StatusCode.UNKNOWN
2021-03-20 18:19:18.587 69 ERROR armada.cli details = "write tcp [abcd:206::a4ce:fec1:5423:e306]:37896->[abcd:204::1]:5432: write: connection timed out"
2021-03-20 18:19:18.587 69 ERROR armada.cli debug_error_string = "{"created":"@1616264357.608286155","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"write tcp [abcd:206::a4ce:fec1:5423:e306]:37896->[abcd:204::1]:5432: write: connection timed out","grpc_status":2}"
2021-03-20 18:19:18.587 69 ERROR armada.cli >
command terminated with exit code 1

Comments
--------

It seems that the postgres DB on the active controller takes too long to start accepting requests.
In the logs, subsequent apply operations succeed, so the DB does eventually accept connections.
The existing code only checks that the pod is up and running, which does not necessarily mean that the postgres service in the pod is accepting connections.
The proposed fix is to add an explicit check for DB connectivity.
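The explicit connectivity check proposed here can be sketched as a polling loop. This is a minimal sketch, not the merged sysinv code. It assumes the check is a plain TCP connect to the postgres port (5432 in the log above), and the function name `wait_for_tcp_service` is hypothetical.

```python
import socket
import time


def wait_for_tcp_service(host, port, timeout=60.0, interval=2.0):
    """Poll until a TCP connection to (host, port) succeeds or timeout expires.

    Hypothetical helper: returns True as soon as a connection is accepted,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves both IPv4 and IPv6 literals,
            # e.g. "abcd:204::1" from the log above.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, timed out or unreachable: keep retrying until deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_tcp_service("abcd:204::1", 5432, timeout=120)` would be called before running `armada apply`. A successful TCP connect only shows that the listener is up. PostgreSQL can still reject sessions while it is starting, so a stricter check would open a real database session or run PostgreSQL's `pg_isready` utility.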


Ghada Khalil (gkhalil) wrote on 2021-04-13:

Lower priority as the issue is intermittent, but it would be nice to fix.

Changed in starlingx:
assignee:	nobody → Andrei Grosu (agrosu1)
importance:	Undecided → Low
status:	New → Triaged
tags:	added: stx.containers

OpenStack Infra (hudson-openstack) on 2021-04-13

Changed in starlingx:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2021-04-29: Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/786021
Committed: https://opendev.org/starlingx/config/commit/5edd3bdbe588e2c2e7a58cb839f030305613c30f
Submitter: "Zuul (22348)"
Branch: master

commit 5edd3bdbe588e2c2e7a58cb839f030305613c30f
Author: Andrei Grosu <email address hidden>
Date: Tue Apr 13 08:52:40 2021 +0000

Check for connectivity to the tiller postgres backend.

    The existing code checks that the pod(s) are 'Running' but that
    might not be enough as the service inside the pod (postgres)
    might not be able to accept connections.

    Closes-Bug: 1923587
    Signed-off-by: Andrei Grosu <email address hidden>
    Change-Id: Ide49e4a38b805d5fc41d9f06d94393c69c6ed9d2

Changed in starlingx:
status:	In Progress → Fix Released

Frank Miller (sensfan22) wrote on 2021-05-03:

Re-opening this LP as the original commit needed to be reverted:
https://review.opendev.org/c/starlingx/config/+/789011

Some re-work is required before a new commit can be proposed and this LP moved back to Fix Released.

Changed in starlingx:
status:	Fix Released → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2021-05-05: Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/789828

Changed in starlingx:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2021-05-06:

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/790011

OpenStack Infra (hudson-openstack) wrote on 2021-05-07: Change abandoned on config (master)

Change abandoned by "Andrei Grosu <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/config/+/790011

OpenStack Infra (hudson-openstack) wrote on 2021-05-12: Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/789828
Committed: https://opendev.org/starlingx/config/commit/12fff41d7803c7cea2b34e356ac65d361ca57789
Submitter: "Zuul (22348)"
Branch: master

commit 12fff41d7803c7cea2b34e356ac65d361ca57789
Author: Andrei Grosu <email address hidden>
Date: Wed May 5 13:03:50 2021 +0000

Handle empty 'helm list' result when there is nothing deployed

    The existing code assumes that there are always applications deployed
    and the result is never an empty list.
    The previous implementation ignored the return code when the subprocess
    was killed by the timeout handler.
    Split the method into two submethods for the helm v2 and v3 implementations.

    Closes-Bug: 1923587
    Signed-off-by: Andrei Grosu <email address hidden>
    Signed-off-by: Angie Wang <email address hidden>
    Change-Id: Ib547bdb20c39e35c1538e3abb90108f7e3cad228
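The two fixes in this commit are (a) treating an empty listing as "nothing deployed" and (b) not ignoring the outcome when the timeout handler kills the subprocess. Together they follow a common pattern, sketched below in Python. This is an illustration, not the sysinv implementation. It assumes the command prints a JSON array of objects with a `name` key, as helm v3's `helm list -o json` does. The names `list_release_names` and `HelmListError` are hypothetical.

```python
import json
import subprocess


class HelmListError(Exception):
    """Hypothetical error type for a failed or timed-out listing."""


def list_release_names(cmd, timeout=30):
    """Run a release-listing command and return the release names.

    Empty output is a valid result meaning nothing is deployed.
    """
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; surface it as a failure
        # instead of treating partial or missing output as valid.
        raise HelmListError("command timed out after %s s" % timeout)
    if result.returncode != 0:
        raise HelmListError("exit code %d: %s"
                            % (result.returncode, result.stderr.strip()))
    output = result.stdout.strip()
    if not output:
        # Nothing deployed: return an empty list rather than failing to parse.
        return []
    return [release["name"] for release in json.loads(output)]
```

Called as `list_release_names(["helm", "list", "-o", "json"])`, it distinguishes three cases: an empty deployment returns `[]`, a failed command raises, and a timeout raises.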

Changed in starlingx:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2021-05-27: Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793460

OpenStack Infra (hudson-openstack) wrote on 2021-05-30:

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793696

OpenStack Infra (hudson-openstack) wrote on 2021-06-03:

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794611

OpenStack Infra (hudson-openstack) wrote on 2021-06-05:

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794906

OpenStack Infra (hudson-openstack) wrote on 2021-06-05: Change abandoned on config (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794611

OpenStack Infra (hudson-openstack) wrote on 2021-06-07: Fix merged to config (f/centos8)

Reviewed:  https://review.opendev.org/c/starlingx/config/+/794906
Committed: https://opendev.org/starlingx/config/commit/75758b37a5a23c8811355b67e2a430a1713cd85b
Submitter: "Zuul (22348)"
Branch:    f/centos8

commit 9e420d9513e5fafb1df4d29567bc299a9e04d58d
Author: Bin Qian <bin.qian@windriver.com>
Date:   Mon May 31 14:45:52 2021 -0400

Add more logging to run docker login
    
    Add error log for running docker login. The new log could
    help identify docker login failure.
    
    Closes-Bug: 1930310
    Change-Id: I8a709fb6665de8301fbe3022563499a92b2a0211
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit 31c77439d2cea590dfcca13cfa646522665f8686
Author: albailey <Al.Bailey@windriver.com>
Date:   Fri May 28 13:42:42 2021 -0500

Fix controller-0 downgrade failing to kill ceph
    
    kill_ceph_storage_monitor tried to manipulate a pmon
    file that does not exist in an AIO-DX environment.
    
    We no longer invoke kill_ceph_storage_monitor in an
    AIO SX or DX env.
    
    This allows: "system host-downgrade controller-0"
    to proceed in an AIO-DX environment where that second
    controller (controller-0) was upgraded.
    
    Partial-Bug: 1929884
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: I633853f75317736084feae96b5b849c601204c13

commit 0dc99eee608336fe01b58821ea404286371f1408
Author: albailey <Al.Bailey@windriver.com>
Date:   Fri May 28 11:05:43 2021 -0500

Fix file permissions failure during duplex upgrade abort
    
    When issuing a downgrade for controller-0 in a duplex upgrade
    abort and rollback scenario, the downgrade command was failing
    because the sysinv API does not have root permissions to set
    a file flag.
    The fix is to use RPC so the conductor can create the flag
    and allow the downgrade for controller-0 to get further.
    
    Partial-Bug: 1929884
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: I913bcad73309fe887a12cbb016a518da93327947

commit 7ef3724dad173754e40b45538b1cc726a458cc1c
Author: Chen, Haochuan Z <haochuan.z.chen@intel.com>
Date:   Tue May 25 16:16:29 2021 +0800

Fix bug rook-ceph provision with multi osd on one host
    
    Test case:
    1. Deploy a simplex system.
    2. Apply rook-ceph with the override values below
       (value.yaml):
    cluster:
      storage:
        nodes:
        - name: controller-0
          devices:
          - name: sdb
          - name: sdc
    3. Reboot.
    
    Without this fix, only one OSD pod could launch successfully after
    boot, because the VGs whose names start with "ceph" could not be
    correctly added to the sysinv database.
    
    Closes-bug: 1929511
    
    Change-Id: Ia5be599cd168d13d2aab7b5e5890376c3c8a0019
    Signed-off-by: Chen, Haochuan Z <haochuan.z.chen@intel.com>

commit 23505ba77d76114cf8a0bf833f9a5bcd05bc1dd1
Author: Angie Wang <angie.wang@windriver.com>
Date:   Tue May 25 18:49:21 2021 -0400

Fix issue in partition data migration script
    
    The created partition dictionary partition_map is not
    an ordered dict, so we need to sort it by its key (the
    device node) when iterating over it to adjust the device
    nodes/paths for user-created extra partitions. This ensures
    that the device node/path number for each extra partition
    is calculated correctly; otherwise the adjustments could
    be wrong and cause the partition DB update to fail.
    
    Tested AIO-SX upgrade with three additional partitions.
    
    Change-Id: I1cb3bbfaf144a59d29633c1784b0fde80529cd71
    Closes-Bug: 1892554
    Signed-off-by: Angie Wang <angie.wang@windriver.com>
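The fix above iterates partition_map in sorted device-node order instead of relying on dict order, which is not guaranteed in Python 2. The sketch below shows that pattern; it is not the migration script itself. One caveat is an assumption about the keys: if they are strings like `/dev/sda10`, a plain string sort puts `/dev/sda10` before `/dev/sda2`. This sketch therefore uses a numeric-aware key. The helper names are hypothetical.

```python
import re


def device_node_sort_key(device_node):
    """Split '/dev/sda10' into ('/dev/sda', 10) so partitions sort numerically."""
    match = re.match(r"^(.*?)(\d+)$", device_node)
    if match is None:
        # No trailing partition number: sort by the name alone.
        return (device_node, -1)
    return (match.group(1), int(match.group(2)))


def ordered_partitions(partition_map):
    """Return (device_node, partition) pairs in device-node order."""
    return sorted(partition_map.items(),
                  key=lambda item: device_node_sort_key(item[0]))
```

With keys `/dev/sda10`, `/dev/sda2` and `/dev/sda5`, `ordered_partitions` yields sda2, sda5 and then sda10. Plain `sorted()` on the keys would put sda10 first.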

commit 204d5c36438979767a5c30a31da332cfffde4e66
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Mon May 24 10:45:43 2021 +0300

Expose ceph backend field over proxy endpoint
    
    The main storage backend endpoint acts like a proxy. It allows ceph
    backend to be configured, but doesn't return all the desired fields
    when doing a query.
    
    Fill in the information about the ceph backend network parameter.
    
    Story: 2008843
    Task: 42350
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I6521f3727ed96be33ade2419946c875d7ffb6e13

commit c6f0967086e61c7e55d8e0501f94168b82cb7040
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Sat May 22 19:04:01 2021 -0500

Apply sriov/FEC configuration during SX upgrade
    
    During the SX upgrade we also need to apply the sriov/FEC manifest in
    order to configure the FEC devices. This needs to be done before the
    sysinv agent starts to maintain the host configuration.
    
    Closes-Bug: 1929301
    Change-Id: I99bd891bf43fd5912d0297861fce819fe6ce678f
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit 9b8a1d074f85dc3e246d164e5d0356e44906287b
Author: Angie Wang <angie.wang@windriver.com>
Date:   Thu May 20 22:26:46 2021 -0400

Ensure the old app plugins enabled in app recovery lifecycle
    
    When an application update fails, the app should be recovered
    to the previous version. When the new app version is decoupled
    (has the plugins infrastructure) but the old app version is not,
    the old app version is not considered a system-knowledgeable app.
    This causes plugin activation to be skipped during the subsequent
    app update, and an error occurs because the lifecycle operator is
    still looking for the new app version's plugins.
    
    Update to ensure the old app plugins are enabled before the armada
    process during recovery and after recovery is completed. For the
    apps decoupled in stx5.0 but not in stx4.0, like
    nginx-ingress-controller and portieris, only reload the operators.
    The particular handling for non-decoupled apps can be removed in
    release stx6.0, as all apps in stx5.0 are decoupled.
    
    Change-Id: Ief79baac428af7f926f8721f15ded340e3cf1e44
    Closes-Bug: 1929149
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 8423e70fd04f07bbf6a22eb83d45c719663b0c51
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Fri May 21 12:56:47 2021 +0300

Fix pod max pids service parameter default value
    
    Openstack installation fails for rabbit-mq pods.
    
    Change the approach of how the default value is selected.
    Document recommended minimum values for apps instead of using them.
    Select the default value as high as possible, protecting against a
    rogue pod, protecting against platform slowdowns created by high number
    of processes in the system, but low enough such that platform is still
    responsive even on older hardware.
    User is free to decrease the limit to increase the degree of protection
    against slowdowns.
    
    Initially it was observed that openstack pods reach ~450 processes
    in steady state.
    New tests show that, even with 2/3 extra room, a 750 pid limit is not
    sufficient when deploying rabbit-mq pods, but 2000 is.
    The recommended minimum pid limit for openstack pods becomes 2000.
    
    Partial-Bug: 1928949
    Related-Bug: 1928353
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I0d66173e2247fae15eda1ad0e83c7bcf858f0369
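The 750 figure follows from the ~450-process steady state plus 2/3 extra room: 450 × (1 + 2/3) = 450 × 5/3 = 750. The small Python sketch below reproduces that arithmetic. It uses exact fractions so no float rounding creeps in. The function name `limit_with_headroom` is hypothetical and not part of sysinv.

```python
import math
from fractions import Fraction


def limit_with_headroom(steady_state, headroom):
    """Steady-state process count plus a fractional headroom, rounded up."""
    limit = steady_state * (1 + Fraction(headroom))
    # Round up so the headroom is never truncated away.
    return math.ceil(limit)
```

`limit_with_headroom(450, "2/3")` gives 750. That is the limit the commit reports as insufficient for rabbit-mq, which is why the recommended minimum was raised to 2000.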

commit 7d740e60f41bf4ee57210caa7acddaa1e9c543fb
Author: Don Penney <don.penney@windriver.com>
Date:   Thu May 20 23:04:47 2021 -0400

Add coredns SX to DX migration manifest
    
    As part of the SX to DX migration, a runtime manifest class is added
    to update coredns configuration. This commit updates the system modify
    handler to apply this runtime manifest when modifying system mode from
    simplex to duplex.
    
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/792494
    Change-Id: I03b609de16d0273bf116f318affbe1f22867cfa9
    Partial-Bug: 1929148
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit 9d8cdc5bb33f9e8ae4dfd658a0c9b216e7557431
Author: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
Date:   Fri May 14 14:45:29 2021 -0400

Replace applying flag by dict for FEC device check
    
    This change replaces the latest solution for checking the FEC device
    before an unlock action, which relied on an '/APPLYING' flag. In certain
    asynchronous scenarios, that flag could be cleared earlier than expected
    if an inventory report not related to the FEC device configuration came
    late (which might happen when configuring a long queue of SRIOV port
    changes) or by a periodic sysinv report.
    The solution still uses the 'extra_info' field of PCI devices, this
    time "stringifying" a dictionary entry for 'expected_numvfs' that keeps
    (without clearing) the programmed number of VFs of the FEC device in
    that field. It is then compared with the actual sriov_numvfs of the
    device from the inventory report, similar to what is currently done
    for comparing SRIOV interfaces (from the database) to ports (from the device).
    
    Closes-bug: 1927089
    Signed-off-by: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
    Change-Id: I380bd66a8229a72ef1981cbefa3a0543c28d7f30
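The change replaces a one-shot flag, which can be cleared too early, with a persistent expected value stored in the device's extra_info string. That value is then compared against each inventory report. The sketch below illustrates the idea under stated assumptions. It assumes extra_info holds a JSON-encoded dict; the commit only says the dict is "stringified". Both helper names are hypothetical.

```python
import json


def set_expected_numvfs(extra_info, numvfs):
    """Return extra_info with 'expected_numvfs' stored in a JSON string.

    Other keys already present in the stringified dict are preserved.
    """
    info = json.loads(extra_info) if extra_info else {}
    info["expected_numvfs"] = numvfs
    return json.dumps(info)


def fec_config_applied(extra_info, reported_sriov_numvfs):
    """True once the inventory report matches the programmed VF count.

    The entry is never cleared, so a late or unrelated report cannot make
    a pending configuration look complete.
    """
    if not extra_info:
        # Nothing programmed yet, so nothing is pending (assumed semantics).
        return True
    expected = json.loads(extra_info).get("expected_numvfs")
    return expected is None or expected == reported_sriov_numvfs
```

Unlike the flag, the stored value stays in place, so the unlock check stays blocked until a report actually shows the expected number of VFs.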

commit 6d262e1b4c80ef6567edf7ebcaa494e143b30a77
Author: albailey <Al.Bailey@windriver.com>
Date:   Thu May 20 08:43:15 2021 -0500

Fix zuul for bandit target
    
    Some zuul nodes running bionic do not consider the
    older version of pyflakes to be installable. This seems to
    be a cache issue.
    
    This fix updates the version of hacking defined in the top
    level test-requirements.txt file to use a more modern version.
    
    It also only imports yamllint if it is python 3, since the yamllint
    tox target is python 3 only.
    
    Partial-Bug: 1928978
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: Ia7aa6a296810adc0d9ba9eca701ec70f2c4be8cd

commit d55ca91b907c45dc73d4d44e39560e37a7b668ef
Author: albailey <Al.Bailey@windriver.com>
Date:   Wed May 19 13:51:15 2021 -0500

Specify the nodeset for zuul jobs
    
    The py2.7 jobs need to specify xenial
    The py3.6 jobs need to specify bionic
    The focal zuul nodes only have python 3.8 installed in them
    
    The copyright date was updated for some files in order to trigger
    the zuul jobs, as a no-delta type of change.
    
    Partial-Bug: 1928978
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: Ifc7904d4908a5dbe2ffbd9214e5e4c425932afad

commit f3edd0cd6a278d1cc537529ac366c23cc6d4919d
Author: Lucas Cavalcante <lucasmedeiros.cavalcante@windriver.com>
Date:   Thu Apr 22 07:20:05 2021 -0300

Add pull_image option to perform_app_upload
    
    Downloading images at application upload enables subclouds to use the
    central cloud registry even when the central does not have the same
    application applied. This way Central would only need to upload an app
    such as Openstack in order to have images available for the subclouds.
    
    This is done by adding an extra option `-i` or `--images` that
    defaults to False.
    
    Testing:
    * VirtualBox: AIO-DX (Central), AIO-SX (Subcloud)
    * Upload an app to the Central Cloud using `--images` or `-i`
      (without applying it)
    * Upload and apply the app at the Subcloud using the DC registry
    
    Signed-off-by: Lucas Cavalcante <lucasmedeiros.cavalcante@windriver.com>
    Change-Id: I706c2bdf233617aadae8506724dde1afbbc1b35b
    Closes-Bug: 1925844
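The commit describes an opt-in `-i`/`--images` option that defaults to False. With Python's argparse, that shape is a `store_true` flag. The sketch below is only an illustration of that flag shape; it does not reproduce the real `system application-upload` client. The program name and the `tarfile` positional argument are hypothetical.

```python
import argparse


def build_upload_parser():
    """Hypothetical parser for an app-upload command with an image-pull flag."""
    parser = argparse.ArgumentParser(prog="application-upload")
    parser.add_argument("tarfile", help="application tarball to upload")
    # store_true makes the option default to False unless it is given.
    parser.add_argument("-i", "--images", action="store_true",
                        help="pull the application images during upload")
    return parser
```

`build_upload_parser().parse_args(["-i", "app.tgz"]).images` is True. Without the flag it is False, so existing uploads keep their current behavior.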

commit 30e0d90d24df92e79a414e33c1779aed5e43eb2c
Author: Angie Wang <angie.wang@windriver.com>
Date:   Wed May 19 10:48:44 2021 -0400

Update k8s application upgrade script for corner case
    
    The upgrade script may report the application update incorrectly
    because there is a window when querying the application version
    and status during update.
    
    If the application update to the new version fails, and the query
    for the app version is made before the application recovery is
    triggered while the query for the app status is made after the
    application is recovered to the previous version, the script gets
    inconsistent information and reports that the application was
    updated successfully.
    
    Update the application query to use a single request to eliminate
    this possibility.
    
    Change-Id: Icbf173214de591861c21841e22359ad981453e84
    Story: 2008055
    Task: 42246
    Signed-off-by: Angie Wang <angie.wang@windriver.com>
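The window described here is a read-read race. Two separate queries can straddle the recovery and return a version and a status from different moments. The in-memory analogue below is a sketch, not the upgrade script. It shows why reading both fields in one request removes the window. The class and method names are hypothetical, and so are the version and status values used with it.

```python
import threading


class AppRegistry(object):
    """Hypothetical app record whose version and status change together."""

    def __init__(self, version, status):
        self._lock = threading.Lock()
        self._version = version
        self._status = status

    def update(self, version, status):
        with self._lock:
            self._version = version
            self._status = status

    def get_version(self):
        with self._lock:
            return self._version

    def get_status(self):
        with self._lock:
            return self._status

    def get_version_and_status(self):
        # One read returns a consistent pair; no update can land in between.
        with self._lock:
            return self._version, self._status
```

Suppose `get_version()` runs, then a recovery calls `update()`, then `get_status()` runs. The caller sees the new version paired with the recovered app's status. A single `get_version_and_status()` call cannot mix the two states.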

commit e4bc9dc6026f76964e91809d3971e73160d653db
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Mon May 17 13:23:54 2021 +0300

Fix skipping recovery of the old app for app updates
    
    When an app update fails and skipping the rollback is requested, it
    still reverts to the old version of the app.
    
    Confusion here was created by the names of the building blocks for the
    update logic (perform_app_rollback, perform_app_recover). In fact it is
    desired to skip operations that recover the old app.
    
    Added the missing logic for path of failed update operation.
    Now both upgrades and downgrades of the app behave the same.
    
    Tested by changing the pvc claim to trigger the armada failure.
    
    Closes-Bug: 1928671
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I6792744257a9cb249e0b1bf99f9b78f3b27859d9
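The missing logic amounts to honoring the skip request on the failure path before any recovery step runs. The sketch below shows that branch with the apply and recovery steps passed in as callables. It does not reproduce sysinv's perform_app_rollback/perform_app_recover code. The function name and the returned state strings are hypothetical.

```python
def update_app(apply_new_version, recover_old_version, skip_recovery=False):
    """Hypothetical update flow; returns the resulting state as a string.

    apply_new_version() returns True on success; recover_old_version()
    reapplies the previous version.
    """
    if apply_new_version():
        return "updated"
    if skip_recovery:
        # Leave the failed new version in place instead of reverting.
        return "update-failed"
    recover_old_version()
    return "recovered"
```

The key point is that the `skip_recovery` check comes before any call to the recovery step. The same branch therefore covers both upgrades and downgrades.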

commit eec51d6a4a47737b1eba8585c412449292938ad9
Author: Vinicius Lopes da Silva <vinicius.lopesdasilva@windriver.com>
Date:   Thu Apr 29 10:55:40 2021 -0400

Delete ceph stor when removing controller
    
    When a controller is removed from the cluster with the "host-delete"
    command, its OSD is not removed from ceph.
    
    This commit adds a call to remove the OSD from Ceph when controller is
    removed.
    
    Closes-Bug: #1926626
    
    Signed-off-by: Vinicius Lopes da Silva <vinicius.lopesdasilva@windriver.com>
    Change-Id: If0fd9260ab7b9c717ef0a4ae621da0ffc9d0e6ab

commit 32b4df542c4566ad08f057950f78dae7633a8233
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Tue May 18 16:02:29 2021 -0400

Notify dcmanager when k8s upgrade completes
    
    When a k8s upgrade has been completed, we notify dcmanager so that
    it can do a kubernetes audit of the subclouds immediately rather
    than waiting up to an hour for the normal audit to run.
    
    Change-Id: Ife2abdbc65ad4ee91441db8fa39cb80291cbe201
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>
    Partial-Bug: 1928864

commit f9a862b5d80943b76cd71dab84c765f9b17af89e
Author: albailey <Al.Bailey@windriver.com>
Date:   Tue May 18 11:06:17 2021 -0500

Fixing pylint failures in zuul.
    
    pylint running in python3.8 will complain about some
    code such as 'import contextlib'
    
    Rather than destabilize the code, specify python3.6 for
    the pylint job.
    
    When python2 is fully dropped, the changes can be addressed.
    
    The zuul nodeset for bionic sets up a venv for python3.6
    
    Closes-Bug: #1928841
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: Ib2e7f467ce20e5b31aef7242405ef034583d4e1a

commit bf547186d19a218c03e73f31c5d7cafd1a5d4bd3
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Wed May 12 15:16:32 2021 +0300

Add service parameter to control pod pids limit
    
    Create a config section for kubernetes service.
    Create a parameter named pod_max_pids to have similar name as
    the kubernetes parameter pod-max-pids.
    Store the value in the config section.
    
    This will create a system-wide entry in hieradata when unlocking:
    platform::kubernetes::params::k8s_pod_max_pids
    
    This affects hosts with kubelet running, meaning controller and
    worker personalities. A config out of date will be raised for all hosts
    of both personalities, even for parameters that target only a specific
    personality.
    
    After modifying the parameter a host-lock then host-unlock is required.
    
    Platform pods use under 20 processes in steady state.
    Some openstack pods reach ~450 processes in steady state.
    Since StarlingX provides some optional apps, we provide a default value
    that takes into account the most process-hungry app, openstack.
    The database entry will be populated assuming openstack will be
    applied (I707ddc4ca67595fbf809c6ffc15ecd4fb21f4661), but we shouldn't
    restrict the minimum based on optional apps, as this allows the user
    to set a lower minimum if there is no plan to use openstack.
    
    Tested on Standard+dedicated storage:
    - out of sync raised for controllers and workers when using
    service-parameter modify
    - alarm cleared after host-lock, host-unlock
    - new value correctly generated and used
    - add with system service-parameter-add
    - modify with system service-parameter-modify
    
    Tested on top of: I10c1684fe3145e0a46b011f8e87f7a23557ddd4a
    Partial-Bug: 1928353
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I74fcf2bd405c2a3811a4f27a55b28c0d001430e1

commit 82f01a8912567e48c0778689c80919e3fc6d9ce9
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Mon May 17 18:44:29 2021 +0300

Re-add wrongly removed function in helm plugins
    
    Some app plugins that are based on base.BaseHelm use a function wrongly
    removed in I681ccb3302b8f233424bc291e08675a4dc2b10f7.
    Re-add the function.
    
    Closes-Bug: 1928696
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I43fe2065714cbae2eebf43469424e812b8783218

commit cfe94d9dae6f55d5d44b23a57d742827f43b16ca
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Mon May 17 18:13:20 2021 +0300

Fix semantic check for N+1 app version
    
    Add the missing property 'relative_timing' for a LifecycleHookInfo
    object.
    
    Closes-Bug: 1928692
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I5c87aac9497f19f3a5adb8d383465c78cdad0a7b

commit 703b9dc6f97628271950f9b7352fdb8d4df8f74d
Author: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
Date:   Wed May 12 16:51:19 2021 -0300

Refactor and expose logic to acquire a flock with retries
    
    When locking the file descriptor, skip_udev_partition_probe was not
    handling errors thrown by fcntl.flock, which led controller-0 to a
    degraded state after unlock. This change strengthens that logic by
    handling the error properly, retrying the lock operation and
    improving logs.
    
    Re-implementation of commit cbb9121a289603ec003dec098b8fa5918ca98300.
    The original commit inadvertently replaced a shared lock with an
    exclusive lock in the decorator skip_udev_partition_probe, which caused
    fd locking issues.
    
    This commit exposes utility functions to acquire shared or exclusive
    non-blocking locks of file descriptors.
    
    Tested on Standard (2 + 4) and AIO-Simplex configurations. Ran sanity
    load on both.
    
    Closes-Bug: 1922256
    Change-Id: Ifcddab027df955152f420fd7451f42167694a31a
    Signed-off-by: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
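
A minimal sketch of a non-blocking flock with retries, using Python's standard fcntl module (POSIX only). The helper name acquire_flock_with_retries and its retries/delay parameters are hypothetical; this is not the utility function the commit exposes, only an illustration of the technique it describes.

```python
import fcntl
import logging
import time

LOG = logging.getLogger(__name__)


def acquire_flock_with_retries(fd, shared=True, retries=5, delay=0.1):
    """Try to lock fd without blocking, retrying while it is held elsewhere.

    Hypothetical helper: returns True once the lock is held, or False if
    every attempt found the lock busy.
    """
    mode = fcntl.LOCK_SH if shared else fcntl.LOCK_EX
    for attempt in range(1, retries + 1):
        try:
            # LOCK_NB makes flock() raise instead of blocking when busy.
            fcntl.flock(fd, mode | fcntl.LOCK_NB)
            return True
        except BlockingIOError as exc:
            # EWOULDBLOCK: another open file description holds a conflicting lock.
            LOG.warning("flock attempt %d/%d on fd %d failed: %s",
                        attempt, retries, fd, exc)
            if attempt < retries:
                time.sleep(delay)
    return False
```

The caller releases the lock with fcntl.flock(fd, fcntl.LOCK_UN) or by closing the file. Because flock locks belong to the open file description, two separate open() calls on the same path contend with each other even within one process.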

commit d4f82539e0d421ab8a7f1cd466bdbc269727bddc
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Thu May 13 22:23:59 2021 -0400

Leave parameter_reselect as null if not specified
    
    The sysinv API for interface returns the optional parameter
    'primary_reselect' with its default value when the attribute
    is not specified.
    This update is to leave the parameter as null if it is not
    specified.
    
    Closes-Bug: 1928461
    
    Change-Id: I67629aec1e58c26b1ed76c0cd1e37cd53e74b0b2
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit ca1fbf08cbb2e0bdf3f98f7fa091af178768992d
Author: John Kung <john.kung@windriver.com>
Date:   Thu May 13 14:00:12 2021 -0500

Update config file update operations to be more atomic
    
    sysinv agent iconfig_update_file() was unconditionally deleting
    the file or symlink to be updated prior to updating it. This could
    render the file missing if sysinv-agent is restarted.
    
    This update copies the updated file contents into the existing file
    and only removes the symlink, if any, when needed.
    
    It also includes a few minor logging improvements related to logs
    observed in LP.
    
    Change-Id: I3c97e778b17dd0e9693a156ffe6b1c0269413f20
    Closes-Bug: 1928368
    Signed-off-by: John Kung <john.kung@windriver.com>
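
One common way to avoid a window where a config file is missing is to write the new contents to a temporary file in the same directory and rename it over the target with os.replace(). This is a swapped-in technique sketched under that assumption, not necessarily how iconfig_update_file() does it; update_file_atomically is a hypothetical name.

```python
import os
import tempfile


def update_file_atomically(path, content, mode=0o644):
    """Replace path's contents without a window where the file is missing.

    Hypothetical helper. If path is a symlink, the link itself is replaced
    by a regular file; nothing is touched when the content is already
    current. Returns True if the file was (re)written.
    """
    # Skip the write when the regular file already holds the desired content.
    if os.path.isfile(path) and not os.path.islink(path):
        with open(path) as existing:
            if existing.read() == content:
                return False

    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so that os.replace() is an
    # atomic rename within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_path, mode)
        # Readers see either the old or the new file, never no file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

If the process is restarted mid-update, the worst case is a stray temporary file next to the target; the target itself is never absent.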

commit b62a585ab191ac74e71ff6b4b57d716a3aa3ea2f
Author: Angie Wang <angie.wang@windriver.com>
Date:   Tue May 11 17:30:18 2021 -0400

Revert "Revert "Check for connectivity to the tiller postgres backend.""
    
    This reverts commit 46421279912daa62162e493c3455ae5b9b75cf69.
    
    One additional change relative to the original commit: call
    retrieve_helm_v2_releases() instead of retrieve_helm_releases() so
    that only helmv2 releases are queried, as only helmv2 uses the
    postgres backend.
    
    Depends-On: https://review.opendev.org/c/starlingx/config/+/789828
    Change-Id: Ia3c52192cea7c3addec446b22436db7a028ec5bc
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 12fff41d7803c7cea2b34e356ac65d361ca57789
Author: Andrei Grosu <andrei.grosu@windriver.com>
Date:   Wed May 5 13:03:50 2021 +0000

Handle empty 'helm list' result when there is nothing deployed
    
    The existing code assumes that there are always applications deployed
    and that the result is never an empty list.
    The previous implementation also ignored the return code when the
    subprocess was killed by the timeout handler.
    Split the method into two submethods for the helm v2 and v3
    implementations.
    
    Closes-Bug: 1923587
    Signed-off-by: Andrei Grosu <andrei.grosu@windriver.com>
    Signed-off-by: Angie Wang <angie.wang@windriver.com>
    Change-Id: Ib547bdb20c39e35c1538e3abb90108f7e3cad228
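
A sketch of listing releases so that an empty result is not mistaken for an error, while a timeout or a non-zero exit code is reported. The argv is passed in rather than hard-coded, so no particular helm flags are assumed; list_releases and HelmListError are hypothetical names, and the one-name-per-line parsing is an assumption about the command's output.

```python
import subprocess


class HelmListError(Exception):
    """Raised when the listing command times out or exits non-zero."""


def list_releases(cmd, timeout=30):
    """Run a release-listing command and return the release names.

    Hypothetical sketch: cmd is the full argv (e.g. a helm list call);
    each non-empty output line is taken as one release name.
    """
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE,
                              universal_newlines=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child on timeout; report it instead of
        # treating the missing output as "no releases".
        raise HelmListError("command timed out after %ss" % timeout)
    if proc.returncode != 0:
        raise HelmListError(proc.stderr.strip()
                            or "exit code %d" % proc.returncode)
    # Empty output simply means nothing is deployed.
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]
```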

commit 858aee342dcb94654c9fcd8f0731a2838d845eb1
Author: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
Date:   Thu May 6 07:35:56 2021 -0400

Check for FEC device configuration pending
    
    SR-IOV configuration of the Intel ACC100 device was not being
    correctly applied when it followed several other configurations (data
    networks, SR-IOV interfaces, etc.) shortly before the host unlock,
    because the runtime manifest did not finish before system shutdown.
    
    "This issue came about because of the way we handle 'pci_devices' in
    contrast to 'interfaces/ports'. For the interfaces/ports, when
    configuring SR-IOV, the interface sriov_numvfs will represent the user
    requested value, while the underlying port sriov_numvfs will represent
    the actual system value. Then, before a system can be unlocked, the
    values are compared for equality. This ensures that the runtime manifest
    that is setting the value of sriov_numvfs has a chance to run. For
    'pci_devices' like the FPGA cards we support, there is no concept of an
    'upper interface'. Therefore, the value of sriov_numvfs for a pci_device
    represents the system value rather than the user requested value. This
    can cause issues when performing an unlock right after configuring a
    pci_device with SR-IOV. There's a chance that the system populates its
    hieradata and unlocks before the runtime manifest has had a chance to
    configure the SR-IOV value for the pci_device." (Steven Webster)
    
    This change appends an "/APPLYING" string to the 'extra_info' field of
    the FEC device when it is configured via the API; the string is removed
    only when the corresponding inventory is reported back by the sysinv
    agent. Until then, any attempt to unlock the host finds that substring
    in the field and is subject to an additional semantic check.
    
    Closes-Bug: 1927089
    Signed-off-by: Douglas Henrique Koerich <douglashenrique.koerich@windriver.com>
    Change-Id: I175bc01a2a51808c4dc7b821905c7417660bf286
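
A minimal sketch of the "/APPLYING" marker life cycle described above: set on API configuration, cleared when the agent reports inventory, and checked before unlock. The function names and the (name, extra_info) pair shape are hypothetical; only the marker string comes from the commit message.

```python
APPLYING_MARKER = "/APPLYING"  # marker string named in the commit message


def mark_applying(extra_info):
    """Append the marker when the device is configured via the API."""
    extra_info = extra_info or ""
    if APPLYING_MARKER in extra_info:
        return extra_info  # already pending; do not append twice
    return extra_info + APPLYING_MARKER


def clear_applying(extra_info):
    """Drop the marker once the agent reports the matching inventory."""
    return (extra_info or "").replace(APPLYING_MARKER, "")


def check_unlock_allowed(devices):
    """Semantic check: refuse unlock while any device is still applying.

    devices: iterable of (name, extra_info) pairs (hypothetical shape).
    """
    pending = [name for name, info in devices
               if APPLYING_MARKER in (info or "")]
    if pending:
        raise ValueError("Cannot unlock: configuration pending on %s"
                         % ", ".join(pending))
```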

commit a6481bc4d169ab3c81c73b6807299d3febf7d591
Author: Andy Ning <andy.ning@windriver.com>
Date:   Wed May 5 09:34:00 2021 -0400

Fix invalid admin endpoint cert during subcloud upgrade
    
    cert-mon queues failed cert update tasks and retries them later on.
    However, the periodic retry function is not started in a subcloud, so
    the retries never run there. This commit fixes that by starting these
    periodic functions when the system's DC role is subcloud.
    
    This commit also adds unauthorized-exception handling for the platform
    cert update, so that the retry task reattempts updating the cert with
    a new token. The other cert updates already have such exception
    handling.
    
    Note, commit 862c1746abb8d8901d2acb4bcb43569210e55f3e is needed to fully
    fix Bug 1926788.
    
    Closes-Bug: 1926788
    Signed-off-by: Andy Ning <andy.ning@windriver.com>
    Change-Id: If7f631ee3e5f97db7a06b184f9e68cf901cc8344

commit 4e75297b58c4a488451d575819608ec203c89972
Author: Andy Ning <andy.ning@windriver.com>
Date:   Mon May 10 12:24:52 2021 -0400

Specify default connect_timeout for sysinv agent
    
    During bootstrap, rabbitMQ sometimes takes longer to respond to the
    sysinv agent's AMQP request after the TCP connection is established.
    This usually happens after multiple connection failures, and the delay
    can be as long as 8s, which is still within the 10s handshake_timeout
    configured on rabbitMQ. However, the sysinv agent uses the default 5s
    timeout from the kombu library, so it does not wait long enough before
    hanging up the connection. As a result, the agent cannot reconnect to
    rabbitMQ and, in turn, the deferred manifests never get a chance to
    pass the readiness check and be applied.
    
    This commit specifies a 10s default connect_timeout for the agent's
    connection to rabbitMQ, aligning it with the server-side timeout.
    
    Closes-Bug: 1928008
    Signed-off-by: Andy Ning <andy.ning@windriver.com>
    Change-Id: I8cc476910f47fca687ddcfd5f3d20f451f70ffce

commit 55e70b52d73c0c7948fa186a741600c190f0cd2a
Author: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
Date:   Mon May 10 12:27:04 2021 -0300

Fix bootstrap error on AIO-SX due to post SX to DX migration actions
    
    During start-up, sysinv-conductor tests whether the AIO-SX to AIO-DX
    migration is occurring, which relies on an active ihost being
    available; that is not the case when the host is bootstrapping. This
    commit adds a check that host_uuid and ihost are available before use.
    
    Closes-Bug: 1927984
    Signed-off-by: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
    Change-Id: I3287852242579164604a7815a599c4f7e9f704f8

commit 6df2034a4e9e2a25cd0ba39af7074fec5a26466d
Author: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
Date:   Wed May 5 11:34:47 2021 -0300

Adding AIO-SX to AIO-DX migration steps patching existing PVs
    
    Kubelet and kube-api are no longer available while the puppet
    manifest runs during unlock. Therefore, the patching of Persistent
    Volumes was moved from puppet to sysinv-conductor as a post-migration
    step during its start-up.
    
    Closes-Bug: 1927224
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/789844
    Depends-On: https://review.opendev.org/c/starlingx/fault/+/790183
    Change-Id: I9745b7f8547c82485353130156011650f2655317
    Signed-off-by: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>

commit 407a3e374815c7d325b650e62205e4898ab89b13
Author: Rafael Jordão Jardim <RafaelJordao.Jardim@windriver.com>
Date:   Fri Apr 23 13:26:49 2021 -0400

Send the binary data instead of path
    
    When system application-upload is executed from the remote CLI, it
    returns the error "Application-upload rejected: application tar file
    /wd/custom_apps/hello-kitty.tgz does not exist". This happens because
    cgts-client sends the path of the tarball, which exists only on the
    client side. The proposed solution sends the binary data instead and
    saves it to a path on the controller.
    
    Closes-bug: 1926308
    Signed-off-by: Rafael Jordão Jardim <RafaelJordao.Jardim@windriver.com>
    Change-Id: I6dadef9e86612328ae68fc90564e929646e93dba

commit 9e9979d96e99c7cf8484f28a93d0635f3012fc5d
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Thu May 6 15:30:43 2021 -0400

Specify timeout for _get_token REST API request in cert-mon
    
    In a DC environment, when management network connectivity is
    disrupted for some time, cert-mon's audit greenthreads hang. This
    happens because REST API requests lack timeouts, so urlopen can hang
    forever. As a result, the dc-cert sync status for the subcloud
    remains unknown indefinitely and the subclouds remain out-of-sync.
    
    This commit adds a timeout to the _get_token method to avoid
    this issue.
    
    Change-Id: Idd41cfca6b28287de8328b1ace856cc391778cac
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>
    Closes-Bug: 1927735
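
A sketch of a token request with an explicit timeout using the standard library's urllib.request.urlopen(..., timeout=...). The URL, JSON payload shape, and get_token helper are hypothetical; the point is that any network error, including a timeout, returns control to the caller instead of hanging the greenthread.

```python
import json
import logging
import urllib.request

LOG = logging.getLogger(__name__)


def get_token(url, payload, timeout=10):
    """POST a token request; return the parsed JSON body, or None on error.

    Hypothetical sketch: without timeout=, urlopen() can block forever
    when the management network is down.
    """
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except OSError as exc:
        # URLError, socket timeouts and connection errors are all OSErrors.
        LOG.error("Token request to %s failed: %s", url, exc)
        return None
```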

commit 62c9e55efd1bc1f713924d3d08d7ba42e3a94af0
Author: Charles Short <charles.short@windriver.com>
Date:   Tue Mar 23 11:31:21 2021 -0400

Switch type(certificate) and type(obj)
    
    Replace type(certificate) and type(obj) comparisons with isinstance(),
    since it accords with Python style.
    
    Story: 2006796
    Task: 42442
    
    Test:
    - Ran unit tests.
    - Built iso and ran some smoke tests.
    - Tested on general install on controller+storage+workers system,
      bootstrapped ok.
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: I03d444f3ae67ab802881b8ede90f1e1c7e9694cc
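
A small illustration of why isinstance() is preferred over comparing type() directly: it also accepts subclasses. The Certificate classes are hypothetical stand-ins.

```python
class Certificate:
    """Stand-in for a certificate object (hypothetical)."""


class IntermediateCertificate(Certificate):
    """A subclass that callers may legitimately pass in."""


def is_certificate_by_type(obj):
    # Exact type comparison: rejects subclasses.
    return type(obj) == Certificate


def is_certificate(obj):
    # isinstance() honours inheritance; this is the idiomatic check.
    return isinstance(obj, Certificate)


cert = IntermediateCertificate()
print(is_certificate_by_type(cert), is_certificate(cert))  # False True
```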

commit ddc6b69dfc82487fc22f3c2eb1795d9bfdb2e155
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Tue Apr 20 12:16:17 2021 +0300

Allow configurable ceph storage backend network
    
    Ceph components are configured by puppet. Hieradata for it is generated
    by sysinv. Currently the configuration is generated using IPs from the
    management network.
    
    Allow ceph components to be configured on the cluster-host network.
    Create a 'network' parameter for the storage-backend-add command. The
    parameter is optional; the REST controller defaults the value to the
    management network.
    
    Example usage:
    system storage-backend-add --network cluster-host ceph --confirmed
    system storage-backend-add --network mgmt ceph --confirmed
    system storage-backend-add ceph --confirmed
    
    Updated unit tests.
    Added unit test for component that generates the ceph monitor ips for
    hieradata.
    
    Tests:
    1) AIO-SX: add ceph using --network cluster-host, unlock, ip is from
    cluster-host network.
    2) STANDARD 2+2: full deploy, no --network parameter, everything as
    before.
    3) storage-backend-modify and storage-backend-add allows only 'mgmt' and
    'cluster-host' network parameter.
    4) storage-backend-modify does modify the network before the first
    unlock; is rejected for network parameter after first unlock.
    
    Story: 2008843
    Task: 42350
    Task: 42351
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I681ccb3302b8f233424bc291e08675a4dc2b10f7
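
A sketch of the network-parameter rules listed in the tests above: default to mgmt, accept only mgmt and cluster-host, and reject a change after the first unlock. The function names are hypothetical and the sketch does not model the real REST controller.

```python
MGMT = "mgmt"
CLUSTER_HOST = "cluster-host"
ALLOWED_NETWORKS = (MGMT, CLUSTER_HOST)


def resolve_backend_network(requested=None):
    """storage-backend-add: default to mgmt, allow only the two networks."""
    network = requested or MGMT
    if network not in ALLOWED_NETWORKS:
        raise ValueError("network must be one of: %s"
                         % ", ".join(ALLOWED_NETWORKS))
    return network


def check_network_modify(current, requested, first_unlock_done):
    """storage-backend-modify: only change the network before first unlock."""
    if requested is None:
        return current  # no --network given, keep the existing value
    requested = resolve_backend_network(requested)
    if requested != current and first_unlock_done:
        raise ValueError("network cannot be modified after the first unlock")
    return requested
```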

commit 6f15428e7565736ab600f4f0f01e1445987cc3a6
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Thu May 6 10:12:44 2021 -0500

Add logging to secured-etcd upgrade script
    
    Add logs to 70-active-secured-etcd-after-upgrade.sh. Redirect ansible logs to a named file under
    /root.
    
    Closes-Bug: 1927511
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>
    Change-Id: I86fd964cad02ebb8eeb8645fc9dea71a3251aef7

commit 37d348ae6db2bf8522737287d805a100ed8245e5
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Thu May 6 10:02:50 2021 -0500

Limit etcd migration to swact actions
    
    Currently the etcd migration will take place whenever
    upgrade_swact_migration.py is called. Typically this is called during a
    swact as part of etcd start. However the script can also be called
    during etcd restart, which occurs as part of activating secured etcd.
    
    The solution is to limit the etcd migration to controller-0. This will
    ensure the action will only be taken during the swact to controller-0.
    
    Closes-Bug: 1927508
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>
    Change-Id: I893eeaa7ddb0b600baa051498a30ed737c688151

commit 3e9982bab088df428e786d5fdf2515dc21f189fa
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Wed May 5 17:21:35 2021 -0400

AIO-SX reboots after change OAM ip address
    
    On HW tests, it was detected that the openstack-endpoints restart was
    happening at the same time as the service-manager restart, creating
    a conflict that prevented SM services from reaching enabled-active.
    This was provoking the reboot.
    
    The correction adds a class to execute the openstack-endpoint runtime
    restart in the post stage of puppet, so that it does not run while SM
    is being restarted.
    
    Tested on AIO-SX by monitoring the manifest apply and validating that
    no reboot happens.
    
    Closes-Bug: 1927275
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/789946
    
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
    Change-Id: I9f547fbcc73ba5fea077c764a4a9282a02ac71c6

commit c43772ca5c8519254d2be4b16c3326a90091f806
Author: Adriano Oliveira <adriano.oliveira@windriver.com>
Date:   Wed May 5 00:36:59 2021 -0400

Memory adjustment to consider memory available
    
    In order to handle patch applications that might change the reserved
    constants defined for the platform memory, there is an auto adjustment
    routine that compensates the potential extra memory by decreasing the
    number of hugepages configured.
    However, the routine should take into account the memory available
    and only decrease hugepages once a certain threshold is reached.
    For safety, 50% of total memory was considered a sufficient safeguard
    both to avoid reaching the total memory limit and to avoid removing
    hugepages (mainly 1G ones).
    Also, the vswitch hugepage allocation has been added to the memory
    allocation calculation.
    
    Testing:
    1. Patch that increases memory allocation constant
    2. AIO-DX upgrade from stx 4.0 to 5.0 (in which the platform memory
    reserved increased from 7000 MB to 8000 MB)
    
    Closes-Bug: 1927172
    
    Signed-off-by: Adriano Oliveira <adriano.oliveira@windriver.com>
    Change-Id: I0bb29de83268709cdae07834f537a28010d884bc

commit 6ac2c0e3d0e74dd8d139dba06afe96b6f438955e
Author: Daniel Safta <daniel.safta@windriver.com>
Date:   Tue Apr 27 10:36:44 2021 +0000

check app progress before swact
    
    This update adds a new check which will
    reject a swact while an application
    apply is in progress.
    
    Closes-Bug: 1926405
    Change-Id: I3c683776f3ecaf9c78d111b5b1108e9582497aaa
    Signed-off-by: Daniel Safta <daniel.safta@windriver.com>
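
A minimal sketch of the swact semantic check, assuming each application exposes a status string and that an in-progress apply is reported as "applying". The function name and the (name, status) input shape are hypothetical.

```python
APP_APPLY_IN_PROGRESS = "applying"  # assumed status string


def check_swact_allowed(apps):
    """Reject a swact while any application apply is in progress.

    apps: iterable of (name, status) pairs (hypothetical shape).
    """
    busy = [name for name, status in apps if status == APP_APPLY_IN_PROGRESS]
    if busy:
        raise RuntimeError("Swact rejected: apply in progress for %s"
                           % ", ".join(sorted(busy)))
```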

commit c9df464849cfe9f6ee769d1682f0968a7aa7af6f
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Tue May 4 09:15:28 2021 -0400

Fix missing sriov attributes in N3000 pci info
    
    The fpga-agent did not include all of the sriov attributes in the
    reporting of N3000 pci device info to the conductor resulting in
    fields being overwritten.
    This update ensures that the attributes sriov_vf_driver and
    sriov_vf_pdevice_id are copied.
    
    Closes-Bug: 1925513
    
    Change-Id: I891166b1a0966b3bd9bfe253f954ff22f7ea8677
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit 862c1746abb8d8901d2acb4bcb43569210e55f3e
Author: Bin Qian <bin.qian@windriver.com>
Date:   Fri Apr 30 12:14:31 2021 -0400

Remove subcloud admin endpoint data migration
    
    Admin endpoint cert upgrade will be handled by the manifest, so data
    migration is no longer needed in the subcloud.
    On the N+1 side, the admin endpoint cert secret (key/cert) will be
    pulled directly from the k8s resource for the manifest to generate the
    endpoint cert on the first host unlock.
    
    Only the SAN of the admin endpoint cert needs to be updated.
    
    Closes-Bug: 1923510
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/786666
    Change-Id: I4312abd6c767d6ba54c13ce1e90f2e25df9ed216
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit 67c214a4e4ac234ceab33d1514f550eea41ae928
Author: John Kung <john.kung@windriver.com>
Date:   Thu Apr 29 07:55:55 2021 -0500

Clear host config_target on upgrade migration
    
    The host config_target is taken as a snapshot at upgrade-start.
    This can lead to a config out-of-date condition if the N controller
    issues subsequent config requests.  This is more likely in duplex
    controllers, however, for consistency the config_target is reset
    to track when the N+1 controller is active.
    
    Clear the config target of all hosts on upgrade migration.  The N+1
    controller will resume tracking the config by generating config_target
    when configuration is issued.
    
    Condition for upgrade activation-complete is updated to account
    for potential None config_target.
    
    Verified upgrade complete on AIO-SX and Duplex systems.
    
    Change-Id: I4dd44e6548a45d32ab6a0b6735a04d624da7caad
    Closes-Bug: 1926512
    Signed-off-by: John Kung <john.kung@windriver.com>

commit 46421279912daa62162e493c3455ae5b9b75cf69
Author: Angie Wang <angie.wang@windriver.com>
Date:   Sun May 2 21:41:04 2021 +0000

Revert "Check for connectivity to the tiller postgres backend."
    
    This reverts commit 5edd3bdbe588e2c2e7a58cb839f030305613c30f.
    
    Reason for revert: It causes ansible bootstrap to fail.
    
    Change-Id: I5f3640db8576eab1500b54f111a06a981c98b599

commit 5edd3bdbe588e2c2e7a58cb839f030305613c30f
Author: Andrei Grosu <andrei.grosu@windriver.com>
Date:   Tue Apr 13 08:52:40 2021 +0000

Check for connectivity to the tiller postgres backend.
    
    The existing code checks that the pod(s) are 'Running' but that
    might not be enough as the service inside the pod (postgres)
    might not be able to accept connections.
    
    Closes-Bug: 1923587
    Signed-off-by: Andrei Grosu <andrei.grosu@windriver.com>
    Change-Id: Ide49e4a38b805d5fc41d9f06d94393c69c6ed9d2
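
A sketch of probing the backend port itself rather than trusting the pod's 'Running' phase, using socket.create_connection() with retries. Host, port (5432 is the postgres port seen in the bug's armada log), attempt count and delays are assumptions; wait_for_tcp is a hypothetical name. A successful TCP connect shows the socket is accepting connections, but it is not a full postgres protocol handshake.

```python
import socket
import time


def wait_for_tcp(host, port, attempts=10, delay=1.0, timeout=2.0):
    """Return True once host:port accepts a TCP connection, else False.

    Hypothetical helper: a pod reported as Running may still refuse
    connections, so the service port itself is probed.
    """
    for attempt in range(attempts):
        try:
            # create_connection() resolves IPv4 and IPv6 hosts alike.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False
```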

commit d225a0217bf3dae22c12e1266c4beab78f1032dd
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Sun Mar 28 14:02:32 2021 -0500

Run SRIOV manifest during controller-1 migration
    
    To address another issue, we blocked manifests from applying on a
    controller running a different version from the active controller. This
    blocks the sriov manifest from running on controller-1 after it has been
    upgraded to N+1. To address this, the data migration will check if
    controller-1 has sriov interfaces configured and apply the manifest if
    necessary.
    
    The previous change:
    https://review.opendev.org/c/starlingx/config/+/766637
    
    Closes-Bug: 1921788
    Change-Id: I1ec790fced00a4e973e546a260d28f52ef06fb3a
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit e53cf0cc2cb4490a93405789faf8fe497d8894ba
Author: Melissa Wang <melissa.wang@windriver.com>
Date:   Fri Apr 23 16:47:43 2021 -0400

SX-to-DX: Check host administrative state
    
    This update adds semantic checks to ensure that the controller is
    locked before starting simplex-to-duplex migration.
    
    Story: 2008587
    Task: 42369
    
    Change-Id: I1ebf3bb531073344b4d17a00cd1ee480051f3897
    Signed-off-by: Melissa Wang <melissa.wang@windriver.com>

commit b9130523851ef8f8f2f300d4b583c76c012c51d4
Author: Yuxing Jiang <yuxing.jiang@windriver.com>
Date:   Tue Apr 20 10:37:47 2021 -0400

Add RPCAPI calls to apply LDAP client and DNS runtime manifest
    
    This commit adds RPCAPI calls to invoke the LDAP client and DNS
    runtime manifests when the system controller network and system
    controller OAM network are added in a subcloud after the initial
    network configuration has completed.
    
    Tested:
    Delete the system controller network and system controller OAM network
    in a subcloud and add them with different values. The related
    configuration and Hieradata are updated.
    
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/787750
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/785977
    Story: 2008774
    Task: 42307
    
    Change-Id: I4ddf88efa16299c9415f4bf156f2be57e8cc826e
    Signed-off-by: Yuxing Jiang <yuxing.jiang@windriver.com>

commit a9743b3e05c61a0b738af4770eeff5727bc51622
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Tue Apr 27 07:47:21 2021 -0400

Change in oam ip fails to update the system endpoints, in AIO-SX
    
    With story 2008531, the OAM IP update does not change the IP addresses
    of the openstack endpoints, although the REST APIs already respond
    with the new IP.
    
    The correction adds keystone's endpoints class to the puppet
    manifest class list in order to fix this.
    
    Tested on an AIO-SX setup, as described in the Launchpad bug.
    
    Closes-Bug: 1926288
    
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
    Change-Id: I57c4a3392879be3c94de89ee67af3c5a50095b54

commit a9fc0be4e500a8739db04b7555c1be50a877b21f
Author: Yuxing Jiang <yuxing.jiang@windriver.com>
Date:   Fri Apr 16 08:19:25 2021 -0400

Support configuration of system-controller address pools post-install
    
    When migrating a subcloud to a new central cloud, the system
    controller's network configuration needs to be re-configured in the
    subcloud. This commit removes the check of is_initial_config_complete
    flag during modification/deletion of the system-controller-subnet and
    system-controller-oam-subnet address pools. After this commit, the two
    types of address pools can be re-configured after the initial
    bootstrap.
    
    Test:
    Delete the system-controller-subnet and
    system-controller-oam-subnet in a subcloud after bootstrap.
    
    Change-Id: Ied68bbfd83a0cc1c3bb0fb31ee55f924353fb4b5
    Story: 2008774
    Task: 42291
    Signed-off-by: Yuxing Jiang <yuxing.jiang@windriver.com>

commit 754c6861ca33a581f3a32a7a353ec9672dd0c8b9
Author: Charles Short <charles.short@windriver.com>
Date:   Mon Apr 26 11:06:45 2021 -0400

Fix zuul errors due to changes in dependencies
    
    Pin hacking to < 4.0.1 to fix zuul gate issues.
    
    Test:
    Ran tox -e pep8 command to validate the flake8 job and result.
    
    Related-Bug: 1926172
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: I74fec1c352b8947b58498e32b7554e54c77aaeaa

commit 38ae08893bcb76142da1c1b1133a29f8a767bf16
Author: Sabeel Ansari <Sabeel.Ansari@windriver.com>
Date:   Thu Apr 8 14:13:28 2021 -0400

Verify cert chain after adminep update
    
    This commit checks the validity of the certificate chain
    after the admin-ep certificate is renewed on subclouds. If
    the check fails, update_admin_ep_cert deletes the secret
    and subsequent audits recreate it. This check fixes the
    scenario where the old ICA was used when updating and
    renewing the root CA.
    
    Closes-bug: #1923071
    
    Signed-off-by: Sabeel Ansari <Sabeel.Ansari@windriver.com>
    Change-Id: I112dd52e220dcc8bdb72c2c772ede72fbb786c7b

commit 3c90e3b949c5533feee25efedda114376f43a0ff
Author: jgauld <james.gauld@windriver.com>
Date:   Mon Apr 19 17:21:09 2021 +0000

Add lifecycle hook to allow to_app application-update semantic checking
    
    This adds a new lifecycle update operation so that a semantic check lifecycle
    step is run against the "to" application during application-update.
    
    The application specific lifecycle code requires a custom semantic check
    using the new hook (e.g., similar to the following):
    
    if hook_info.lifecycle_type == constants.APP_LIFECYCLE_TYPE_SEMANTIC_CHECK:
        if hook_info.operation == constants.APP_UPDATE_OP:
            if hook_info[LifecycleConstants.EXTRA].get(LifecycleConstants.TO_APP, False):
                return self.update_check(app_op._dbapi, app)
    
    Testing:
    * VirtualBox: AIO-SX
    * Created example application changes for platform-integ-apps using
      semantic check shown above, and prototype update_check() routine.
    * Tested the following for both pass and fail semantic check cases:
      system application-update platform-integ-apps-1.0-29.tgz
    * Tested that apps on the application-update are okay on an upgrade,
      as invoked in upgrade-scripts/65-k8s-app-upgrade.sh.
      i.e., system upgrade-activate
    
    Semantic check passes, update proceeds to 1.0-29 "to" version,
    the 1.0-27 "from" version is cleaned up.
    Semantic check fails, application recovery proceeds to "from" version,
    the 1.0-29 "to" version is cleaned up.
    
    Example when semantic check fails,
    system application-list
    | platform-integ-apps      | 1.0-27  | platform-integration-manifest     | manifest.yaml \
    | applied  | Application update from version 1.0-27 to version 1.0-29 aborted. \
                 Application recover to version 1.0-27 completed. Please check logs for details
    
    sysinv.log
    sysinv 2021-04-19 09:32:53.422 3810926 INFO sysinv.conductor.kube_app [-] \
    Starting recover Application platform-integ-apps from version: 1.0-29 to version: 1.0-27
    
    Story: 2008829
    Task: 42311
    
    Signed-off-by: jgauld <james.gauld@windriver.com>
    Change-Id: If0787e3e3806bdf5dc175fde64ac63e1f38fd852

commit 251afbcdfb5a3f050921275bca97cb9de9553572
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Mon Apr 12 14:51:50 2021 -0400

Handle "updating" and "recovering" app update states during
    upgrade activation
    
    During simplex subcloud upgrade, if the initial upgrade
    activation request fails, dcmanager orchestrator re-tries the
    activation by sending another request (current logic is to
    retry up to 10 times). On the second upgrade activation
    request, sysinv skips the oidc-auth-apps reapply as the
    app is in 'updating' state and proceeds to completing the
    remaining steps of the activation sequence. As a result,
    upgrade activation completes while platform apps were either
    in the incorrect version or incorrect state.
    
    This commit resolves the issue by skipping only for "uploaded"
    and "applied" states during upgrade activation.
    
    Change-Id: I4b0aa4897e83a47ccdcf58c37232301f3668de32
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>
    Story: 2008055
    Task: 42246

commit b1c8d95f2cf8965eccf3306d372a8c820edd6634
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Tue Mar 30 14:12:23 2021 +0300

Fix resize of drbd filesystems
    
    This commit adds the following modifications:
    
    - The drbd filesystem sizes are now calculated using the 'dumpe2fs'
    utility because it gives better results for larger filesystems.
    
    - Added extra checks before and after executing 'resize2fs'.
    Before running 'resize2fs' check if the drbd device is resized using
    "/sys/block/{drbd_device}/size" and sector size.
    After running 'resize2fs', check if the filesystem is resized
    using 'dumpe2fs'.
    
    - The drbd filesystems were resized only if they were
    in SyncSource or PausedSyncS states. There are cases
    when drbd-overview showed "Connected" instead of these sync states,
    and the filesystems would never be resized. Now
    the drbd filesystems will also be resized when they
    are in "Connected" state.
    
    Closes-Bug: 1921896
    Change-Id: I548300deb8916ce863bcd4bb70969cb9d51c9c2a
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
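
A sketch of the size checks described above. It assumes the kernel convention that /sys/block/<dev>/size counts 512-byte sectors, and reads the "Block count" and "Block size" fields from `dumpe2fs -h` output; the helper names are hypothetical, and reading the sysfs file or running dumpe2fs is left to the caller.

```python
SECTOR_BYTES = 512  # assumption: sysfs size is reported in 512-byte units
RESIZABLE_STATES = ("SyncSource", "PausedSyncS", "Connected")


def device_size_bytes(sysfs_size_text):
    """Convert the contents of /sys/block/<dev>/size to bytes."""
    return int(sysfs_size_text.strip()) * SECTOR_BYTES


def fs_size_bytes(dumpe2fs_header):
    """Compute the filesystem size from `dumpe2fs -h` output."""
    fields = {}
    for line in dumpe2fs_header.splitlines():
        # Split on the first ':' only; values such as timestamps contain more.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Block count"]) * int(fields["Block size"])


def should_resize(drbd_state, device_bytes, fs_bytes):
    """Resize only in an allowed state and only if the device has grown."""
    return drbd_state in RESIZABLE_STATES and device_bytes > fs_bytes
```

The same fs_size_bytes() comparison can be reused after running resize2fs to confirm the filesystem actually grew.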

commit dd71459015c838dbd68ed90c5eb3ee16b5827004
Author: Pedro Henrique Linhares <PedroHenriqueLinhares.Silva@windriver.com>
Date:   Thu Apr 8 13:02:33 2021 -0300

Adding support for Ceph storage during Simplex to Duplex migration
    
    This change will allow users to migrate a simplex system with Ceph
    enabled to duplex via system modify command. During unlock it will
    get the cephfs filesystem pool names and generate the necessary hieradata.
    This file along with other puppet changes on stx-puppet will be used to
    perform necessary changes on the Ceph cluster to support a duplex
    configuration.
    
    Story: 2008587
    Task: 42079
    
    Signed-off-by: Pedro Linhares <PedroHenriqueLinhares.Silva@windriver.com>
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/783727
    Change-Id: Idaa7ebbf3a9c55658187e1d5ca6c357349659d43

commit ca6a15adffc06cd4979015dbe596458fdf309f9b
Author: Marcus Secato <Marcus.ViniciusCarvalhoSecato@windriver.com>
Date:   Mon Apr 19 20:32:54 2021 +0000

Revert "Adjust lock acquiring logic"
    
    This reverts commit cbb9121a289603ec003dec098b8fa5918ca98300.
    
    Reason for revert: reverting as this caused issues in DC upgrade
    
    Change-Id: Ie665b9c4e4d1280d7c8a0821cb7995c9374ce02f

commit a24cd707a5cafaa2d49b25d1d5285c5c8582eb5d
Author: Angie Wang <angie.wang@windriver.com>
Date:   Fri Apr 16 12:46:52 2021 -0400

AIO-DX: Controller-1 fails to be unlocked after downgrade
    
    During the stx4.0 to stx5.0 upgrade, controller-1 fails to be unlocked
    after downgrade due to incorrect disk partition and physical
    volume information stored in the stx4.0 DB, which causes the puppet
    manifest apply to fail during unlock.
    
    This is because the cgts-vg size is decreased in stx5.0, and after
    controller-1 is upgraded to stx5.0, an additional partition and pv
    are created on the stx5.0 side to match the size in stx4.0. However,
    controller-0 is still running the stx4.0 DB, and it gets updated with
    the newly created partition and pv info sent by the controller-1
    sysinv agent audit.
    
    This commit ignores the disk partition and physical volume
    information sent back from a host running a different version during
    upgrade.
    
    Tested:
    - AIO-DX upgrade from stx4.0 to stx5.0, verified upgrade is completed
    - controller-1 downgrade after it is upgraded and unlocked, verified
      upgrade abort is completed
    
    Change-Id: I5d7858e4b29d096437a5ddf94cd78c74fadfacad
    Closes-Bug: 1924786
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 0271808d75fc12909904e27103a1fd3455df3f18
Author: Charles Short <charles.short@windriver.com>
Date:   Wed Mar 17 11:10:52 2021 -0400

cgts: Add missing dependencies
    
    Prettytable and six are required dependencies that were not listed
    in requirements.txt.
    
    Story: 2006796
    Task:  42071
    
    tests:
    - Ran build test.
    - Ran basic smoke test with resulting RPMS and iso.
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: Ib047e61487ed26792930c80db167920afc21c66c

commit bc34618d83a5e38d765cbde0a91ebf384b8857b8
Author: Rafael Jordão Jardim <RafaelJordao.Jardim@windriver.com>
Date:   Tue Apr 6 07:44:06 2021 -0400

Python 2 to Python 3 compatibility
    
    The change https://review.opendev.org/c/starlingx/config/+/782575
    was used to test the cgts-client, so it needs to be merged first.
    python-neutronclient is removed because this dependency is
    unnecessary: the few small utility functions that were used from
    python-neutronclient are copied into cgtsclient.
    
    Development: to find what needed to be modified, I built the client,
    took the resulting tar file and set up two environments, one based
    on python2 and another on python3. I installed the client tarball in
    both environments and exported the environment variables the client
    expects in order to send requests to the controller. Doing that, I
    could switch between the two Python versions and identify what
    needed to be modified.
    
    Test: after all the modifications, I built an ISO and installed it
    to run some commands and check whether my changes had any side
    effects. After that, I followed the procedure to update the remote
    CLI docker image, inserted the updated client there and tested this
    new image with the remote CLI.
    
    Story: 2007106
    Task: 42268
    
    Depends-On: I5086832605752bdb00a40a24596494c8fd987692
    Signed-off-by: Rafael Jardim <rafaeljordao.jardim@windriver.com>
    Change-Id: Ibf919260693f1cbe99993d1de01ecf785d604839

commit 8cc522bed57aac4699fed1b2cdf7cf4b213a4dc7
Author: Adriano Oliveira <adriano.oliveira@windriver.com>
Date:   Tue Apr 13 19:21:54 2021 -0400

sysinv-api script to return ERROR
    
    Stop returning NOT_RUNNING when the sysinv-api pid is still active
    but the process fails to respond to ping. This scenario can happen
    when sysinv-api is manually killed and SM triggers a start.
    Also, in the new routine that checks whether sysinv-api is replying
    properly, return ERROR instead of NOT_RUNNING for SM consistency.
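    The decision above can be modelled in Python as follows (the real
    agent is an OCF shell script). The exit codes are the standard OCF
    values; the helper name and its two boolean inputs are hypothetical,
    and mapping "no pid" to NOT_RUNNING is an assumption about the
    unchanged path.

```python
# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor_status(pid_alive: bool, api_replies: bool) -> int:
    """Map sysinv-api process/API state to an OCF monitor result.

    Hypothetical helper that only models the decision in this commit.
    """
    if not pid_alive:
        # Assumed: no process at all is still reported as NOT_RUNNING.
        return OCF_NOT_RUNNING
    if not api_replies:
        # Process exists but does not answer: report ERROR, not NOT_RUNNING.
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```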
    
    Testing:
    Tested a double sysinv-api kill within 90 seconds, as per the SM
    configuration. After the first sysinv-api kill it should be
    restarted; after the second kill, if done within 90 seconds, a swact
    is triggered. Also verified that the sysinv-api request routine was
    engaged. Also tested AIO-DX bootstrap, manual swact via host-swact
    and patch application. After bootstrap and patch application, tested
    double sysinv-api kill and manual swact.
    
    Other fix related to this issue was addressed by this change:
    https://review.opendev.org/c/starlingx/stx-puppet/+/783980
    
    Closes-Bug: 1893669
    Signed-off-by: Adriano Oliveira <adriano.oliveira@windriver.com>
    Change-Id: I1b1ab0560237f602dadf074331f6a165d12330c7

commit 7ce3d16eeaa4c186e70aad56a1c92a8279dd0aae
Author: Bin Qian <bin.qian@windriver.com>
Date:   Thu Apr 8 11:08:27 2021 -0400

Add sysinv-reset-n3000-fpgas cmd
    
    When an AIO runs the single manifest, the N3000 FPGA reset needs to
    complete without the local docker registry and other SM-managed
    services.
    
    This adds the sysinv-reset-n3000-fpgas command for puppet to reset
    N3000 FPGAs at host start-up.
    The sysinv-reset-n3000-fpgas command separates the function of
    resetting N3000 FPGAs from sysinv-fpgas-agent, as
    sysinv-fpgas-agent has a dependency on rabbit, which is not
    available until the manifest completes.
    
    Change-Id: Ic3c4b2a00515d194793257729362f71e2951286c
    Partial-Bug: 1918139
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit 6acd2e3564d3d708e496c1a7e78b064419f1fdbf
Author: Bin Qian <bin.qian@windriver.com>
Date:   Tue Feb 23 12:59:28 2021 -0500

Single puppet manifest for AIO controllers
    
    Create a single puppet manifest for AIO controllers.
    This change includes:
    1. remove workerconfig from an AIO controller deployment
    2. running puppet based on subfunctions of the nodes
    
    Depends-on: https://review.opendev.org/c/starlingx/stx-puppet/+/780600
    Partial-Bug: 1918139
    Signed-off-by: Bin Qian <bin.qian@windriver.com>
    Change-Id: Ie3693219e3c19460ac5b617cc216cbc809ec2403

commit ad8567f06485a10edf3857fbc87ae7d3058a1dfc
Author: Gustavo Santos <gustavofaganello.santos@windriver.com>
Date:   Tue Apr 13 16:09:21 2021 -0300

Restart tiller on openstack pending install check
    
    This is another attempt at fixing the same bug as the merged review
    https://review.opendev.org/c/starlingx/config/+/783472 had tried, since
    there were reports indicating that the bug would still occur on certain
    setups.
    
    This patch explicitly forces a tiller restart when catching the first
    HelmTillerFailure exception caused by the broken pipe error, instead of
    only trying to rerun the 'helm list' command, which was believed to be
    a reliable workaround to the problem, but didn't solve it in every
    possible scenario.
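    A rough sketch of the restart-then-retry pattern, assuming the caller
    passes in the helm command and a tiller restart action as callables.
    HelmTillerFailure is redefined here as a stand-in for the sysinv
    exception; run_with_tiller_restart is a hypothetical name.

```python
import logging

LOG = logging.getLogger(__name__)


class HelmTillerFailure(Exception):
    """Stand-in for the sysinv exception of the same name."""


def run_with_tiller_restart(helm_cmd, restart_tiller, retries=1):
    """Run ``helm_cmd``; on HelmTillerFailure restart tiller and retry.

    ``helm_cmd`` and ``restart_tiller`` are hypothetical callables supplied
    by the caller (e.g. a wrapper around 'helm list').
    """
    attempt = 0
    while True:
        try:
            return helm_cmd()
        except HelmTillerFailure as exc:
            if attempt >= retries:
                raise
            attempt += 1
            LOG.warning("helm failed (%s); restarting tiller and retrying", exc)
            # A fresh tiller reconnects to postgres through the current
            # floating IP instead of the stale pre-swact connection.
            restart_tiller()
```

    Restarting before the retry, rather than only rerunning the command,
    is the difference this commit describes relative to the earlier fix.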
    
    Closes-Bug: #1917308
    Signed-off-by: Gustavo Santos <gustavofaganello.santos@windriver.com>
    Change-Id: I38667609173ca5c6fed028f75742ae99efedf149

commit 5a9582ac5ca84db7dfaaa2474f05d373224f3d63
Author: jgauld <james.gauld@windriver.com>
Date:   Tue Mar 16 06:40:49 2021 +0000

Platform support for application upgrades
    
    This enhances the application management component of sysinv to
    support the upgrade of both platform and non-platform applications.
    
    This supports new application metadata:
      upgrades:
        update_failure_no_rollback: <true/false/yes/no>
        from_versions:
          - <version.1>
          - <version.2>
      supported_k8s_version:
        minimum: <version>
        maximum: <version>
      supported_releases:
        <release>:
          - <patch.1>
          - <patch.2> ...
    
    The processing of patch_dependencies is deprecated in favour of
    supported_releases.
    
    This enables applications to specify:
    * required patches, such as a platform patch_level, that
      the application can run on for a specific release
    * supported K8s versions that the application can run on
    * app releases that this version can upgrade from
    * that application update failures do not roll back
    
    Application upload and updates check against this metadata.
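    A simplified sketch of checking the upgrades.from_versions and
    supported_k8s_version metadata shown above. The key names follow this
    commit message, but the function names, return convention and the
    version parsing (numeric, optional leading "v") are assumptions.

```python
from typing import Any, Dict, Optional, Tuple


def _version_tuple(version: str) -> Tuple[int, ...]:
    """Turn 'v1.18.1' or '1.18.1' into (1, 18, 1) for ordering."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def check_app_metadata(metadata: Dict[str, Any], current_app_version: Optional[str],
                       k8s_version: str) -> Optional[str]:
    """Return a reason string if the update should be blocked, else None."""
    upgrades = metadata.get("upgrades") or {}
    from_versions = upgrades.get("from_versions")
    # Missing metadata means no restriction (the update proceeds).
    if from_versions and current_app_version not in from_versions:
        return "cannot upgrade from version %s" % current_app_version

    k8s_range = metadata.get("supported_k8s_version") or {}
    k8s = _version_tuple(k8s_version)
    minimum = k8s_range.get("minimum")
    maximum = k8s_range.get("maximum")
    if minimum and k8s < _version_tuple(minimum):
        return "kubernetes %s is below minimum %s" % (k8s_version, minimum)
    if maximum and k8s > _version_tuple(maximum):
        return "kubernetes %s is above maximum %s" % (k8s_version, maximum)
    return None
```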
    
    System health pre-upgrade checks make sure that:
    * all applications are in a valid state for the upgrade
      (i.e. uploaded or applied)
    * active controller is controller-0
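    The two pre-upgrade conditions listed above can be sketched as a
    function that collects failure reasons. The app-state dictionary and
    the function name are hypothetical; only the "uploaded or applied"
    and "controller-0" rules come from this commit.

```python
from typing import Dict, List

VALID_UPGRADE_STATES = ("uploaded", "applied")


def pre_upgrade_failures(app_states: Dict[str, str], active_controller: str) -> List[str]:
    """Collect reasons that block an upgrade (an empty list means OK).

    ``app_states`` maps app name -> status string (hypothetical model).
    """
    failures = []
    for name, status in sorted(app_states.items()):
        if status not in VALID_UPGRADE_STATES:
            failures.append("%s is in state %s" % (name, status))
    if active_controller != "controller-0":
        failures.append("active controller must be controller-0")
    return failures
```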
    
    Testing:
    - Configs: AIO-SX, AIO-DX, Standard
    - new metadata format validation with valid/invalid input
    - upversion platform-integ-apps tarball with new metadata,
      and do an application-update
      e.g.,
      system application-update platform-integ-apps-1.0-22.tgz
      * applies if the new metadata is not provided
      * blocks update if from_versions, supported_k8s_version,
        or supported_releases criteria are not met
    - normal operation of system application-remove/application-delete
    - system application-upload should apply/block based on metadata
    - trigger 'no_rollback' handling by forcing an application
      rollback using a small manifest.yaml timeout and too many
      replicas that cannot be deployed
    - AIO-DX after load-import, swact to controller-1, verify
      that upgrade is blocked
    
    - perform upgrade of current N to N+1 release
      * upversion platform-integ-apps in N+1 release with new metadata
      * import new load containing new metadata
      * system health-query-upgrade; should indicate fail and failure
        reasons based on new criteria
      * follow 'system upgrade' procedure; should block upgrade based
        on new criteria
    
    Story: 2008055
    Task: 42179
    
    Signed-off-by: jgauld <james.gauld@windriver.com>
    Change-Id: I93a6f2ada7ce52414190948cbc458f6295c50603

commit d1a13c9b3f3458e9744cd5eed675157befa45c77
Author: Gerry Kopec <gerry.kopec@windriver.com>
Date:   Thu Apr 8 19:48:13 2021 -0400

Increase platform memory reserve on AIO hosts
    
    Increase memory reserve for AIO by 1000 MiB.  Total memory reserve for
    AIO in MiB will be 5000 + 2000 + 1000 * number of numa nodes.  This will
    give more headroom for AIO-SX subclouds with a single numa node.
    
    Closes-Bug: 1923399
    Change-Id: I433548792504f783a44a80d5099d93c5bee15ed7
    Signed-off-by: Gerry Kopec <gerry.kopec@windriver.com>

commit 5b73ac5813e889ddcd0feb65f96eff0ac789e69a
Author: albailey <Al.Bailey@windriver.com>
Date:   Mon Apr 12 08:49:08 2021 -0500

Eliminate sdist step from sysinv zuul
    
    Zuul randomly fails while setting up pbr in the sdist step.
    It is unclear whether the cause is something being source-packaged
    during tox that conflicts with another zuul job using a different
    interpreter, so source dist generation is simply disabled.
    
    Most openstack projects are configured like this.
    
    Partial-Bug: 1922590
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: Ieb4cf4113c07a0e166c001f124d920ba7118afb3

commit cbb9121a289603ec003dec098b8fa5918ca98300
Author: Marcus Secato <marcus.viniciuscarvalhosecato@windriver.com>
Date:   Thu Apr 1 16:30:13 2021 -0400

Adjust lock acquiring logic
    
    When locking the file descriptor, skip_udev_partition_probe did not
    handle errors raised by fcntl.flock, which led controller-0 to a
    degraded state after unlock. This change strengthens that logic by
    handling the error properly, retrying the lock operation and
    improving logs.
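    A minimal sketch of handling and retrying fcntl.flock errors, written
    for Python 3 (where a busy non-blocking lock raises BlockingIOError,
    an OSError). lock_with_retry, the attempt count and the delay are
    hypothetical; note that this commit was later reverted.

```python
import errno
import fcntl
import logging
import time

LOG = logging.getLogger(__name__)


def lock_with_retry(fd, attempts=5, delay=0.5, sleep=time.sleep):
    """Take an exclusive, non-blocking flock on ``fd``, retrying on contention.

    Returns True once the lock is held, False if every attempt found the
    lock busy. Other OSErrors (e.g. EBADF) are logged and re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK, errno.EACCES):
                LOG.error("flock failed on fd %s: %s", fd, exc)
                raise
            LOG.info("lock busy (attempt %d/%d)", attempt, attempts)
            if attempt < attempts:
                sleep(delay)
    return False
```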
    
    Closes-Bug: 1922256
    
    Signed-off-by: Marcus Secato <marcus.viniciuscarvalhosecato@windriver.com>
    Change-Id: I000367668744a4e92e20ff9d3f1f8cd717883a46

commit 0e1bf356139abce842e61ede936d3f9a8ddc122e
Author: Andrei Grosu <andrei.grosu@windriver.com>
Date:   Tue Feb 2 20:55:27 2021 +0200

Send application lifecycle notifications for backup and restore.
    
    Implements backup, etcd-backup and restore hooks.
    Operations can fail, so a second parameter, 'success', is used
    to notify applications that an operation failed.
    Restore hooks are in place but are not used by the ansible playbooks.
    Separate the semantic check action.
    Revert backup operations by keeping a list of all pre-operations
    with their associated 'revert' action, while respecting the
    logical order of the pre-backup and pre-etcd-backup operations.
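    The revert bookkeeping described above can be sketched as a list of
    (name, do, undo) steps where completed steps are undone in reverse
    order on failure. The tuple layout and run_with_revert are
    hypothetical, not the sysinv implementation.

```python
from typing import Callable, List, Tuple

Operation = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_revert(operations: List[Operation]) -> bool:
    """Run (name, do, undo) operations in order; on failure undo the ones
    that already succeeded, most recent first. Returns True on success."""
    done: List[Tuple[str, Callable[[], None]]] = []
    for name, do, undo in operations:
        try:
            do()
        except Exception:
            # Undo in reverse order so later steps (e.g. pre-etcd-backup)
            # are reverted before the earlier ones they depend on.
            for _done_name, done_undo in reversed(done):
                done_undo()
            return False
        done.append((name, undo))
    return True
```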
    
    Story: 2007960
    Task: 40769
    
    Signed-off-by: Andrei Grosu <andrei.grosu@windriver.com>
    Change-Id: I0ebab45f4846cbcd25fecac6bf99195d9047eb8a

commit d7909c05221ecf658fd92d63f0dfcfbe2a170e3b
Author: Cole Walker <cole.walker@windriver.com>
Date:   Tue Mar 30 14:09:25 2021 -0400

Function to create gnp after oam interface-network add
    
    Adds the initialize_oam_config function to manager.py which allows
    the correct flag to be set when creating a new oam interface.
    This flag is used to trigger the platform::firewall::runtime
    puppet manifest that sets up the hostendpoints and globalnetworkpolicy
    in kubernetes.
    
    Tested on:
    AIO-SX
    AIO-DX
    Standard
    
    Closes-Bug: 1911213
    
    Signed-off-by: Cole Walker <cole.walker@windriver.com>
    Change-Id: I34331ae6ad54ee4ce564467616c84931d2ae245a

commit 1e97fb2398f1ecda43bfd4efb182647b4d719e90
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Fri Apr 9 18:27:40 2021 +0300

Add code to handle Kubernetes validating webhooks
    
    This commit adds new functions to list and delete
    'ValidatingWebhookConfiguration' objects using the Kubernetes API.
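    A sketch of such helpers written against the method names of the
    official kubernetes Python client (list_validating_webhook_configuration
    and delete_validating_webhook_configuration on AdmissionregistrationV1Api).
    The helper names and the prefix filter are hypothetical; the api object
    is passed in so the logic can be exercised without a cluster.

```python
def list_validating_webhooks(api):
    """Return the names of all ValidatingWebhookConfiguration objects.

    ``api`` is expected to be a kubernetes.client.AdmissionregistrationV1Api
    (or any object exposing the same two methods).
    """
    result = api.list_validating_webhook_configuration()
    return [item.metadata.name for item in result.items]


def delete_validating_webhooks(api, name_prefix):
    """Delete every configuration whose name starts with ``name_prefix``.

    Returns the deleted names. The prefix filter is illustrative only.
    """
    deleted = []
    for name in list_validating_webhooks(api):
        if name.startswith(name_prefix):
            api.delete_validating_webhook_configuration(name)
            deleted.append(name)
    return deleted
```

    Against a real cluster, api could come from config.load_kube_config()
    followed by client.AdmissionregistrationV1Api().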
    
    Partial-Bug: 1923185
    Change-Id: I648e940f8104307e111213afd511f8fca19e39ab
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>

commit f84ae1ecefb1e9e8f3f05216e80b6e0cdc36ef0c
Author: Adriano Oliveira <adriano.oliveira@windriver.com>
Date:   Fri Mar 26 18:38:30 2021 -0400

sysinv-api OCF script API request
    
    To prevent other services from sending requests to sysinv-api before
    it is ready to handle them, and to avoid each service implementing
    its own retry logic, the sysinv-api OCF script was changed to verify
    that sysinv-api is actually serving requests before reporting that
    it is ready.
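    A minimal readiness-check sketch in Python (the actual OCF script is
    shell). The probe treats any HTTP response, including an error status
    such as 401, as "serving". The function names, the retry counts and
    the endpoint in the usage note are assumptions.

```python
import time
import urllib.error
import urllib.request


def api_is_serving(url, timeout=5.0):
    """Return True if an HTTP GET on ``url`` gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status: it is serving requests.
        return True
    except (urllib.error.URLError, OSError):
        return False


def wait_until_serving(probe, attempts=10, delay=3.0, sleep=time.sleep):
    """Call ``probe()`` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

    For example, wait_until_serving(lambda: api_is_serving("http://127.0.0.1:6385/"))
    would poll a local sysinv-api endpoint, assuming it listens on port 6385.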
    
    Testing:
    1. Bootstrap of AIO-DX
    2. host-swact on AIO-DX
    3. Upgrade path on AIO-DX, 20.06 to 21.05 (including host-swact)
    4. subcloud bootstrap
    5. Double sysinv-api kill causing swact
    
    In all tests, the logs were checked to confirm that the retry logic
    is engaged and that sysinv-api, as well as cert-mon, start properly.
    
    Closes-Bug: 1913455
    Signed-off-by: Adriano Oliveira <adriano.oliveira@windriver.com>
    Change-Id: Ia17e9f7a15602c0cc52cb01896fac42ce4fcdcb9

commit 4314755481607a4aa7ea4ca25ccc5c22c2730956
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Tue Mar 30 13:40:43 2021 -0400

In AIO-SX, allow SRIOV VF parameters modify without host lock
    
    At runtime, the user can modify an interface to be an SR-IOV PF
    interface while specifying the VF parameter --vf-driver, and add
    new SR-IOV VF interfaces while specifying the VF parameters
    --vf-driver and --max-tx-rate.
    
    Modifying the VF driver and/or VF rate limit of an existing SR-IOV
    interface is not permitted on an unlocked AIO-SX host.
    
    Story: 2008531
    Task: 42203
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/784761
    Change-Id: Icdb75df45ab2d6b31ecd60ab6c18e8042a7ee99b
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>

commit 38c0ae47c3302ce956c65478281fccd1d8a50ebb
Author: Robert Church <robert.church@windriver.com>
Date:   Thu Apr 8 00:59:50 2021 -0400

Deactivate app plugins when update fails prior to recovery
    
    When an application update fails and the previous version is recovered,
    the plugin infrastructure for the new app version is not fully cleaned
    up during recovery. If another application update is attempted, plugin
    activation/deactivation becomes out of sync and results in a KeyError
    when attempting to enable the new app plugins for update.
    
    This update will:
     - ensure that the failed app plugins are disabled prior to cleanup
       during application recovery
     - catch the KeyError if a similar situation occurs, log a message, and
       continue to cleanup the working set for the plugins
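    The second bullet can be sketched as a cleanup loop that logs and
    skips a missing plugin instead of aborting. The dictionary of active
    plugins and the function name are hypothetical stand-ins for the
    sysinv plugin working set.

```python
import logging

LOG = logging.getLogger(__name__)


def deactivate_plugins(active_plugins, plugin_names):
    """Remove ``plugin_names`` from the ``active_plugins`` dict.

    A name that is already missing (KeyError) is logged and skipped so the
    rest of the working set is still cleaned up. Hypothetical data model.
    """
    for name in plugin_names:
        try:
            del active_plugins[name]
        except KeyError:
            LOG.info("plugin %s was not active; continuing cleanup", name)
    return active_plugins
```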
    
    Change-Id: If58942cd9342802bfd2055152c2f2d6289054084
    Closes-Bug: #1923004
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 1c69d99d4ffb7d26296f0f3ef66830b4e64ee430
Author: Gustavo Santos <gustavofaganello.santos@windriver.com>
Date:   Fri Mar 26 18:11:47 2021 -0300

Restart tiller on openstack pending install check
    
    When the armada-api pod, which runs helmv2-cli, goes up, it connects
    tiller to a postgres instance running on the active controller. It does
    that using the active controller's floating IP address. On setups with
    more than one controller, this creates an issue where that connection
    is no longer valid after performing a host swact, since it still points
    to the old controller.
    
    After the swact, the first helmv2-cli command will fail with a broken
    pipe error. One of the steps in the Openstack installation involves
    checking the system for pending helm installations, which will run a
    helmv2-cli command, causing the application operation to fail if run
    right after a host swact.
    
    This patch is a workaround to that problem. It forces a tiller restart
    by catching the first HelmTillerFailure exception caused by the broken
    pipe error and retrying the operation, which then will reestablish the
    connection between tiller and the correct instance of postgres.
    
    Closes-Bug: #1917308
    Change-Id: Ia09f9d2844611471314d5d3af70e9bbb0938437c
    Signed-off-by: Gustavo Santos <gustavofaganello.santos@windriver.com>

commit 3ad0dc17e7e5cd2de62f736db100a3fe6a38d928
Author: Yuxing Jiang <yuxing.jiang@windriver.com>
Date:   Fri Apr 2 14:05:57 2021 -0400

Accept additional attributes in addrpool-add CLI
    
    This commit adds additional attributes to enhance the current
    addrpool-add CLI. With this change, the addrpool-add CLI can create
    the addrpool with additional values including: 'floating_address',
    'controller0_address', 'controller1_address' and 'gateway_address'.
    
    Test:
    Fresh install on an AIO-SX; successfully added address pools with
    and without these attributes using the system addrpool-add command.
    
    Story: 2008774
    Task: 42206
    Signed-off-by: Yuxing Jiang <yuxing.jiang@windriver.com>
    Change-Id: I80955180e90accdcb6ce0c65dded2986f4a9c8ec

commit 1128b30ce6abd878c1a3524fc584ddc42ba4cec4
Author: Isac Souza <IsacSacchi.Souza@windriver.com>
Date:   Thu Apr 1 12:20:35 2021 -0300

New method to calculate replicas for armada apps
    
    Introduces _num_replicas_for_platform_app as a replacement
    for _num_provisioned_controllers, which is used to set the
    number of replicas used by a platform armada application.
    
    The new method uses the same underlying logic but never
    returns a value less than 1. This prevents the replicas
    from being set to 0 when there are no provisioned
    controllers.
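    The "never less than 1" rule amounts to clamping the controller count
    with max(). In this sketch the host dictionaries and their
    personality/invprovision fields are assumptions about the data model,
    and the function names are illustrative.

```python
from typing import Iterable, Mapping


def num_provisioned_controllers(hosts: Iterable[Mapping[str, str]]) -> int:
    """Count controller hosts whose (assumed) 'invprovision' is 'provisioned'."""
    return sum(1 for h in hosts
               if h.get("personality") == "controller"
               and h.get("invprovision") == "provisioned")


def num_replicas_for_platform_app(hosts: Iterable[Mapping[str, str]]) -> int:
    # Clamp to 1 so a platform app is never scaled to zero replicas,
    # e.g. while the only controller is locked/unprovisioned.
    return max(1, num_provisioned_controllers(hosts))
```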
    
    Tested with unit tests and by checking the replica count
    after a host lock/unlock cycle.
    
    Partial-Bug: 1922278
    Signed-off-by: Isac Souza <IsacSacchi.Souza@windriver.com>
    Change-Id: If322ff5d02996c9b853bc350244899c5e22431a2

commit ead91ced1d55c986ec0f828124bfe7b7beb5ad59
Author: Melissa Wang <melissa.wang@windriver.com>
Date:   Wed Mar 31 12:13:43 2021 -0400

AIO-SX to DX: Check cluster-host interface config
    
    This update adds a semantic check to ensure that the cluster-host
    interface exists and is not configured on loopback before allowing
    the AIO-SX to DX migration to begin.
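    A sketch of such a semantic check. The interface records (a name plus
    a list of assigned networks) and identifying loopback by the name
    "lo" are assumptions made for illustration, not sysinv's data model.

```python
from typing import Iterable, Mapping, Optional


def cluster_host_migration_error(interfaces: Iterable[Mapping]) -> Optional[str]:
    """Return an error message if the SX to DX migration must be blocked."""
    cluster_ifaces = [i for i in interfaces
                      if "cluster-host" in i.get("networks", ())]
    if not cluster_ifaces:
        return "cluster-host interface must be configured"
    # Assumption: a loopback assignment shows up as the 'lo' interface.
    if any(i.get("name") == "lo" for i in cluster_ifaces):
        return "cluster-host interface must not be configured on loopback"
    return None
```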
    
    Story: 2008587
    Task: 42188
    
    Change-Id: Ic86ef966536ec41e495fb47e8e83682c1d03402b
    Signed-off-by: Melissa Wang <melissa.wang@windriver.com>

commit bb6e8db56979f4ff741d6d7fe3304d1f9c45a99c
Author: Robert Church <robert.church@windriver.com>
Date:   Tue Oct 27 17:12:56 2020 -0400

Add sysinv support for kubernetes to ignore isolcpus
    
    When the kube-ignore-isol-cpus host label is set (it causes
    /etc/kubernetes/ignore_isolcpus to be generated), also adjust the
    k8s_all_reserved_cpuset passed to puppet.
    
    In effect, if the label is set, we want to pass the isolated CPUs to
    kubernetes as though they were regular application CPUs.
    
    Story: 2008760
    Task: 42167
    Change-Id: I065ca40fbd3395bf86a02a7822c1f9d46ee3fe06
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit d7c5a54ab94bce6635b83d91a807d28f97836a81
Author: Charles Short <charles.short@windriver.com>
Date:   Wed Mar 17 13:26:38 2021 -0400

sysinv: Cleanup requirements
    
    This commit does several things at once:
    
    - Drop argparse; it is no longer required.
    - Add the missing oslo.log; it was already pulled in by other
      requirements, but include it explicitly.
    - Use rfc3986 instead of Django to check for a valid URL, since
      Django is a rather heavy requirement for this. Added a unit test.
    - Move requirements that are actually runtime requirements for
      sysinv out of test-requirements.txt.
    - Also remove python2-mox from the centos7 spec file, since
      it is no longer required.
    - Updated debian/control and opensuse rpm as well.
    
    tests:
    - Did basic build test.
    - Ran smoke test with resulting RPMS and iso.
    
    Story: 2006796
    Task: 42072
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: I591ed52937092233c3ac7c2c2bf847b2b0c3f690

commit 9ed6740aebad0a6344e2490e89ade7afc3b26d9a
Author: Charles Short <charles.short@windriver.com>
Date:   Tue Mar 23 11:49:05 2021 -0400

py3: Fix for python2/python3 compatibility
    
    - Replace 'range' with six.moves.range.
    - Replace 'zip' with six.moves.zip.
    - Replace 'map' with six.moves.map.
    - Replace a/b with a//b to use integer division on python3.
    - Replace dict.keys() with list(dict.keys()) to get a list on
      Python 3, where dict.keys() returns a view.
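    A small Python 3 illustration of the behaviour differences behind
    these replacements, using plain builtins rather than six to keep it
    dependency-free:

```python
counts = {"a": 1, "b": 0, "c": 2}

# True division returns a float on Python 3; // keeps integer division.
half = 7 / 2        # 3.5
half_int = 7 // 2   # 3

# map/zip/range are lazy on Python 3 (six.moves gives the same lazy
# behaviour on Python 2); wrap in list() when a real list is needed.
doubled = list(map(lambda x: x * 2, range(3)))  # [0, 2, 4]

# dict.keys() is a live view on Python 3: deleting entries while
# iterating over it raises RuntimeError, so iterate over a list copy.
for key in list(counts.keys()):
    if counts[key] == 0:
        del counts[key]
```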
    
    test:
    - Changes were tested by running tox py27 and py36.
    - Built an ISO with the changes and ran some basic functionality tests.
    
    Story: 2006796
    Task: 41887
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: I5086832605752bdb00a40a24596494c8fd987692

commit 4cc0fe7332af99a68d4a94115e3cb9ef8126a518
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Sat Oct 24 17:23:00 2020 -0600

add sysinv support for specifying cpu function by range
    
    In order to add flexibility we want to allow specifying CPU
    function by range, rather than just by count.
    
    This will allow us to run something like this on the CLI:
    
    system host-cpu-modify -f application-isolated -c 3-5,25 controller-0
    
    There are a couple of complications to be aware of.  First, sysinv will
    NOT automatically add any missing SMT hyperthreads if the host has
    hyperthreading enabled.  Second, when specifying CPU function for
    a different function via the CLI the range specification is lost and
    gets converted to a simple count.  This implies that (for the CLI)
    only one function can support a range-based specification, and it
    must be specified last.
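    A sketch of parsing a range specification such as the "3-5,25" above
    into CPU ids. It supports only comma-separated single ids and
    inclusive ranges (a subset of the Linux cpulist syntax) and, like
    sysinv as described, does not add SMT siblings. parse_cpu_range is a
    hypothetical name.

```python
from typing import List


def parse_cpu_range(spec: str) -> List[int]:
    """Parse a CPU list such as '3-5,25' into a sorted list of CPU ids."""
    cpus = set()
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "-" in chunk:
            start_s, end_s = chunk.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError("invalid range %r" % chunk)
            cpus.update(range(start, end + 1))  # inclusive upper bound
        else:
            cpus.add(int(chunk))
    return sorted(cpus)
```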
    
    Story: 2008760
    Task: 42180
    Change-Id: Id21d9968b6b0b59e163f42098be7a6f0e6ef739d
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit 012f5b82eee70fc838eba43729951491f8603f4e
Author: MCamp859 <maryx.camp@intel.com>
Date:   Fri Jan 22 17:05:07 2021 -0500

Remove SNMP APIs from user doc
    
    Deleted APIs for SNMP Communities and Trap Destinations.
    Added pointers to Fault Management guide.
    
    Story: 2008132
    Task: 41396
    
    Change-Id: Ic2c52bf1b11d1793d57c78264e757795af1deff3
    Signed-off-by: MCamp859 <maryx.camp@intel.com>

commit 5a9ea65db6e5749de61100713994b937e2010438
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Thu Mar 11 15:13:56 2021 +0200

Detect active/standby switch
    
    A swact can occur outside of a controlled maintenance action.
    An unexpected swact needs to be detected so that the app reapply
    evaluation can happen.
    
    A swact is defined as follows:
    - sysinv-conductor runs only on the active controller
    - the hostname is used to determine where sysinv runs
    - sysinv-conductor has switched controllers
    
    The detection runs only once, when the conductor is restarted.
    It picks up both expected (system host-swact) and
    unexpected (reboot) swacts.
    A new trigger (APP_EVALUATE_REAPPLY_TYPE_DETECTED_SWACT) is used for
    evaluating apps reapply.
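    The hostname comparison can be sketched as follows: on conductor
    start-up, compare the current hostname with the one recorded at the
    previous start. The state-file location is hypothetical; on a real
    system it would have to live on storage shared by both controllers.

```python
import os
import socket


def detect_swact(state_file, hostname=None):
    """Return True if the conductor now runs on a different host than at
    its previous start, then record the current hostname.

    ``state_file`` is a hypothetical path; call once at start-up.
    """
    current = hostname or socket.gethostname()
    previous = None
    if os.path.exists(state_file):
        with open(state_file) as f:
            previous = f.read().strip() or None
    with open(state_file, "w") as f:
        f.write(current)
    # The very first start (nothing recorded) is not treated as a swact.
    return previous is not None and previous != current
```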
    
    Story: 2007960
    Task: 42053
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I48126887b0003a6a1a05ddd23f0429ef79a39832

commit 7343c40f99f861c30797b53b57c5afad58ac057d
Author: Melissa Wang <melissa.wang@windriver.com>
Date:   Tue Feb 9 08:56:12 2021 -0500

Add support for AIO-SX to DX migration on subcloud
    
    This update allows the user to change the system_mode on a subcloud
    from simplex to duplex using the system modify command. The sysinv DB
    and the platform.conf will be updated with the new system mode. The
    semantic checks were modified to ensure that changing from duplex to
    simplex is prohibited. The changes also include support for updating
    the OAM networking config using the oam-modify command.
    
    Story: 2008587
    Task: 41885
    
    Signed-off-by: Melissa Wang <melissa.wang@windriver.com>
    Change-Id: If7c14222ca66323225400ed88f214655f33fe615

commit 3d42b28c77e47c4ce217aa746c786503df9d0818
Author: Andrei Grosu <andrei.grosu@windriver.com>
Date:   Tue Mar 2 01:09:51 2021 +0200

Make the placeholder db entry unique for remote URLs.
    
    The original solution removed the 'app-name-placeholder' dummy
    entry if the download failed, but it was not approved because
    sysadmins expect this entry and manually removing the dummy
    placeholder app is the advised procedure.
    
    This patch instead gives the placeholder a unique name by appending
    the first 16 characters of the URL's md5sum as a postfix, in case
    multiple application-upload operations are performed in sequence
    (by a script, for example).
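    A sketch of the naming scheme: the first 16 hex characters of the
    URL's md5 digest appended to the placeholder name. The "-" separator
    and the placeholder_name helper are assumptions.

```python
import hashlib


def placeholder_name(url, base="app-name-placeholder"):
    """Unique placeholder: base name plus the first 16 hex chars of md5(url).

    The '-' separator is assumed for illustration.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return "%s-%s" % (base, digest[:16])
```

    The same URL always maps to the same name, while different URLs
    uploaded back to back get distinct placeholder entries.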
    
    Closes-Bug: 1917374
    Signed-off-by: Andrei Grosu <andrei.grosu@windriver.com>
    Change-Id: Ib5db12bb23a0e7cce52596532e661d12092ea1d1

commit f8f826f960ca0359d0dfb62d7237beaf3ea07bcb
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Thu Mar 11 05:52:11 2021 -0500

Allow modification of OAM IP addr, in AIO-SX, without locking the host
    
    Users can change the OAM IP address without a lock/unlock cycle.
    To achieve this, some services (sm, sm-api, haproxy and
    vim-webserver) are restarted so that the L4 ports they opened with
    the old IP address as part of the socket are reopened on the new
    address.
    
    Some config files in /etc are also updated with the new address.
    
    Story: 2008531
    Task: 42060
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/780955
    Change-Id: I9e77fc60882f20d4f31c3e38b5305b1f207f40d9
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>

commit c11bd71fb1c7aadf9542cdd2a79d3c045c5c2b44
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Thu Mar 25 23:37:17 2021 -0400

Update psp migration script used in upgrade
    
    This commit updates the migration script used to apply the
    pod security policy (psp) configuration during an upgrade.
    The change is necessary to accommodate newly added psp
    ClusterRoleBinding configurations.
    
    Closes-Bug: 1885716
    Depends-On: https://review.opendev.org/c/starlingx/ansible-playbooks/+/782325
    
    Change-Id: Ibfdfd51e588eb2ad47c9f1c116875d01a2f06502
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit 3f6c732939b8fc857c932ee05b3418545ccae8f1
Author: Andrei Grosu <andrei.grosu@windriver.com>
Date:   Fri Feb 12 16:20:37 2021 +0200

Regenerate the correct plugins path.
    
    For some reason, if the application to be uploaded is a (remote) URL,
    parsing the manifest is deferred (presumably so as not to block on
    large files and/or slow networks) and a dummy 'app-name-placeholder'
    is used until later, when the file is unpacked and its manifest read.
    
    Closes-Bug: 1915518
    Signed-off-by: Andrei Grosu <andrei.grosu@windriver.com>
    Change-Id: Ic3929965ea931b117c3e5aab6f8e3f128bbbeb56

commit e2e7d45d0e5a84f7ef345a5a6561c5370a85a5fd
Author: Tao Liu <tao.liu@windriver.com>
Date:   Wed Mar 24 20:35:22 2021 -0400

Fix dcorch subcloud audit issue after the upgrade
    
    Dcorch-engine stops auditing the subclouds after the upgrade.
    This is because audit_status of subcloud_sync data was not
    set during data migration.
    
    This update sets audit_status to the initial state of “none”.
    
    Test: Upgrade controller-1, and then swact to controller-1.
          Verify the dcorch-engine audits subclouds.
    
    Closes-Bug: 1920962
    
    Signed-off-by: Tao Liu <tao.liu@windriver.com>
    Change-Id: If8fa6c5e1c1d1a81104976cb3e527c4095dd97f7

commit ed127df6ad9b8863080ef6cf5f44be2e5966518f
Author: Charles Short <charles.short@windriver.com>
Date:   Mon Mar 8 09:01:03 2021 -0500

Remove oslo-incubated version
    
    A long time ago oslo-incubated code was used to build the
    individual projects the same way. Now the openstack projects
    use pbr to build the python projects.
    
    Remove the oslo-incubated version code; it is not used anywhere.
    Unit tests run fine with this module removed.
    
    Story: 2006796
    Task: 42010
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: Ib11d69210878f38febf2d031b083a1ad85fec30c

commit 08c14894f3bf7fb6645707f689394dded27800ea
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Wed Mar 10 09:45:24 2021 -0500

Add bond option primary_reselect
    
    This update makes the primary_reselect option configurable for
    aggregated ethernet interfaces. The option is used to prevent
    reverting between the primary slave and the other slaves.
    
    Story: 2008706
    Task: 42057
    
    Change-Id: Icacc0bd2d5e42bf2e5db1505fd676c628dbe3ed1
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit 7dcdfaae898960c8f6ea7dbee892c118a1b80318
Author: Andrei Grosu <andrei.grosu@windriver.com>
Date:   Mon Mar 1 20:45:06 2021 +0000

Progress adjust metadata refactoring
    
    Changed the name of the constant and the yaml key to better reflect
    the purpose. Now the value is an integer which represents the
    adjustment value used to compute the percentage completion when
    applying charts. Cleaned up the code around the usage of the value
    and computing the percentage.
    
    Story: 2007960
    Task: 41959
    
    Signed-off-by: Andrei Grosu <andrei.grosu@windriver.com>
    Change-Id: Ia3b07b83762cdf20f6809222dc687f67c15deee5

commit 368f5ce3217314acea4253cf2ade25d9d6684580
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Thu Feb 11 12:51:31 2021 +0200

Enhance maintenance semantic checks with app hooks
    
    Let apps run semantic checks for lock and unlock actions.
    Let forced actions not run the app semantic check.
    Create unit tests for allowing and rejecting the action by an app.
    
    Story: 2007960
    Task: 41842
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: Ibe35c917cd5702031a56baf3059b70e0e2e59480

commit b3b28f56e8b52727575667b62a9d10dd409d7d18
Author: Suvro Ghosh <suvrojeet.ghosh@windriver.com>
Date:   Fri Mar 5 10:40:14 2021 -0500

Force remove or delete application
    
    Add the ability to use the -f or --force flag with
    system application-remove or system application-delete.
    
    Story: 2007960
    Task: 42016
    Signed-off-by: Suvro Ghosh <suvrojeet.ghosh@windriver.com>
    Depends-On: Ia1017b7eff0d9bd73b6448f2c4790f7e2b89c828
    Change-Id: If68d66d799addcd996da4b146d092c855b455aa3

commit 8f0312b5184f9292dbea878d8b4da2e3ef2e786c
Author: Daniel Pinto Barros <DanielPinto.Barros@windriver.com>
Date:   Fri Feb 19 10:31:43 2021 -0500

Introducing GEO location new fields for System
    
    New fields were created for the system object.
    GEO location attributes (latitude, longitude) were added to the
    system object, along with a way to retrieve and modify them using
    the API and CLI.
    Updates on: DB system model; DB migration; System object fields;
    API fields; CLI fields; API documentation.
    
    Story: 2008570
    Task: 41721
    
    Signed-off-by: Daniel Pinto Barros <DanielPinto.Barros@windriver.com>
    Change-Id: I86f124c44d80896427e3ac1bc799fe34588ae942

commit fa1622ef5f80950928f8f2447f9c0e178e797704
Author: Suvro Ghosh <suvrojeet.ghosh@windriver.com>
Date:   Fri Mar 5 13:23:14 2021 -0500

Prevents critical apps from being removed
    
    If an app's metadata states that removal is prevented/forbidden, then
    "system application-remove" for that app will be rejected.
    
    Story: 2007960
    Task: 42005
    Signed-off-by: Suvro Ghosh <suvrojeet.ghosh@windriver.com>
    Change-Id: Ia1017b7eff0d9bd73b6448f2c4790f7e2b89c828

commit da2bc95fbcc359f9ad4e719573f19fc2bcd0afd1
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Thu Mar 11 13:28:55 2021 +0200

Change enable_secured_etcd.yml variable in upgrade
    
    The enable_secured_etcd.yml playbook will use the
    cluster_floating_address variable instead of
    default_cluster_host_start_address. So we change
    the upgrade script accordingly to use the new
    variable.
    
    Closes-Bug: 1918130
    Depends-On: I8fecc1e5e54b5a9a9a72a54c069f79f5f2d434ba
    Change-Id: I8c9fd36e1104d4713bb748a57193530a0c4b458a
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>

commit cc27551a8e797398527a18815838ff467c7c0a39
Author: Charles Short <charles.short@windriver.com>
Date:   Wed Feb 17 13:05:30 2021 -0500

Remove unsafe umask usage
    
    The sysinv code runs under eventlet, which causes the
    running greenthread to swap out the original umask. This
    results in the sysinv code running with an incorrect umask.
    
    This can be demonstrated with the "system dns-modify" command:
    the agent process starts with a umask of 022, the umask is switched
    to 0, and it is never restored.
    
    The fix is to audit where os.umask is being used and
    replace os.umask with os.chmod.
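
A minimal sketch of the replacement pattern, assuming a plain text file is being written: the permissions are set on the file itself with os.chmod, so the process-wide umask (shared by every greenthread under eventlet) is never modified. The helper name write_file_with_mode is hypothetical and not taken from sysinv.

```python
import os
import tempfile


def write_file_with_mode(path, content, mode=0o644):
    """Write content to path, then set its permission bits explicitly.

    Hypothetical helper. os.umask() changes process-wide state that other
    greenthreads can observe; os.chmod() only touches this one file and is
    not affected by the current umask.
    """
    with open(path, "w") as f:
        f.write(content)
    os.chmod(path, mode)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "resolv.conf")
    write_file_with_mode(target, "nameserver 8.8.8.8\n")
    print(oct(os.stat(target).st_mode & 0o777))  # 0o644 on Linux
```

The file briefly exists with umask-derived permissions before the chmod; for sensitive files, creating it inside a directory that only the owner can access avoids exposing it during that window.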
    
    Testing has been done locally by:
    
    1. Running the "system dns-modify nameservers=8.8.8.8,8.8.4.4" command
       and checking the results/permissions of /etc/resolv.conf. Also
       checking whether the umask value in /proc/XXX/status changed
       before and after running the command.
    2. Running a "system application-upload" command on an installed helm
       armada package; these are located in /usr/local/share/application/helm.
       After the upload, running "system application-apply" should leave
       the application "applied" without error/failure, as shown in
       "system application-list".
    3. Running a distributed cloud system and checking for any errors. The
       command "dcmanager subcloud show <subcloudname>" should show the
       identity service in sync after "dcmanager subcloud manage <>".
    
    Closes-Bug: 1915955
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: I16ce695cfc4f6fb496ac0b3287906cc968ec5e98

commit c72417aedeb9d300ca256289a596a5797c863b51
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Mon Feb 8 13:27:35 2021 +0200

Implement algorithm for reapply evaluation priority
    
    Implement algorithm to determine app priorities for reapply evaluation.
    Use information provided in metadata to create a directional graph.
    Detect cycles and abort.
    Unit tests added.
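
The commit does not show the algorithm itself. One common way to order nodes of a directed dependency graph and abort on cycles is Kahn's topological sort, sketched here under the assumption that each app's metadata lists the apps it must be evaluated after. The `after` mapping and the names reapply_order and CycleError are hypothetical, not the real metadata schema or sysinv code.

```python
from collections import deque


class CycleError(Exception):
    """Raised when the app dependency graph contains a cycle."""


def reapply_order(apps, after):
    """Return apps ordered so each app comes after the apps it depends on.

    apps:  list of app names.
    after: dict mapping an app name to the names it must follow
           (hypothetical stand-in for the app metadata).
    """
    # Edge dep -> app means "dep must be evaluated before app".
    successors = {app: [] for app in apps}
    in_degree = {app: 0 for app in apps}
    for app in apps:
        for dep in after.get(app, []):
            if dep not in successors:
                continue  # ignore dependencies on apps that are not present
            successors[dep].append(app)
            in_degree[app] += 1

    # Kahn's algorithm: repeatedly emit apps with no unmet dependencies,
    # keeping the input order among ready apps for determinism.
    ready = deque(app for app in apps if in_degree[app] == 0)
    order = []
    while ready:
        app = ready.popleft()
        order.append(app)
        for nxt in successors[app]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(apps):
        # Apps left over are part of, or blocked by, a cycle: abort.
        stuck = sorted(app for app in apps if in_degree[app] > 0)
        raise CycleError("dependency cycle among: %s" % ", ".join(stuck))
    return order


if __name__ == "__main__":
    print(reapply_order(
        ["stx-openstack", "platform-integ-apps", "cert-manager"],
        {"stx-openstack": ["platform-integ-apps"],
         "platform-integ-apps": ["cert-manager"]}))
    # ['cert-manager', 'platform-integ-apps', 'stx-openstack']
```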
    
    Tests: AIO-SX, AIO-DX
    Apps are correctly ordered for reapply evaluation.
    Applications reapply order: [u'cert-manager', 'rook-ceph-apps',
    'platform-integ-apps', 'oidc-auth-apps', u'stx-openstack']
    
    Story: 2007960
    Task: 41781
    
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I375a90b746a0ff4c970305a26c2e3e061b14454e

commit 4face8a656ac63d110a2c1d8ce4efd98af771c47
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Tue Mar 2 22:03:53 2021 +0200

Rework platform managed apps
    
    At the moment the managed apps are hardcoded.
    This behavior needs to be changed.
    
    Let apps specify in the metadata if they want to be managed or not.
    Let apps specify in the metadata the state they want to achieve.
    Create a column in the kube_app table to store metadata. This will be read
    when the conductor is restarted.
    
    Tests:
    Install AIO-SX and AIO-DX, apps achieve the state described in their
    metadata file.
    Restart conductor, metadata gets picked up from the database.
    Do system application-remove, app gets auto-applied.
    Do system application-delete, app gets auto-uploaded.
    
    Story: 2007960
    Task: 41780
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I881716049471183cfd1179ab0558a557c8d104d8

commit a87694bf5e0362a5466b84c4c6638ffd453e9520
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Thu Mar 4 11:10:31 2021 -0500

Use http_port from conf file for fpga-agent
    
    The sysinv-fpga-agent is modified to use the http_port parameter
    from the platform.conf file.
    If a device image update operation is in progress, the http_port
    service parameter modification cannot be applied.
    
    Story: 2007875
    Task: 41969
    
    Change-Id: I41e795606535d91131b96a014b07bf18f0032d57
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit ff3ce494eea4fb37a97897a7cceac70e60b2f727
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Fri Feb 26 17:01:26 2021 -0600

Notify dcmanager when upgrade completed
    
    When an upgrade has been completed we want to notify dcmanager
    so that it can do a load audit of the subclouds rather than
    waiting up to an hour for the normal load audit to run.
    
    Story: 2007267
    Task: 41967
    Depends-On: https://review.opendev.org/c/starlingx/distcloud/+/778338
    Change-Id: I0c03bbfa16745fa297e159256a284e8862ff926a
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit b9cd8ec6de2b86a966c062c0db45a8b90c041526
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Thu Mar 4 09:11:30 2021 -0600

Upgrade activation interrupted by host-swact
    
    During an upgrade-activate the upgrade scripts can be interrupted by a
    swact. We need to block the swact during the activation. If a swact does
    occur we need to reset the upgrade state so the activate can be
    attempted again.
    
    Closes-Bug: 1917779
    Change-Id: I9274319375296b2334533e386629d185e2b472ac
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit 69b62362c75cba883446499c5a97a984293d2eca
Author: Litao Gao <litao.gao@windriver.com>
Date:   Thu Mar 4 03:19:06 2021 -0500

Update api-ref with modified interface configuration
    
    1. allow creation of ethernet interface using sriov interface
    2. max_tx_rate options for sriov vf interface configuration
    
    Story: 2008470
    Task: 41987
    
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: Id64248060b57d1778a455637ba9cf70d680456e5

commit 505e93b4c0d34ddf9e3d24769300d9adeace8da7
Author: Isac Souza <IsacSacchi.Souza@windriver.com>
Date:   Wed Feb 17 20:38:44 2021 -0300

Fix handling of expired watch event in cert-mon.
    
    The old code did not account for an event of type='ERROR'
    being received from the watch stream. The new code
    checks whether the received event is an error and, if so, returns
    from the infinite loop so the watch is started from scratch.
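
A sketch of the control flow, assuming events are dicts with a "type" key in the style of a Kubernetes watch stream (ADDED, MODIFIED, DELETED, ERROR). consume_watch and run_watches are hypothetical names; the real cert-mon loop runs indefinitely, while max_watches here only bounds the demonstration.

```python
def consume_watch(stream, handle_event):
    """Process watch events until the stream ends or reports an error.

    Returns True when an ERROR event was received (for example because the
    watched resourceVersion expired), telling the caller to start a new
    watch from scratch; returns False when the stream simply ended.
    """
    for event in stream:
        if event.get("type") == "ERROR":
            return True  # leave the loop instead of handling the error event
        handle_event(event)
    return False


def run_watches(make_stream, handle_event, max_watches):
    """Start up to max_watches fresh watches; count how many ended in ERROR."""
    errors = 0
    for _ in range(max_watches):
        if consume_watch(make_stream(), handle_event):
            errors += 1
    return errors


if __name__ == "__main__":
    events = [
        {"type": "ADDED", "object": "cert-a"},
        {"type": "ERROR", "object": "expired"},
        {"type": "MODIFIED", "object": "cert-a"},
    ]
    seen = []
    print(consume_watch(iter(events), lambda e: seen.append(e["type"])))  # True
    print(seen)  # ['ADDED']
```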
    
    Closes-Bug: 1914408
    Signed-off-by: Isac Souza <IsacSacchi.Souza@windriver.com>
    Change-Id: I7feabe5b550979d3761427ae501f1a94903a8983

commit 38cc2e8f4f266125fb08a71e3531a774b433f485
Author: Tao Liu <tao.liu@windriver.com>
Date:   Tue Mar 2 16:17:18 2021 -0500

Allow routes to be configured on oam interfaces
    
    This update adds NETWORK_TYPE_OAM to ALLOWED_NETWORK_TYPES
    for configuring routes.
    
    Test: Configured routes (add/delete) through the system CLI
          against the OAM interface.
    
    Story: 2007267
    Task: 41971
    
    Signed-off-by: Tao Liu <tao.liu@windriver.com>
    Change-Id: I053ce52a760fb2b20d81d0b250bde6d9902ddaa2

commit 240be6fee8cedec4a0f36d1d624716b0766bcea2
Author: Chris Friesen <chris.friesen@windriver.com>
Date:   Mon Mar 1 17:50:38 2021 -0600

move rest_api to common code
    
    The rest_api.py module is potentially useful for all sysinv code,
    so move it to the "common" subdirectory to make this more clear.
    
    Story: 2007267
    Task: 41967
    Change-Id: I3ef2b4144f85ad6e3533e0236f4afb83bdae707e
    Signed-off-by: Chris Friesen <chris.friesen@windriver.com>

commit 890b1208ca2ff67ad32228be4be7671f673f4a90
Author: Babak Sarashki <babak.sarashki@windriver.com>
Date:   Wed Feb 17 15:28:19 2021 +0000

config: Add global service parameter to set cri handler
    
    This commit adds global service parameter "container_runtime" to
    allow setting container runtime interface (CRI) entries in the
    containerd configuration file for custom runTimeClass.
    
    An example usage to set the cri:
    
    system service-parameter-add \
      platform container_runtime \
      custom_container_runtime=my_crihandler:/absolute/path/to/my_criBinary
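
A sketch of splitting the parameter value shown above into its handler name and binary path. The function name and the validation rules (non-empty handler, absolute path, split on the first ":") are assumptions for illustration, not sysinv's actual checks.

```python
def parse_custom_container_runtime(value):
    """Split '<handler>:<absolute path>' into (handler, path).

    Hypothetical validation: the handler must be non-empty and the binary
    path must be absolute. Splitting on the first ':' keeps any later ':'
    characters as part of the path.
    """
    handler, sep, path = value.partition(":")
    if not sep or not handler or not path.startswith("/"):
        raise ValueError("expected <handler>:<absolute path>, got %r" % value)
    return handler, path


if __name__ == "__main__":
    print(parse_custom_container_runtime(
        "my_crihandler:/absolute/path/to/my_criBinary"))
    # ('my_crihandler', '/absolute/path/to/my_criBinary')
```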
    
    Story: 2008434
    Task: 41390
    
    Signed-off-by: Babak Sarashki <babak.sarashki@windriver.com>
    Change-Id: Icc5fd16682f4cf47abff16e20a5332fc195c4afc

commit 68bb5ecc2b1de4ad1ece4280f538d9ad01442bb8
Author: Litao Gao <litao.gao@windriver.com>
Date:   Fri Feb 12 11:45:15 2021 -0500

Fix miscalculation of the available link speed bandwidth
    
    The previous implementation had two issues:
    1. it did not multiply max_tx_rate by the number of VFs
    2. it missed the case of VF subinterface modification
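
A sketch of the first fix, assuming each VF interface on a port is described by its VF count and per-VF max_tx_rate in the same unit as the link speed (a hypothetical representation). When an existing VF subinterface is modified, its current reservation would presumably be left out of the list before checking the new values, which is the second case mentioned.

```python
def available_tx_bandwidth(link_speed, vf_interfaces):
    """Bandwidth left on a port after reserving VF rate limits.

    link_speed:    port link speed (e.g. Mbps).
    vf_interfaces: (num_vfs, max_tx_rate) pairs, one per VF interface on
                   the port, in the same unit as link_speed.
    Each VF may transmit up to max_tx_rate, so an interface with num_vfs
    VFs reserves num_vfs * max_tx_rate, not just max_tx_rate.
    """
    reserved = sum(num_vfs * rate for num_vfs, rate in vf_interfaces)
    return link_speed - reserved


if __name__ == "__main__":
    # 25000 - (4 * 1000 + 2 * 2000) = 17000; without the multiplication
    # the result would have been 25000 - (1000 + 2000) = 22000.
    print(available_tx_bandwidth(25000, [(4, 1000), (2, 2000)]))  # 17000
```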
    
    Story: 2008470
    Task: 41508
    
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: I4d62ed177124fb2c4ddb547a37d9ea988a332dbd

commit 790b758fe4a87227cd6b76a97b301df744f6b53f
Author: Litao Gao <litao.gao@windriver.com>
Date:   Sat Feb 13 02:40:15 2021 -0500

Need to check MTU setting for the ethernet type subinterface
    
    The previous implementation only checked the MTU for VLAN
    and VF type subinterfaces. Since we've introduced the ethernet type
    subinterface, which can be created on a pci-sriov interface, the MTU
    check is also needed for ethernet type subinterfaces.
    
    Story: 2008470
    Task: 41505
    
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: I1e38cb63496013539b91c749076e4cbb5a951bfc

commit e86cc068932f953b34ba212e21dc7849f0f75325
Author: Litao Gao <litao.gao@windriver.com>
Date:   Tue Feb 9 10:59:45 2021 -0500

Remove the checks which restrict vf subinterface creation/modification
    
    Allow pci-sriov class and vf type subinterface creation/modification using
    the pci-sriov interface when it has already been used by another VLAN interface.
    
    Story: 2008470
    Task: 41505
    
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: I51d936530a93e3a9e6ac81586c16df4b7342a180

commit b27f224803741b7e9b236671643acbde238578ac
Author: Suvro Ghosh <suvrojeet.ghosh@windriver.com>
Date:   Thu Feb 11 13:40:47 2021 -0500

Prevents critical apps from being deleted.
    
    If an app's metadata states that deletion is prevented, then
    "system application-delete" for that app will be rejected.
    
    Story: 2007960
    Task: 41882
    Signed-off-by: Suvro Ghosh <suvrojeet.ghosh@windriver.com>
    Change-Id: I4401d3af6e7af354783edc945c1a5cdb72c1d0a1

commit 0304082c8918470d536064157f9cad5902823a1b
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Mon Mar 1 19:12:17 2021 +0200

Fix logging during reapply evaluation
    
    If an app is in the applied-failed or applying state, the flow of
    execution of the reapply evaluation is stopped, so the remaining
    applications might not get evaluated.
    
    Introduced by I023eb3bce9061e0ccfcf10ebeeaef91bcb39cff1.
    
    Story: 2007960
    Task: 41760
    Closes-Bug: 1884770
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I604beb78cc7112c9d05bec358787b74aa28ebe14

commit ea1cec6cd5812a383153b07ba6b31a219118fa8b
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Fri Feb 26 14:58:42 2021 -0500

Upgrade script to create device_images directory
    
    Added an upgrade migration script to handle the case where the
    N-side does not have the /opt/platform/device_images directory.
    
    Upgrade testing was performed.
    
    Story: 2007875
    Task: 41942
    
    Change-Id: I42bd944b831243ddfc35a76309be095008ec749d
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit a86f69a5b401f81b91fb4f9feb4ce95ee03510ff
Author: Matt Peters <matt.peters@windriver.com>
Date:   Mon Mar 1 09:12:37 2021 -0600

Config API documentation for Kubernetes cluster
    
    API Reference documentation for the Kubernetes Cluster
    API for cluster access information.
    
    Story: 2008630
    Task: 41914
    
    Signed-off-by: Matt Peters <matt.peters@windriver.com>
    Change-Id: Id0942dafeb2273e271d145f20517d16b1f409560

commit 6add4f2dfbb162dc21815489bf5f24b0b9eb6e7f
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Fri Feb 19 09:21:32 2021 -0600

Support background runtime manifests during upgrade-activate
    
    In distributed cloud environments runtime manifests can be applied in
    the background. This can cause hosts to become config out-of-date after
    the upgrade-activate completes. This is due to the large window between
    setting the host's config_target and updating the config_applied. If a
    manifest is run in this window the host will remain config out-of-date
    until a lock/unlock is performed.
    
    To address this the config_target changes will be limited to hosts that
    apply a runtime manifest as part of the upgrade-activate process.
    Further the config_target will be updated immediately before the
    _config_apply_runtime_manifest is called.
    
    Story: 2008055
    Task: 41917
    Change-Id: I2e60c7557e8d398eeef2a407a0552f5e8f4a1f18
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit b08ad875ebc9f1c6383b1ad48e9bee19b7425f94
Author: Matt Peters <matt.peters@windriver.com>
Date:   Tue Feb 23 08:33:18 2021 -0600

Config API for Kubernetes cluster access information
    
    Introduces a new sysinv config API for retrieving the Kubernetes
    cluster access information and security credentials (if configured).
    
    The information may be used to configure a remote Kubernetes client
    with the required configuration to use the Kubernetes API with
    administrative privileges with either client certificate authentication
    or token authentication for the kubernetes-admin service account.
    
    The following information is available for each cluster:
      Kubernetes Cluster Name (kubernetes)
      Kubernetes Release Version
      Cluster API Endpoint URL
      Cluster Root CA Certificate
      Admin Client Certificate
      Admin Client Key
      Admin User Name (kubernetes-admin)
      Admin service account token
    
    Story: 2008630
    Task: 41836
    Signed-off-by: Matt Peters <matt.peters@windriver.com>
    Change-Id: Ib81c7fcc3a577c1209ab3a0dd882552ba3d2b9db

commit c4affcaf43ff936f974b3650097bbd6beac93a0c
Author: Litao Gao <litao.gao@windriver.com>
Date:   Tue Feb 9 04:36:12 2021 -0500

Remove the checks to enable flexible pci-sriov interface config ordering
    
    1. Allow pci-sriov class interface creation without a mgmt interface configured
    2. Allow interface transition from platform to pci-sriov class
    
    Story: 2008470
    Task: 41505
    
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: Icd22ed5ae8314e8849dc07e3fc94baac26f49079

commit df6a3386544a2c1defcf4a9cc35f3eda0daa1e4f
Author: John Kung <john.kung@windriver.com>
Date:   Mon Feb 15 15:05:24 2021 -0600

Ensure agent is ready before issuing runtime config
    
    Update the conductor to ensure at least its own config agent
    on the active controller is ready to handle the config.  Otherwise,
    append runtime config to deferred config list until signalled ready.
    
    Prior to this commit, runtime config could be missed on startup
    on the active controller due to the agent not being ready
    to handle the config request.
    
    It does not defer runtime config application until all other hosts
    are ready as the config target is still persisted to track required
    config for host target in the event the rpc request is missed due
    to unexpected event or agent not ready on other hosts.
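
A simplified model of the deferral described above, assuming a single local agent whose readiness is signalled once. RuntimeConfigDispatcher and the send callable are hypothetical names, not conductor APIs, and the sketch only covers the local-agent check, not the persisted config target used for other hosts.

```python
from collections import deque


class RuntimeConfigDispatcher:
    """Send runtime config only once the local config agent is ready."""

    def __init__(self, send):
        self._send = send            # stands in for the RPC to the agent
        self._agent_ready = False
        self._deferred = deque()     # configs requested before readiness

    def request(self, config):
        if self._agent_ready:
            self._send(config)
        else:
            # Agent not ready yet: keep the request instead of losing it.
            self._deferred.append(config)

    def agent_ready(self):
        """Mark the agent ready and flush deferred configs in FIFO order."""
        self._agent_ready = True
        while self._deferred:
            self._send(self._deferred.popleft())


if __name__ == "__main__":
    sent = []
    dispatcher = RuntimeConfigDispatcher(sent.append)
    dispatcher.request("dns")
    dispatcher.request("ntp")
    dispatcher.agent_ready()
    dispatcher.request("ptp")
    print(sent)  # ['dns', 'ntp', 'ptp']
```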
    
    Test Performed:
      Installation and deployment of AIO-SX
      Installation and deployment of multinode system
      Verify that deferred runtime configs are applied in order
      when agent becomes available.
    
    Change-Id: I7388844d048453d302409eea36a939d81c9447ec
    Closes-Bug: 1915343
    Signed-off-by: John Kung <john.kung@windriver.com>

commit 96a51f34fafcc71fba2cbec25067a822120c4a6c
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Wed Feb 17 16:32:40 2021 -0500

Retrieve device image over lighttpd
    
    Modify the fpga-agent to fetch the device image from the controller
    via http instead of the drbd directory.
    
    Tests performed on the following systems:
    AIO-DX, AIO-DX plus compute, Standard 2+1
    DC with AIO-DX plus subcloud
    DC with Standard subcloud
    Upgrades on duplex system
    
    Story: 2007875
    Task: 41879
    Depends-on: https://review.opendev.org/c/starlingx/stx-puppet/+/776490
    
    Change-Id: I9a53eb2131c5ce2c2b87c1740e234af65ffabf78
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit 61dfb4c0e61fc7aa3a8c33e6a5984525b4c9d608
Author: Andy Ning <andy.ning@windriver.com>
Date:   Fri Feb 19 13:40:12 2021 -0500

Enhance upgrade script 85 to handle empty data
    
    Upgrade script 85-update-sc-admin-endpoint-cert.py extracts the admin
    endpoint certificate and private key from cert-manager. But
    intermittently the returned certificate data is empty, causing an
    empty /etc/ssl/private/admin-ep-cert.pem to be generated and an outage
    of many services. Eventually the upgrade fails.
    
    This update enhances the script to detect invalid returned data
    and retry extracting the data from cert-manager.
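
A sketch of the detect-and-retry pattern, assuming the fetched data is a PEM string. The validity check (non-empty and containing a PEM certificate header), the attempt count, the delay, and all function names are hypothetical; the actual script's checks are not shown in the commit.

```python
import time

PEM_CERT_HEADER = "-----BEGIN CERTIFICATE-----"


def looks_like_cert(data):
    """Hypothetical validity check: non-empty and PEM-encoded."""
    return bool(data) and PEM_CERT_HEADER in data


def fetch_with_retry(fetch, is_valid, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch() until is_valid(result) is true or attempts run out.

    Raises RuntimeError rather than returning bad data, so the caller never
    writes an empty certificate file.
    """
    for attempt in range(1, attempts + 1):
        data = fetch()
        if is_valid(data):
            return data
        if attempt < attempts:
            sleep(delay)  # give cert-manager time before asking again
    raise RuntimeError("no valid data after %d attempts" % attempts)


if __name__ == "__main__":
    responses = iter(["", "", PEM_CERT_HEADER + "\n...\n"])
    cert = fetch_with_retry(lambda: next(responses), looks_like_cert,
                            sleep=lambda seconds: None)
    print(cert.splitlines()[0])  # -----BEGIN CERTIFICATE-----
```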
    
    Change-Id: Ib2c7da9147f28bf10dcb8a053412a8f94af42353
    Closes-Bug: 1916279
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit 5cc9dc9c86b9214803ebba1231e227af862db77a
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Fri Jan 29 11:10:43 2021 +0200

Rework evaluating apps reapply
    
    At the moment reapply evaluation is performed only for hardcoded apps.
    This behavior needs to be changed.
    
    Allow apps to specify what triggers they subscribe to in their metadata
    file.
    In sysinv a trigger is a dictionary composed of at least the 'type'
    field and other optional fields.
    
    Allow easy use of filters on the triggers by specifying key:value pairs
    to be searched in the trigger information.
    
    Keep a map between sysinv triggers type and the trigger types specified
    in the metadata.
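
A sketch of matching a trigger against an app subscription with key:value filters. The subscription layout (a "type" plus an optional "filters" list of dicts) and the "personality" field are hypothetical, and the mapping between sysinv and metadata trigger types is not modelled.

```python
def trigger_matches(trigger, subscription):
    """Return True if a sysinv trigger satisfies one app subscription.

    trigger:      dict with at least a 'type' key plus optional fields.
    subscription: dict with a 'type' key and an optional 'filters' list of
                  {key: value} dicts (hypothetical metadata layout).
    Every key:value pair in every filter must be present in the trigger.
    """
    if trigger.get("type") != subscription.get("type"):
        return False
    return all(
        trigger.get(key) == value
        for flt in subscription.get("filters", [])
        for key, value in flt.items()
    )


if __name__ == "__main__":
    trigger = {"type": "unlock", "personality": "worker"}
    print(trigger_matches(trigger, {"type": "unlock"}))  # True
    print(trigger_matches(trigger, {
        "type": "unlock",
        "filters": [{"personality": "controller"}],
    }))  # False
```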
    
    Existing triggers are: host-delete, runtime-apply-puppet,
    platform-update, unlock, force-unlock.
    Add new triggers: host-add, host-delete, host-reinstall, lock, swact,
    force-lock, force-swact, system-modify.
    
    Introduced a lifecycle semantic check hook for reapply evaluation, where
    apps can run complex logic to reject the evaluation.
    
    Modified unit tests. These are basic unit tests querying the number of
    calls.
    Unsupported unit test for host-add.
    Missing unit test for system-modify.
    
    Tests: AIO-SX, AIO-DX
    Triggering the events, events are seen in the logs as expected, apps
    respond to events as specified in their metadata.
    
    To keep backwards compatibility, this work depends on updating the
    metadata file for each of the 5 apps that use the existing triggers.
    Depends-On: Ia7bbca906e343ffffa019885a790befdf5ccb565
    Depends-On: I588648090e82ac573db5112a2704d22fa45a049f
    Depends-On: Ie02743cdf056dda3feb66911c74f9dabe69d98dd
    Depends-On: I4778c6c8232fffb5fafca95b450e590fbb1b0f64
    Depends-On: I0a76ef10fe3958634d714f1484d79763f98a0d4e
    Story: 2007960
    Task: 41760
    Closes-Bug: 1884770
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
    Change-Id: I023eb3bce9061e0ccfcf10ebeeaef91bcb39cff1

commit fb5956ed3aa068a2b6cd7851dcefd2bdce0244ff
Author: Angie Wang <angie.wang@windriver.com>
Date:   Tue Feb 23 18:18:53 2021 -0500

Update the minimum small root disk size
    
    The dc-vault filesystem should be counted in the minimum
    root disk size calculation.
    
    Partial-bug: 1916797
    Depends-on: https://review.opendev.org/c/starlingx/metal/+/777464
    Change-Id: I65ac2cc5bda3a94728a7f593b6aadbafca7a3af6
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 2b7a745b7f033de13ca6fb42afcb6a8929afdc45
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Fri Oct 9 15:07:16 2020 +0300

Migrate to database backend for backup and restore
    
    Drop the use of a flag file for restore.
    Create the database table for backup and restore.
    Use the database to store information.
    
    CREATE TABLE backup_restore (
            created_at TIMESTAMP,
            updated_at TIMESTAMP,
            deleted_at TIMESTAMP,
            id serial PRIMARY KEY,
            uuid VARCHAR ( 36 ) UNIQUE NOT NULL,
            state VARCHAR ( 128 ) NOT NULL,
            capabilities TEXT
    );
    
    This is an improvement to allow backup and restore functionality to be
    extended more easily. It is not required to fix the bug.
    
    Added unit tests.
    
    Depends-On: I7b7fab99d457056032dbbd612363cd5036736cda
    Depends-On: I44fc4aaa528e372a84115714f271b4f5e063f86e
    Partial-Bug: 1887648
    Change-Id: Ibb96696a35fe7b560aa002c442e2af735d08ec24
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit fca3000f725bb1ec447a7546c7d3805723544352
Author: Isac Souza <IsacSacchi.Souza@windriver.com>
Date:   Wed Feb 17 22:06:21 2021 -0300

Fixes the time calculation in the is_expired method of the Token class
    
    The previous implementation calculated an absolute delta between the
    expiration time and the current time and compared it to a time window.
    The calculation should be done without getting the absolute value
    to account for an expiration time older than the time window.
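
A sketch contrasting the two calculations, written as standalone functions rather than the Token class method. The function names, the window value, and the use of <= at the boundary are assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_expired_old(expires_at, now, window):
    """Buggy form: abs() makes a long-past expiry look far away, i.e. valid."""
    return abs(expires_at - now) <= window


def is_expired(expires_at, now, window):
    """Treat the token as expired once `window` or less remains.

    A negative delta (expiry already in the past) is always <= window, so
    stale tokens are reported as expired.
    """
    return expires_at - now <= window


if __name__ == "__main__":
    now = datetime(2021, 2, 17, 12, 0, tzinfo=timezone.utc)
    window = timedelta(minutes=5)
    stale = now - timedelta(hours=1)
    print(is_expired_old(stale, now, window))  # False (wrong)
    print(is_expired(stale, now, window))      # True
```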
    
    Added unit tests for the affected code.
    
    Closes-Bug: 1915952
    Signed-off-by: Isac Souza <IsacSacchi.Souza@windriver.com>
    Change-Id: I46bb36993d6f02978a593a1c1d49692e627d2a9b

commit 00d26663ad0141277620d438fb5904d3de715077
Author: Charles Short <charles.short@windriver.com>
Date:   Tue Feb 16 11:17:32 2021 -0500

Deprecate sysinv.openstack.common.rootwrap
    
    Deprecate sysinv.openstack.common.rootwrap in favor of oslo_rootwrap.
    This was done so we maintain less code and have fewer Python 3
    compatibility concerns.
    
    Story: 2006796
    Task: 41868
    
    Signed-off-by: Charles Short <charles.short@windriver.com>
    Change-Id: I979ff1c8045030cfcaf8b88678121c4d0d684743

commit 063d9f84487407242ac1eb6cded2062abfdc01f0
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Thu Feb 11 10:01:52 2021 -0600

Limit rootfs size constraints to controller nodes
    
    Previously rootfs size restrictions were limited to controller nodes.
    Return to this model to better support lab and virtual installs.
    
    Closes-Bug: 1915215
    Change-Id: I6dd53f74de54c971c59c5d821ec33a33c7e152f5
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit 6ab90d747a953f5a9f75bc05d0b55d3e2ac74fd2
Author: Babak Sarashki <Babak.SarAshki@windriver.com>
Date:   Tue Jan 5 16:08:43 2021 -0500

sysinv: Intel ACC100 (Mt Bryce) enablement
    
    This commit adds SR-IOV device plugin support for forward error
    correction (FEC) devices that are enabled on an Intel ACC100 (Mt.
    Bryce). The Intel ACC100 is mounted on Lisbon ACC100 Card.
    
    The FEC device is intended for use by a DPDK application. It is
    presented to the system under resource name: "intel_acc100_fec."
    
    An example usage to modify the device:
    
    system host-device-modify <host> <device_name> \
      -e true \
      --driver igb_uio \
      --vf-driver <driver> \
      -N <num_vfs>
    
    An example assignment to a pod:
    
    resources:
      requests:
        memory: 4Gi
        intel.com/intel_acc100_fec: '16'
        cpu: 6
      limits:
        hugepages-1Gi: 2Gi
        memory: 4Gi
        intel.com/intel_acc100_fec: '16'
        cpu: 6
    
    Story: 2008440
    Task: 41403
    
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/775253
    Depends-On: https://review.opendev.org/c/starlingx/integ/+/775252
    
    Signed-off-by: Babak Sarashki <Babak.SarAshki@windriver.com>
    Change-Id: I831fd16a0410ee988365c067789f760139274ec8

commit a45569eda4daf7f323294a116a76d1bb75e1e5ed
Author: Cole Walker <cole.walker@windriver.com>
Date:   Tue Feb 16 20:54:11 2021 +0000

Revert "Add api call to trigger creation of gnp"
    
    This reverts commit cd344669d381103b8e065617811b928eb895eacb.
    
    Reason for revert: Config out of date alarm is present after install. Code rework is required.
    
    Change-Id: Ic71eeb8b9d95436b1d1999bf3a1d28022a972077

commit 0786d37bebf244c918dd227fe3112af80870cc79
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Wed Feb 10 08:43:48 2021 -0500

Allow SR-IOV interface modify on AIO-SX without locking the host
    
    On an unlocked AIO-SX host, it will be possible to modify a non-SR-IOV
    interface to an SR-IOV PF, and to add and delete SR-IOV VF interfaces.
    
    However, the user will not be able to modify parameters on an existing
    SR-IOV interface, or change an SR-IOV PF to a different class.
    
    Story: 2008531
    Task: 41802
    Change-Id: I1b8b214816404fb0893e77acd55798017fa836bc
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>

commit 625017798a93d2fe57740c981fe3ba7c6acaebae
Author: Bart Wensley <barton.wensley@windriver.com>
Date:   Wed Feb 10 15:06:03 2021 -0600

Add unit tests for _create_host_filesystems
    
    Adding unit tests for _create_host_filesystems in the sysinv
    AgentManager class. This points out the issue described in
    LP1915215 and will ensure any future changes don't break
    existing behaviour.
    
    Also clamping the version of astroid to allow pylint to pass
    properly in local test environments.
    
    Change-Id: Ifd4da0e044c3262281e52826854c20e0d161a3bc
    Partial-bug: 1915215
    Signed-off-by: Bart Wensley <barton.wensley@windriver.com>

commit 500d4e250c49c92b1b3440d068ab654483a437fc
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Mon Feb 8 10:35:26 2021 -0500

Fix out-of-date alarm wrongful removal
    
    When updating docker service parameters by either adding or
    deleting them, a "Configuration is out-of-date" alarm will
    automatically appear for both active and standby nodes.
    The alarms should only be cleared by a manual lock/unlock
    of each node.
    The fix addresses the scenario where a configuration update
    that triggers a runtime manifest clears the existing out-of-date
    alarm (e.g.: system certificate-install -m ssl_ca <cert>).
    The issue is reproduced with any configuration update that uses a
    runtime manifest, whether the existing alarm was due to a
    “service-parameter-add” or a “service-parameter-delete” command.
    
    Closes-Bug: 1878018
    
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>
    Change-Id: I1e545327ae394385c995aee885590779432c60b0

commit f2908be215b6c86fb273587c34ec23360791d276
Author: Isac Souza <IsacSacchi.Souza@windriver.com>
Date:   Fri Feb 5 22:25:45 2021 -0300

Backward compatibility changes for registry cert managed by cert-manager
    
    If a user tries to install a docker_registry certificate and that
    certificate is being managed by cert-manager (i.e. secret is present),
    we will refuse to install it unless a force=true parameter is passed
    in.
    
    Similar to the logic already present for mode=ssl.
    
    Story: 2007361
    Task: 41782
    Signed-off-by: Isac Souza <IsacSacchi.Souza@windriver.com>
    Change-Id: Ief4ba51054e6fec951f4c44dbf40c2b3e2e8b292

commit 983add3417b2b5dbeadc22be1f1997ca9e0ab16e
Author: Isac Souza <IsacSacchi.Souza@windriver.com>
Date:   Thu Jan 28 12:44:09 2021 -0300

Docker registry certificate management by cert-manager
    
    Cert-mon changes to monitor 'system-docker-local-certificate' k8s secret
    and install StarlingX docker_registry certificate for
    registry.local:9001.
    
    Changes include:
    - New thread to watch registry certificate changes
    - Refactored the code to reduce code duplication
    - Call sysinv api for 'certificate_install'
    
    Design testing completed:
    - When k8s secret is added/modified (initiated by cert-manager),
      certificate installation is completed
    - sysinv api 'certificate_install' installs the certificate; confirmed
      via openssl s_client -connect registry.local:9001
    - When certificate is renewed, keys get regenerated (no changes
      needed. Confirmed that existing infrastructure takes care of it)
    
    Story: 2007361
    Task: 41717
    Change-Id: Iffa68486764287a1b82a183ab9801a53c1e4885b
    Signed-off-by: Isac Souza <IsacSacchi.Souza@windriver.com>

commit c69304a998f6bc5b76b31c161ae4b6815ad60b54
Author: Gonzalo Gallardo <gonzalo.gallardo@windriver.com>
Date:   Thu Feb 4 16:29:26 2021 -0300

Restore system modify command for SNMPv3
    
    Restores "system modify" command for --(contact|name|location) options.
    Also, this affects the "system_name" variable in [fm.conf] configuration
    file.
    
    Story: 2008132
    Task: 41761
    Signed-off-by: Gonzalo Gallardo <gonzalo.gallardo@windriver.com>
    Change-Id: I143873953603b36c5e79f5247801b33efe464022

commit 4cd77f035a812d8fbc6f060ca483b8050c65cb91
Author: John Kung <john.kung@windriver.com>
Date:   Mon Feb 1 15:23:04 2021 -0600

Update config tracking for reboot required config
    
    The reboot config tracking needs to be updated to ensure
    it aligns the target with the actual applied config.
    It must also initialize and only clear the config for
    the active controller if a reboot has actually occurred.
    
    On host-swact, the target controller is checked for config
    up to date condition, rather than both source and target
    controller, since the source active controller would still
    track the config requirement in the persisted database.
    
    Improve traceback of config requests.
    
    Tests Performed:
      Perform host-swact with active controller config out of date
      Perform host-swact with standby controller config out of date
      Verify AIO-SX and Duplex config operations
      Restart sysinv with reboot config set and perform runtime config
      Perform runtime config while reboot required config set
      Perform runtime config while reboot required config cleared
    
    Change-Id: I339ad82a2c7b37ac1c97c3eb790a231a40914250
    Closes-Bug: 1914085
    Signed-off-by: John Kung <john.kung@windriver.com>

commit 36a4ff4fd2194bf876c1ce68c17f16b047bff7ae
Author: Adriano Oliveira <adriano.oliveira@windriver.com>
Date:   Mon Feb 1 19:48:28 2021 -0500

Fix migration scripts execution sequence
    
    The following changes have been introduced as a fix for this issue:
    
    1. Changed the sorting on the migration script file names to be based
    on the first number on the file name.
    2. Added file name format validation: "nnn-*.*", where "nnn" string
    shall contain only digits.
    3. Fixed the name of two migration scripts that were not following the
    correct format (not using "-" separator).
    4. Added a set of unit tests to test and validate the execution of the
    migration scripts code.
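
A sketch of numeric ordering with name validation. The regular expression is one reading of the "nnn-*.*" format and the function name is hypothetical; the exact rule used by the upgrade code may differ.

```python
import re

# One reading of "nnn-*.*": leading digits, a "-", then a name with a ".".
SCRIPT_NAME_RE = re.compile(r"^(\d+)-.*\..*$")


def sort_migration_scripts(filenames):
    """Return migration script names ordered by their leading number.

    Raises ValueError for a name that does not follow the format, so a
    misnamed script is reported instead of being silently mis-ordered.
    """
    keyed = []
    for name in filenames:
        match = SCRIPT_NAME_RE.match(name)
        if match is None:
            raise ValueError("invalid migration script name: %s" % name)
        keyed.append((int(match.group(1)), name))
    # Sort numerically on the prefix (so 9 precedes 10), then by name.
    return [name for _, name in sorted(keyed)]


if __name__ == "__main__":
    print(sort_migration_scripts(["10-b.py", "9-a.sh", "100-c.py"]))
    # ['9-a.sh', '10-b.py', '100-c.py']
```

A plain lexicographic sort of the same names would give ['10-b.py', '100-c.py', '9-a.sh'], which is the kind of mis-ordering that sorting on the parsed number avoids.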
    
    Manual upgrade testing to STX 5.0 has been executed.
    
    Closes-Bug: 1887985
    Signed-off-by: Adriano Oliveira <adriano.oliveira@windriver.com>
    Change-Id: I04fdb8a3b3e177c609c4037825810a531954d99c

commit d64f6f59244c56f26899aa5f2b8fb8a33714f3b7
Author: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>
Date:   Wed Jan 27 13:43:59 2021 -0500

Support data network assignment on an unlocked host
    
    For SR-IOV interfaces, when operating in AIO-SX, it will be
    possible to assign them to a data network without locking the host.
    
    Story: 2008531
    Task: 41705
    Change-Id: Ia4b5670ffe09255845823dc3ae2c3fc19c709fc9
    Signed-off-by: Andre Fernando Zanella Kantek <AndreFernandoZanella.Kantek@windriver.com>

commit cd344669d381103b8e065617811b928eb895eacb
Author: Cole Walker <cole.walker@windriver.com>
Date:   Tue Jan 12 10:25:18 2021 -0500

Add api call to trigger creation of gnp
    
    Creating an oam network was not correctly setting the flag to trigger
    the runtime puppet manifest to create the default globalnetwork policy
    and hostendpoint resources.
    
    This change adds the correct api call to ensure that the flag is set
    when an oam network is created. This ensures that the default
    globalnetworkpolicy and hostendpoint kubernetes resources are created as
    intended.
    
    This issue was previously masked because the relevant runtime
    manifest was being run on every unlock. When that behaviour
    changed, it revealed that the required flag was not being set
    upon oam creation.
    
    Closes-Bug: https://bugs.launchpad.net/starlingx/+bug/1911213
    
    Signed-off-by: Cole Walker <cole.walker@windriver.com>
    Change-Id: I5b570e82246d82a4243ee68a871333876d231c85

commit a4a969c94f2ef4dc1531592b6563e90d6465d6ac
Author: Mingyuan Qi <mingyuan.qi@intel.com>
Date:   Tue Nov 24 02:13:09 2020 +0000

Add host command support for the edgeworker node
    
    This is an experimental feature in stx5.0.
    
    This commit enables the following commands for the edgeworker node:
      system host-add/system host-update/system host-delete
    After the host is added/updated, the mgmt ip of the edgeworker
    node is assigned during its configuration process.
    
    Until the final phase of the feature is finished, edgeworker
    nodes have the following limitations:
    - Kubernetes provisioning requires a manually triggered ansible
      playbook.
    - Gathering node HW information is not supported.
    - Configuring the node from the controller is not supported.
    - Managing the node lifecycle is not supported.
    - Updating/upgrading the node is not supported.
    
    Story: 2008129
    Task: 40862
    
    Change-Id: I7e6de65ba848d9468a4e5afddd16b1cd9e3cd7dd
    Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>
    Depends-On: https://review.opendev.org/c/starlingx/config/+/761716

commit aa851db31fa3ea25337da819d6fd9fc874f7fcfd
Author: Martin, Chen <haochuan.z.chen@intel.com>
Date:   Fri Apr 17 11:02:08 2020 +0800

Introduce rook ceph
    
    1. Add a rook ceph storage backend for a container-based ceph cluster.
    2. Create puppet rook.py.
    3. Update the sysinv-agent audit function.
       Rook ceph provisions disks with bluestore, which creates a vg and
       an lv. The sysinv-agent audit function is updated so that, after
       rook-ceph is applied, the vg created by rook-ceph (named
       ceph-xxxxx) is added to the sysinv db.
    4. Update the lvm filter for rook-ceph provisioned OSDs.
    
    Story: 2005527
    Task: 39452
    
    Depends-On: https://review.opendev.org/#/c/713084/
    Depends-On: https://review.opendev.org/c/starlingx/rook-ceph/+/716792
    
    Change-Id: If8c1204dd3c7cc25487b2f645ace9aa680d32d59
    Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>
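
    A sketch of the audit step in item 3, assuming the agent has the
    list of VG names on the host and the list already recorded in the
    sysinv db. The function name and inputs are hypothetical; only the
    "ceph-" naming comes from the commit message.

```python
# Assumed from the commit message: VGs created by rook-ceph are named "ceph-xxxxx".
ROOK_CEPH_VG_PREFIX = "ceph-"


def new_rook_ceph_vgs(reported_vgs, known_vgs):
    """Return rook-ceph VGs reported by the host but not yet in the sysinv db.

    Both arguments are plain lists of VG names; they are hypothetical
    stand-ins for the sysinv-agent audit data.
    """
    known = set(known_vgs)
    return sorted(
        vg for vg in reported_vgs
        if vg.startswith(ROOK_CEPH_VG_PREFIX) and vg not in known
    )
```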

commit 8671cb8f5a8340bdd5f623343bde39f76e9dc3fc
Author: John Kung <john.kung@windriver.com>
Date:   Thu Jan 21 09:14:18 2021 -0600

Upgrades: data migration for dcorch scaling
    
    The dcorch scaling feature introduced new tables, of
    which the subcloud_sync table needs to be populated.
    
    The subcloud_sync data is populated with initial
    values similar to what would have been
    done on subcloud add.
    
    Change-Id: Ib880ea19cb7f69cb004d58557ad07535c1e45774
    Story: 2007267
    Task: 41648
    Signed-off-by: John Kung <john.kung@windriver.com>

commit b40f98b57e3274892c893d30139e03bb6b21d6d2
Author: Litao Gao <litao.gao@windriver.com>
Date:   Mon Jan 25 03:46:08 2021 -0500

Interface profile support for VF rate limiting configuration
    
    This feature is part of 'Single NIC support'.
    This commit enables interface profile creation and the apply of
    interface configuration that includes the VF max-tx-rate config.
    
    Story: 2008470
    Task: 41683
    
    Depends-On: https://review.opendev.org/c/starlingx/config/+/770135
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: Ib8a8a1844d5a151c15454f446129642ed4f4dfe8

commit 69664edc58bbf062d0628b0d283e66104f8d0a0e
Author: Litao Gao <litao.gao@windriver.com>
Date:   Mon Jan 11 08:51:24 2021 -0500

VF rate limiting support
    
    This feature is part of 'Single NIC support'.
    This commit implements rate limiting on VFs in terms of
    max_tx_rate, leveraging the hardware NIC driver capability.
    It also adjusts the sriov-device-plugin config.json to make it
    easy to allocate rate-limited VFs to the target pods.
    
    Story: 2008470
    Task: 41508
    
    Depends-On: https://review.opendev.org/c/starlingx/config/+/770132
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: I3b609296b6b5b872f0bb0bd9733740b01e9d421c
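
    The commit does not say how the limit is pushed to the NIC. For
    reference, iproute2 exposes the same driver capability as
    "ip link set dev PF vf N max_tx_rate RATE" (RATE in Mbps). The
    sketch below only builds that command; the helper name and the
    interface name in the test are hypothetical.

```python
from typing import List


def vf_max_tx_rate_cmd(pf_ifname: str, vf_index: int,
                       max_tx_rate_mbps: int) -> List[str]:
    """Build an iproute2 command capping a VF's transmit rate (in Mbps)."""
    if vf_index < 0:
        raise ValueError("vf_index must be non-negative")
    if max_tx_rate_mbps < 0:
        raise ValueError("max_tx_rate_mbps must be non-negative")
    # The PF driver enforces the limit in hardware, so the NIC driver
    # must support max_tx_rate for this to take effect.
    return ["ip", "link", "set", "dev", pf_ifname,
            "vf", str(vf_index), "max_tx_rate", str(max_tx_rate_mbps)]
```

    The returned list could be run as root with
    subprocess.run(cmd, check=True).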

commit 52677deb2c56038d108edbd5ae7dd788b9501f77
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Wed Jan 20 11:31:22 2021 -0600

Remove upgrade flags during abort
    
    If the upgrade flags are present during controller-1 downgrade, the
    controller will fail to unlock. Previously these were removed as part of
    the host-downgrade command, but this requires root access. These flags
    will now be removed by the conductor as part of the upgrade-abort
    command.
    
    Story: 2008055
    Task: 41633
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>
    Change-Id: I7bb60e42d956140e06e44145ab8b168d1f3b72be

commit a875f32e57b05420d3bc018bc14718a95591ba53
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Sat Jan 23 13:16:22 2021 -0600

Remove host manifest during downgrade
    
    During the host-downgrade action the host manifest needs to be removed.
    This is similar to the process taken during host-reinstall. The manifest
    needs to be removed to prevent the host from running kubeadm
    prematurely.
    
    Story: 2008055
    Task: 41674
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>
    Change-Id: I2242e8ab95228549ea3c1b3f6924025ae66ecb0f

commit ac284046e449b3998db861b16db59dabbe8eb7bc
Author: albailey <Al.Bailey@windriver.com>
Date:   Mon Jan 25 14:27:52 2021 -0600

Fix tox failure when certain tmp folder un-writable
    
    When the /tmp/device_images folder is not writable, tox
    fails for three device_image API unit tests.
    
    This change mocks open and fdopen when calling that POST
    API endpoint, to allow the unit tests to pass.
    
    Closes-Bug: 1911997
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: Ib6132a9b95b214e60ab34faa8a9615e1a2381fef

commit c991e1e122db066f386c8ba4295f206fbae818e5
Author: Bin Qian <bin.qian@windriver.com>
Date:   Wed Jan 20 14:23:47 2021 -0500

cert-mon secret data migration for upgrade to stx5
    
    Add data migration code to populate secret data for the cert-mon
    service. The secret data is stored in the static secret data
    file, which is only configured during the initial bootstrap.
    
    Closes-bug: 1913173
    Change-Id: I9ddb1aca9b2ba136facf1b3c294a273010e2a26b
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit c0fadef2c1aa8a0335b69a27d6c41db757a78131
Author: Litao Gao <litao.gao@windriver.com>
Date:   Thu Jan 7 01:43:37 2021 -0500

Single NIC support
    
    This feature allows the user to configure CaaS and data networks
    on a single VF-capable NIC during ‘Host interface configuration’,
    before the unlock operation.
    
    This commit introduces a new subinterface type, which is used to
    support ethernet subinterface creation over a pci-sriov class
    ethernet interface.
    
    Story: 2008470
    Task: 41505
    
    Signed-off-by: Litao Gao <litao.gao@windriver.com>
    Change-Id: I2c148d24c73854e59ebb2be122e97017c6235c89

commit 6d4d5e384727548a68d1509d94fe8afc22b16e53
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Wed Jan 20 10:25:33 2021 -0600

Migrate etcd after both controllers are upgraded
    
    The etcd database can go out of sync if we swact back and forth between
    controllers running different versions. This can happen if we need to
    abort the upgrade after swacting to controller-1. Since we are not
    upversioning etcd in this release we will address this by waiting to
    migrate the etcd data when both controllers are running the new release.
    The migration will now take place during the swact to controller-0
    before upgrade-activate.
    
    This solution will present some problems when we do upgrade etcd, so
    further development will be required at that time.
    
    Story: 2008055
    Task: 41630
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>
    Change-Id: I02b82bfe1a4b4b69aaa85d5f0d20246b9cda5629

commit 81df87b5b675c34d3216b5057153f510fcfa9baf
Author: Sabeel Ansari <Sabeel.Ansari@windriver.com>
Date:   Fri Jan 22 17:48:28 2021 -0500

Fix urlparse call statement
    
    A previous cert-mon commit changed the urlparse import
    statement, which introduced a bug. This commit fixes the bug.
    
    Closes-Bug: 1911057
    
    Change-Id: Ib90c89c3681ea97d77b5cdad4298cd7a08b42345
    Signed-off-by: Sabeel Ansari <Sabeel.Ansari@windriver.com>

commit 44d4774372d422d95ae198e778c5ddee32992f5a
Author: Cole Walker <cole.walker@windriver.com>
Date:   Fri Jan 22 16:55:10 2021 -0500

Add required vars for ptp-notification app
    
    Story: 2008529
    Task: 41671
    
    Signed-off-by: Cole Walker <cole.walker@windriver.com>
    Change-Id: I807b03203a7589e5b1a04885dab09476323c5293

commit 6a51c9420a01ad48c8c8eec4835d9ed01a83d36c
Author: Yuxing Jiang <yuxing.jiang@windriver.com>
Date:   Thu Jan 14 09:37:11 2021 -0500

Upgrade: fix hieradata software version mismatch
    
    This commit disables updating puppet hiera data at runtime if the
    software load of a host differs from that of the active controller,
    to prevent a hiera data software version mismatch during an
    upgrade/rollback procedure.
    
    Tested with an upgrade and rollback of an AIO-DX system.
    
    Closes-Bug: 1911457
    Change-Id: I42a953c0a31b80f2536de9292ed8b17789f6328c
    Signed-off-by: Yuxing Jiang <yuxing.jiang@windriver.com>
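
    A sketch of the guard, assuming each host is represented by a dict
    with hypothetical "hostname" and "software_load" keys; the function
    name is also illustrative.

```python
def hosts_allowed_runtime_hieradata(hosts, active_controller_load):
    """Return hostnames whose puppet hieradata may be updated at runtime.

    Hosts running a different software load than the active controller
    (e.g. mid-upgrade or mid-rollback) are skipped, so hieradata for one
    release is not written for a host running another release.
    """
    return [h["hostname"] for h in hosts
            if h["software_load"] == active_controller_load]
```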

commit 3477d1e3ed3266c091756acc07128277c31ee6db
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Thu Nov 19 04:19:48 2020 -0500

Remove identity entry on upgrade
    
    Identity services are no longer a shared service in DC.
    This commit updates the i_system table on subclouds to
    remove identity from the shared service list.
    
    Change-Id: I9a93c1aa364413d77af60a67c70a16ecbd546356
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>
    Closes-Bug: 1904675
    Depends-On: https://review.opendev.org/#/c/763127/

commit f2dbdddcd160b643ccdc89b3d7b8e98acb25f6db
Author: Angie Wang <angie.wang@windriver.com>
Date:   Wed Jan 20 15:42:06 2021 -0500

Add Armada pod ready check in sysinv application audit
    
    In rare cases, after controller-0 comes up from the initial
    unlock, the armada pod is not yet running and ready when
    platform-integ-apps is uploaded. This causes the application
    upload to fail.
    
    For the sysinv managed applications, ensure that the Armada pod
    is running and ready before attempting to upload/apply
    applications.
    
    Change-Id: I176bd1bbdb2ecf6285bd680091812aac43ea0ae3
    Closes-Bug: 1912520
    Signed-off-by: Angie Wang <angie.wang@windriver.com>
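
    A sketch of a "running and ready" check on a pod in Kubernetes API
    JSON form (status.phase plus a Ready condition), with a simple
    polling loop. get_pod is a hypothetical callable that returns the
    pod dict or None; the timeout values are assumptions, not taken
    from sysinv.

```python
import time


def is_pod_running_and_ready(pod: dict) -> bool:
    """True if the pod phase is Running and its Ready condition is True."""
    status = pod.get("status") or {}
    if status.get("phase") != "Running":
        return False
    return any(cond.get("type") == "Ready" and cond.get("status") == "True"
               for cond in status.get("conditions") or [])


def wait_for_pod_ready(get_pod, timeout=60.0, interval=2.0):
    """Poll get_pod() until the pod is running and ready, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        pod = get_pod()
        if pod is not None and is_pod_running_and_ready(pod):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

    Checking the Ready condition in addition to the phase matters here:
    a pod can be Running while its containers are not yet ready.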

commit 93a86019cf68de0069702e7cf0ab7776f10c015f
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Fri Nov 6 17:40:49 2020 +0200

Introduce lifecycle operator
    
    The goal of this commit is to:
      - move most of the app specific logic outside conductor
      - introduce an operator for this purpose
      - keep the same logic as before
      - keep compatibility: apps that don't implement the new operator
      behave in a default way (same way as before)
    
    Create a lifecycle operator.
    Move lifecycle hooks from manifest operator to lifecycle operator.
    Move rbd actions from manifest operator to lifecycle operator.
    
    Move app specific code outside perform_app_operation to be
    executed by the app through lifecycle hooks.
    
    Conductor:
      - add lifecycle_hook_info param to all rpc perform_app_operation
      - introduce lifecycle hooks in perform_app_operation
      - remove some (de)activate_plugins in perform_app_operation
      - spawn only conductor.manager.perform_app_operation:
        greenthread.spawn(self.perform_app_upload
        greenthread.spawn(self.perform_app_apply
    
    Rest controller:
      - use manual mode hook in all rpc perform_app_operation
      - add semantic check hook for rpc operations:
        apply, remove, delete, abort, update
    
    Tested:
     - throw failed to download images (stx-openstack, snmp)
     - auto-apply hook throws exception on ceph down (platform-integ-apps)
     - upload (platform-integ-apps, stx-openstack, nginx-ingress-controller,
       snmp)
     - apply (platform-integ-apps, stx-openstack, cert-manager,
       nginx-ingress-controller, snmp)
     - remove (platform-integ-apps, stx-openstack, cert-manager,
       nginx-ingress-controller, snmp)
     - delete (platform-integ-apps, stx-openstack, cert-manager,
       nginx-ingress-controller, oidc-auth-apps, snmp)
     - auto apply platform-integ-apps
     - auto upload cert-manager, platform-integ-apps, oidc-auth-apps
     - create and delete cinder-volume in stx-openstack
     - application update (snmp)
    
    Change-Id: Ic83fbd25d23ae34889cb288330ec448f920bda39
    Depends-On: Ibe994411fee55c84fa86770fad5497040f13b78f
    Depends-On: I41858c831a4af564dbdf38934d51d34489bf8a9a
    Depends-On: I533f2bd41c5627f83a004ffbf4543dd3c26d06b7
    Story: 2007960
    Task: 40463
    Task: 41291
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit ad1188203e29dfe7d106efe449b08f222d0ecf1a
Author: Sabeel Ansari <Sabeel.Ansari@windriver.com>
Date:   Tue Dec 22 12:27:56 2020 -0500

Use deployment namespace for platform certificates
    
    Instead of using the kube-system namespace, we want to use the
    deployment namespace for platform certificates. This commit
    includes:
    - Creation of the namespace during upgrades
    - Changing the namespace that cert-mon monitors
    
    Depends-on: https://review.opendev.org/#/c/768241/
    
    Change-Id: Ieac392ed6d560be77327dd9e713fae01b17fda04
    Story: 2007361
    Task: 41165
    Signed-off-by: Sabeel Ansari <Sabeel.Ansari@windriver.com>

commit 20a761a073f5034d2548ee5edb4b04de63edd462
Author: Angie Wang <angie.wang@windriver.com>
Date:   Wed Jan 13 12:35:18 2021 -0500

Clean up helm and armada directories from old release
    
    Delete the /opt/platform/helm and /opt/platform/armada
    directories from the old release when completing the
    upgrade.
    
    Tested: AIO-SX and AIO-DX upgrades
    
    Change-Id: Ie2a582877135ac387c20bae4df7a6a6f244a0c3e
    Closes-Bug: 1910801
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 576bdca7406712d2465ff36e216970e3da87152a
Author: Angie Wang <angie.wang@windriver.com>
Date:   Tue Jan 12 13:23:18 2021 -0500

Ensure Armada lock is released
    
    If the apply/remove of an application fails due to an
    exception in the Armada request or an abnormal exit of the
    Armada request, Armada cannot release its lock, which causes
    the subsequent re-apply of the application to fail because it
    cannot acquire the lock.
    
    Update to delete the lock if an exception occurs during the
    Armada request or the Armada request exits abnormally.
    
    Closes-Bug: 1911243
    Change-Id: I6ab02474831152b5b2b7302f0799c48d05fef64d
    Signed-off-by: Angie Wang <angie.wang@windriver.com>
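
    A sketch of the cleanup pattern. request and delete_lock are
    hypothetical callables standing in for the sysinv code that runs
    the Armada request and removes the lock. Treating any non-zero exit
    code as an abnormal exit is an assumption.

```python
def run_armada_request(request, delete_lock):
    """Run an Armada request so that a stale lock cannot block retries.

    request() runs the request and returns its exit code; delete_lock()
    removes the Armada lock. Both are hypothetical stand-ins.
    """
    try:
        exit_code = request()
    except Exception:
        delete_lock()  # Armada may not have released its lock
        raise
    if exit_code != 0:  # assumption: non-zero means an abnormal exit
        delete_lock()
    return exit_code
```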

commit 2ab82b7262e4881be8ca6997dfcf99a8931911d6
Author: albailey <Al.Bailey@windriver.com>
Date:   Wed Jan 6 19:18:09 2021 -0600

Support passing an ignore alarm list to kube upgrade start API
    
    The health utils support an ignore alarm list, and the
    kube_upgrade API makes use of them.
    
    Story: 2008137
    Task: 41559
    Signed-off-by: albailey <Al.Bailey@windriver.com>
    Change-Id: I19db852f2e87273551d8a30f4bab470afa420de2
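
    A sketch of how an ignore alarm list can filter a health check,
    assuming alarms are dicts with an "alarm_id" key. The data shape and
    function names are hypothetical; the alarm IDs in the test are only
    example values.

```python
def blocking_alarms(alarms, ignore_alarm_ids=()):
    """Return the alarms that should still block the operation."""
    ignore = set(ignore_alarm_ids)
    return [alarm for alarm in alarms if alarm["alarm_id"] not in ignore]


def health_ok(alarms, ignore_alarm_ids=()):
    """Healthy when every remaining alarm is on the ignore list."""
    return not blocking_alarms(alarms, ignore_alarm_ids)
```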

commit 84936cf189215ca60b9210ce27783f6331a98679
Author: Zhipeng Liu <zhipengs.liu@intel.com>
Date:   Mon Jan 11 22:53:55 2021 +0800

Fix etcd AIO-DX upgrade failure issue
    
    The data migration fails after controller-1 is upgraded
    (system host-upgrade controller-1) due to the error
    “/opt/platform/puppet/20.12/hieradata/static.yaml doesn’t exist.”
    when migrating the hieradata.
    
    Closes-Bug: 1894870
    
    Change-Id: I71a5ef6d43da13039d0607b2457bd5f561704dfe
    Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>

commit efa5f521c30a0d9bce4a8b0381d633f6f32fb4a1
Author: Angie Wang <angie.wang@windriver.com>
Date:   Mon Sep 28 11:25:50 2020 -0400

Configure SQL as helm storage backend
    
    Configmap is the default helmv2 storage backend for storing
    release information, but its 1MB resource limit prevents
    scaling up stx openstack worker nodes, so we want to use
    SQL as the helm storage backend.
    
    To configure SQL backend, generate helm database hieradata
    that will be used in puppet to create helm database. The
    helm database password is stored in keyring which can be
    retrieved in ansible playbook to configure database connection
    address.
    
    System upgrade support:
    The helm DB is new in stx5.0, so a password is generated for
    the helm user. The helm user and password are written into the
    stx5.0 hieradata. For an AIO-SX upgrade, the helm DB is created
    when the bootstrap puppet manifest is applied during the ansible
    upgrade playbook. For a two-controller upgrade, the helm DB is
    created when the upgrade puppet manifest is applied during the
    controller-1 upgrade. A migration script is added to migrate
    helm releases from configmap to postgresql.
    
    Partial-Bug: 1887677
    Depends-On: https://review.opendev.org/#/c/761642/
    Change-Id: I2f4f414068af297b5f4a3792c061443b7d3bdb32
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit f9b6c1c0a648abcdd6cdbc5a040895b0a3d7d4c9
Author: Sabeel Ansari <Sabeel.Ansari@windriver.com>
Date:   Mon Jan 4 16:49:16 2021 -0500

Update certificate pem files for unit tests
    
    Tox/Zuul unit tests failed due to expiring certificate pem files.
    This commit renews the expired certificates (and extends the other
    certificates) for a long duration (10 years).
    
    Closes-Bug: 1909817
    
    Change-Id: I85cfb1896ae62947fb868315ecf1ff6b53fd4e08
    Signed-off-by: Sabeel Ansari <Sabeel.Ansari@windriver.com>

commit b6d64bef3935a58ac7ad946822f29fe7ed0fc316
Author: Takamasa Takenaka <takamasa.takenaka@windriver.com>
Date:   Thu Jan 7 11:05:23 2021 -0300

Support trap_server_port configurable
    
    Add a trap_server_port parameter to allow the sysadmin to
    configure the snmp trap server port number through a
    user helm override.
    
    Story: 2008132
    Task: 41547
    Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>
    Depends-On: https://review.opendev.org/769732
    Change-Id: Iac33ae11412852b59f6375abccb81988e85040e1

commit 06ae123d180648eb7b35c12c591198160d646db9
Author: Takamasa Takenaka <takamasa.takenaka@windriver.com>
Date:   Fri Dec 18 17:06:17 2020 -0300

Remove REST API for host-based snmp
    
    -Remove controller related to icommunity/itrapdest API
    -Remove object related to icommunity/itrapdest API
    -Remove endpoint related to icommunity/itrapdest API
    -Remove db accessor related to icommunity/itrapdest API
    -Remove tables icommunity/itrapdest from sysinv
    -Remove data related to icommunity/itrapdest from dcorch
    
    Story: 2008132
    Task: 41380
    Change-Id: Ib1811a1bc507ee88e9a551a055068a3cd9d73a61
    Depends-On: https://review.opendev.org/765381
    Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>

commit bdf1888386094ee4a61fb4c78d268e187b1c3677
Author: Zhipeng Liu <zhipengs.liu@intel.com>
Date:   Sat Oct 31 00:54:25 2020 +0800

Enable etcd with security setting.
    
    After an upgrade from StarlingX 4.0 to 5.0, etcd remains
    insecure until "system upgrade-activate" is sent.
    During the upgrade-activate stage, etcd server/client certs are
    created and distributed to all controllers before etcd and
    kube-apiserver are restarted with security enabled.
    
    Upgrade tests pass on both simplex and duplex.
    
    Closes-Bug: 1894870
    
    Depends-on: https://review.opendev.org/#/c/760510/
    Change-Id: I27733a881a267e61502b36627dcab4136de23e3f
    Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>

commit f4dc4a0c54bfacc0f30cf7c5b52fcd1b4ee4e025
Author: Takamasa Takenaka <takamasa.takenaka@windriver.com>
Date:   Wed Dec 2 14:51:43 2020 -0300

Remove puppet entry related to host-based snmp
    
    With the host-based SNMP removal:
    -Remove trap destination entry
    -Remove snmp config entry
    
    Story: 2008132
    Task: 41349
    Change-Id: Ibd0faafe2255f6d325b11ab3383dc5f08f181955
    Depends-On: https://review.opendev.org/765381
    Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>

commit 12a73ed1126b3577d6f5fe450b0cfb6c81d8aab4
Author: Andy Ning <andy.ning@windriver.com>
Date:   Wed Dec 23 13:59:04 2020 -0500

Update local puppet cache during runtime manifests apply
    
    This change adds support for updating the local puppet hieradata
    cache during runtime manifest apply. The locally cached puppet
    hieradata is used by controller_config during DOR to configure
    controllers.
    
    Change-Id: Ia5748230854ba54a9d73c7d5b44fdcb87d7cfe95
    Closes-Bug: 1904739
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit 97ad36382ae7abeba0cf1118f3e80683f2d6215a
Author: albailey <Al.Bailey@windriver.com>
Date:   Fri Jan 1 10:40:58 2021 -0600

Update expired unit test certificate
    
    The sysinv unit test certificate expired on Dec 22, 2020.
    This extends the date to Dec 22, 2021.
    
    The unit test header was updated to trigger zuul to verify
    the changes are valid.
    
    A Readme file has been provided which lists the expiry dates
    for all the pem files used for the unit tests. Information
    on how to update the pem files will be added to the readme
    when all the expiry dates are aligned.
    
    Change-Id: I63cdd865a1818756f9ec3f20ebd724d2ad2d5729
    Partial-Bug: 1909817
    Co-Authored-By: Sabeel Ansari <Sabeel.Ansari@windriver.com>
    Signed-off-by: albailey <Al.Bailey@windriver.com>

commit 760e103ee4bf1190d907aa3532797018f832f9d1
Author: XinxinShen <shenxinxin@inspur.com>
Date:   Tue Dec 8 08:51:14 2020 +0800

Update the PCI device section in the API document
    
    An error occurred while using the API interface
    (/v1/devices/{device_id}) in the API documentation.
    After analyzing the code, I found that the API interface should be
    /v1/pci_devices/{device_id}. The URL of "Modifies a specific PCI
    device" has the same problem, so I also updated this part of the
    API document.
    
    Closes-Bug:#1907074
    
    Signed-off-by: XinxinShen<shenxinxin@inspur.com>
    Change-Id: I636b830f34f9a74e0173d4a53224ec3f0249bba2

commit 596742595fe9cef9359e08c1c668cc1629de0351
Author: YuehuiLei <leiyuehui-s@inspur.com>
Date:   Thu Dec 31 16:32:25 2020 +0800

API documentation error for partitions
    
    An error occurred when using the partitions part
    of the API documentation
    (POST /v1/ihosts/{host_id}/partitions). After analyzing the
    code, I found that the URLs set in the code are /v1/partitions.
    This is inconsistent with the API documentation,
    so update the partitions section of the API documentation.
    
    Closes-Bug: #1897231
    
    Change-Id: I9fc61b86bdfc4cb12acfd4f4cd8ecf8853ab338e
    Signed-off-by: YuehuiLei<leiyuehui@inspur.com>

commit 8f89ba2e0eca77ffd417e0e7f549546b073ffe3c
Author: YuehuiLei <leiyuehui-s@inspur.com>
Date:   Wed Dec 30 11:21:03 2020 +0800

Document error for post /V1/ilvgs interface.
    
    When I was using the /v1/ilvgs POST API, I found that
    the request parameters are mandatory rather than optional.
    
    Closes-Bug: #1897866
    Change-Id: Iffb01013b16e62e1f3d3deb08f2b75132d62fde7
    Signed-off-by: YuehuiLei<leiyuehui@inspur.com>

commit fd2f1b105d2e46a05a60ce4efe650e94a5d26ce7
Author: Pablo Bovina <pablo.bovina@windriver.com>
Date:   Mon Dec 7 09:23:15 2020 -0500

Remove system CLI for Net-SNMP commands
    
    The following commands are removed
    from CLI shell.
    
    snmp-comm-add: Add a new SNMP community.
    snmp-comm-delete: Delete an SNMP community.
    snmp-comm-list: List community strings.
    snmp-comm-show: Show SNMP community attributes.
    snmp-trapdest-add: Create a new SNMP trap destination.
    snmp-trapdest-delete: Delete an SNMP trap destination.
    snmp-trapdest-list: List SNMP trap destinations.
    snmp-trapdest-show: Show a SNMP trap destination.
    
    All Net-SNMP configuration is now the responsibility
    of starlingx/snmp-armada-app.
    
    Story: 2008132
    Task: 41367
    
    Change-Id: I649d91d7e43f8b471434be5b433b47e8189600ce
    Depends-On: https://review.opendev.org/766267
    Signed-off-by: Pablo Bovina <pablo.bovina@windriver.com>

commit 41b5fa0b507d85e77912d790fabb854ab788461b
Author: Andy Ning <andy.ning@windriver.com>
Date:   Tue Dec 15 13:31:39 2020 -0500

keep and reuse ssl certificate
    
    Currently when https is disabled, the installed ssl certificate
    is removed from the system. The default self signed certificate
    is installed again once https is enabled.
    
    This change enhances ssl certificate handling as follows:
    - The very first time https is enabled, the default self-signed
      certificate is installed not only in the fs but also in sysinv.
    - When https is disabled, the installed ssl/tpm certificate is no
      longer deleted.
    - When https is enabled, the existing ssl/tpm certificate is used
      if one is installed. Otherwise the default self-signed
      certificate is installed (this is the case when https is
      enabled for the very first time).
    
    Change-Id: Iaef7b4acc4badaab617c05dcbd6654ea3d1e126a
    Closes-Bug: 1908437
    Signed-off-by: Andy Ning <andy.ning@windriver.com>
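
    A small state model of the three rules above. SslCertState and its
    attributes are hypothetical. Certificates are plain strings here,
    and the "fs and sysinv" installation is reduced to a single
    attribute.

```python
class SslCertState:
    """Hypothetical model of the https/certificate behaviour described above."""

    def __init__(self, default_self_signed_cert):
        self.default_cert = default_self_signed_cert
        self.installed_cert = None  # nothing installed before https is first enabled
        self.https_enabled = False

    def enable_https(self):
        if self.installed_cert is None:
            # Very first enable: install the default self-signed certificate.
            self.installed_cert = self.default_cert
        # Otherwise reuse the existing ssl/tpm certificate.
        self.https_enabled = True
        return self.installed_cert

    def disable_https(self):
        # The installed certificate is kept, not deleted.
        self.https_enabled = False
```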

commit e7c99304ce5acfc97ac29fe4b12fd020aac88bb7
Author: Lu Yao Chen <luyao.chen@windriver.com>
Date:   Wed Dec 16 14:43:40 2020 -0500

Resolving unpacking-non-sequence error
    
    The pylint error is dealt with by per-line suppression; the
    error is not caused by a staticmethod that breaks.
    
    Story: 2007082
    Task: 37996
    
    Signed-off-by: Lu Yao Chen <luyao.chen@windriver.com>
    Change-Id: I8d56ba47ee58edb070e99c5240afac1fd7942d7a

commit 7312b1bf1de72e47bdea9d628f070afa1f4df8a1
Author: albailey <Al.Bailey@windriver.com>
Date:   Thu Dec 10 18:25:11 2020 -0600

Turn off the legacy pip resolver for sysinv
    
    The requirements that had conflicts have been updated.
    
    The local upper-constraints is no longer being used by tox
    since two constraints files with different values for the same
    requirements are not supported.
    
    Additional pylint error codes are being suppressed as
    pylint is now running in python3 and has some newer checks.
    
    Cleans up some linter warnings related to yaml.
    
    Change-Id: I9b5158d59b0791e49c3037a8e823b67fb30c8292
    Related-Bug: #1907678
    Signed-off-by: albailey <Al.Bailey@windriver.com>

commit b5999ed6d180aa499252e869c436e0f2abf8338b
Author: Andy Ning <andy.ning@windriver.com>
Date:   Tue Dec 15 10:40:51 2020 -0500

Enhance puppet hieradata copy during controller config
    
    This change enhances the copying of puppet hieradata during
    controller config by using rsync to "sync" hieradata to a temp
    cache directory, then renaming it to the final cache directory.
    This is more atomic and minimizes the chance of having incomplete
    or corrupted hieradata.
    
    Change-Id: I062ea54507a377e73102f29f40babc3d4fc214a6
    Closes-Bug: 1904739
    Signed-off-by: Andy Ning <andy.ning@windriver.com>
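
    A Python sketch of the stage-then-rename idea. shutil.copytree
    stands in for rsync, and sync_hieradata is a hypothetical name. The
    staging directory is created next to the cache so that os.rename
    stays on one filesystem. There is still a brief window between the
    two renames, which is why this is only "more atomic".

```python
import os
import shutil
import tempfile


def sync_hieradata(src_dir, cache_dir):
    """Copy src_dir into a staging dir, then swap it into cache_dir."""
    parent = os.path.dirname(os.path.abspath(cache_dir))
    tmp_dir = tempfile.mkdtemp(prefix=".hieradata-", dir=parent)
    try:
        staged = os.path.join(tmp_dir, "new")
        shutil.copytree(src_dir, staged)  # a partial copy never reaches cache_dir
        backup = None
        if os.path.isdir(cache_dir):
            backup = os.path.join(tmp_dir, "old")
            os.rename(cache_dir, backup)
        try:
            os.rename(staged, cache_dir)
        except OSError:
            if backup is not None:
                os.rename(backup, cache_dir)  # put the previous cache back
            raise
    finally:
        # Removes the staging dir and, on success, the old cache.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```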

commit 51f1c359ae36ae5bf0dae0b69ebb0ecdf1235a72
Author: Lu Yao Chen <luyao.chen@windriver.com>
Date:   Tue Dec 15 11:14:35 2020 -0500

Resolving no-value-for-parameters pylint error
    
    According to the sqlalchemy documentation, insert(), update()
    and delete() objects can be called without arguments, yet pylint
    reports E1120 for them, even though other calls to objects from
    the same sqlalchemy library work without arguments. Per-line
    suppression is used for the instances that raise pylint errors.
    
    Story: 2007082
    Task: 37996
    
    Signed-off-by: Lu Yao Chen <luyao.chen@windriver.com>
    Change-Id: I6639bbca3ef7f1a48c4e242523ee6e62e624cbce

commit 83445495d95fba3fff16d8845e60a487a0ffc4af
Author: Lu Yao Chen <luyao.chen@windriver.com>
Date:   Mon Dec 14 12:17:32 2020 -0500

Resolving bad-except error
    
    Pylint reports a bad-except error in
    api/controllers/v1/interface.py. This part of the code was not
    reachable, so the error is resolved by removing the except statement.
    
    Story: 2007082
    Task: 37996
    
    Signed-off-by: Lu Yao Chen <luyao.chen@windriver.com>
    Change-Id: I60ae5b8622e96c43d8f28febb0405259aadad0c0

commit 202902f60bf3483ae898b6f97979565f0c418c07
Author: Tee Ngo <tee.ngo@windriver.com>
Date:   Wed Dec 9 16:34:11 2020 -0500

Consistent creation of registry secret
    
    In this commit, the default-registry-key is created from sysinv
    credentials during the apply of the very first platform app
    (nginx-ingress-controller) in the kube-system namespace. This key
    can no longer be removed. It is used to create the registry secret
    for all helm-based applications in their namespaces when the
    application is applied.
    
    Tests in AIOSX:
      - Install the system.
      - Remove cert-manager app, prune containerd image cache to force
        repull of the images from the registry, reapply the app.
      - Remove platform-integ-apps, verify that the default-registry-key
        in kube-system ns is not removed, reapply platform-integ-apps
      - Apply oidc-auth-apps, remove the app, verify that the
        default-registry-key in kube-system ns is not removed.
    
    Closes-Bug: 1906337
    Change-Id: I042b1bf81f44a8c5498661c0bfca219c7150c57a
    Signed-off-by: Tee Ngo <Tee.Ngo@windriver.com>

commit dfe0de2a0b2aecf4e5ea5c5fbbf1a77db4996f8a
Author: Jim Gauld <james.gauld@windriver.com>
Date:   Tue Jul 14 14:06:47 2020 -0400

Support upgrade to Helm v3 with containerized armada
    
    This provides an upgrade script for containerized armada.
    
    It launches armada using Helm v3 and cleans up the previous
    tiller-deployment, serviceaccounts and clusterrolebindings.
    
    Story: 2007927
    Task: 40357
    Depends-on: https://review.opendev.org/#/c/741024/
    
    Change-Id: I4b3ba29c5210bcad269ed8dd25e00acafa1c8bb4
    Signed-off-by: Jim Gauld <james.gauld@windriver.com>
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit b339eb000517a2627658da170b8aec0f1c131d7f
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Fri Dec 4 11:35:23 2020 +0200

Fix upgrades that use restore procedure
    
    The restore procedure was recently changed and now has
    additional steps. Upgrades that are based on the backup and
    restore procedure did not automate the additional
    `complete the restore` step.
    This fix automates this step as part of the
    `system upgrade-complete` command.
    
    Closes-bug: 1906557
    Change-Id: I93db335ff058f987ed5f10ecac1aa402fe82032f
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>

commit 2b211cf44f8bccd091206103ab72680dfc2cfac2
Author: Takamasa Takenaka <takamasa.takenaka@windriver.com>
Date:   Tue Nov 3 15:17:00 2020 -0300

Add variables for snmp in fm.conf
    
    The SNMP trap client needs the following three variables
    to connect to the SNMP trap server:
    - trap_server_ip
    - trap_server_port
    - snmp_enabled
    This commit updates the sysinv puppet hieradata with
    the value of snmp_enabled, based on whether the snmp
    app is applied.
    trap_server_ip and trap_server_port are fixed.
    snmp_enabled is 1 (True) when the snmp armada app is
    applied and 0 (False) otherwise.
    
    Change-Id: I76f9ef3e9ca6a98c0cd37a72f7aaccd10353b26b
    Story: 2008132
    Task: 41205
    Signed-off-by: Takamasa Takenaka <takamasa.takenaka@windriver.com>

commit 045e6c69e822369266831850f50d80da9b86b22c
Author: Martin, Chen <haochuan.z.chen@intel.com>
Date:   Fri Nov 6 15:50:00 2020 +0800

Remove rook-ceph app constants
    
    Change-Id: I72f2dc74d8891e04ff693bac455f7940ddae0fb1
    Closes-Bug: 1896628
    Signed-off-by: Martin, Chen <haochuan.z.chen@intel.com>

commit 083a073332d1654ee65f9e608e2974cedee0902f
Author: Robert Church <robert.church@windriver.com>
Date:   Mon Oct 19 00:41:40 2020 -0400

Add support for disk wiping without GPT formatting
    
    To support a Rook-based Ceph cluster, provide an option when wiping
    disks to skip adding a partition table. This allows disks on all hosts
    to be prepared from the active controller during provisioning; they
    are then formatted and run as OSDs when the Rook k8s application is
    applied.
    
    Change-Id: I93a9770b0d78ddddc01fe7956d7ad7058acc8e71
    Story: 2005527
    Task: 41127
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 42fde43da5a4af5bba5363dde72beee71435f69f
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Fri Sep 11 18:53:49 2020 +0300

Trigger reapply of apps with reapply support only if VIM is up
    
    When stx-openstack needs to be automatically reapplied,
    it has to wait for the VIM services to be enabled in order
    to be applied successfully.
    
    Closes-Bug: 1879018
    Depends-On: https://review.opendev.org/#/c/751358/
    Change-Id: I4d310f2faba71e6102dc6d6c7ea98ddf63d82633
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>

commit 8c8890eecda59d4c5cdf288d8d5b2e59b1488d5b
Author: YuehuiLei <leiyuehui-s@inspur.com>
Date:   Tue Oct 13 10:10:49 2020 +0800

update API documentation for Ceph Monitors
    
    An error occurred when using the Ceph Monitors
    API as documented
    (DELETE /v1/ceph_mon/{ceph_mon_id}). After analyzing the
    code, I found that the URL set in the code is
    /v1/ceph_mon/{host_uuid}.
    This is inconsistent with the API documentation,
    so update the partitions section of the API documentation.
    
    Code: https://github.com/starlingx/config/blob/99616a7f72125c6fd5c86e5dd90af3d87ef3902c/sysinv/cgts-client/cgts-client/cgtsclient/v1/ceph_mon.py#L55
    
    Closes-Bug: #1899550
    Change-Id: Ie1cc900a20d37246afeed740c9f67d5be405bae7
    Signed-off-by: YuehuiLei<leiyuehui@inspur.com>

commit 44d7f1eea99050e4e8e7c6de34fb2a17b8875a46
Author: YuehuiLei <leiyuehui-s@inspur.com>
Date:   Mon Oct 12 15:08:31 2020 +0800

Add file storage backend parameter error
    
    When I used the /v1/storage_file interface to
    create a file storage backend, I found that
    the request parameters backend and services
    are mandatory and not optional.
    
    Closes-Bug: #1899402
    Change-Id: I9e78dc3f2623521185b7c6605015490ff0e09098
    Signed-off-by: YuehuiLei<leiyuehui@inspur.com>

tags: added: in-f-centos8


OpenStack Infra (hudson-openstack) wrote on 2021-06-14: Change abandoned on config (f/centos8)

#14

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793696


OpenStack Infra (hudson-openstack) wrote on 2021-06-14:

#15

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793460


Duplicates of this bug

Bug #1920653
