Bug #1853017 “After keystone admin password changed, user accoun...” : Bugs : StarlingX

Revision history for this message

Peng Peng (ppeng) wrote on 2019-11-18:

#1

ALL_NODES_20191117.044912.tar (21.1 MiB, application/x-tar)

Ghada Khalil (gkhalil) wrote on 2019-11-18:

#2

Marking as stx.3.0 / high priority - appears to have been broken in the last week.

Changed in starlingx:
importance:	Undecided → High
status:	New → Triaged
tags:	added: stx.3.0 stx.security

Ghada Khalil (gkhalil) on 2019-11-18

Changed in starlingx:
assignee:	nobody → yong hu (yhu6)
tags:	added: stx.distro.openstack

Yang Liu (yliu12) on 2019-11-19

tags:

added: stx.retestneeded

yong hu (yhu6) wrote on 2019-11-19:

#3

The issue was reproduced at:

The password for admin was indeed changed by the following command: openstack user set --password 'newpassword' admin, and the change was also written to the keyring. However, it was not promptly reflected in sysinv, so authentication for "system" commands would fail unless the user name and password were set *explicitly* with:
--os-username 'admin' --os-password 'newpassword'

Will look into the cause and what recent change led to this issue.

yong hu (yhu6) wrote on 2019-11-19:

#4

The root cause was found:
After changing the password for "admin", it took effect in keyring. That's why "keyring get CGCS admin" returns the correct password.
However, the local environment OS_PASSWORD (which was set by "source /etc/platform/openrc") still held the old password.

The solution is to re-apply "source /etc/platform/openrc", which should update OS_PASSWORD by
```
export OS_PASSWORD=`TERM=linux /opt/platform/.keyring/19.09/.CREDENTIAL 2>/dev/null`
```
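
For a test script that cannot re-source openrc, the same refresh could be sketched in Python. This is only an illustration: `refresh_os_password` and the injectable `fetch` callable are hypothetical names, and the default fetcher assumes the `keyring get CGCS admin` command shown in this report is available on the controller.

```python
import os
import subprocess


def keyring_password(service="CGCS", user="admin"):
    """Read the current password with the `keyring get` CLI used in this report."""
    result = subprocess.run(
        ["keyring", "get", service, user],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.rstrip("\n")


def refresh_os_password(env=None, fetch=keyring_password):
    """Overwrite a possibly stale OS_PASSWORD in `env` with the value from `fetch`.

    Returns True if the stored value differed, i.e. the environment was stale.
    `fetch` is injectable so the logic can be exercised without a keyring.
    """
    env = os.environ if env is None else env
    current = fetch()
    stale = env.get("OS_PASSWORD") != current
    env["OS_PASSWORD"] = current
    return stale
```

Calling `refresh_os_password()` right after the password change would keep later commands in the same process from reusing the old value.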

In addition, I checked STX.2.0 and it shows the same behavior we are seeing now.

So, this won't be an issue.

Changed in starlingx:
assignee:	yong hu (yhu6) → Peng Peng (ppeng)

Peng Peng (ppeng) wrote on 2019-11-19:

#5

The TC does not run "source /etc/platform/openrc" before running the system command, and it used the new password, as the log shows:
system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list

Ghada Khalil (gkhalil) wrote on 2019-11-19:

#6

Marking as Invalid based on Yong's investigation.

Changed in starlingx:
status:	Triaged → Invalid
assignee:	Peng Peng (ppeng) → yong hu (yhu6)

Ghada Khalil (gkhalil) wrote on 2019-11-19:

#7

Assigning back to Yong since our policy is to keep the bug assigned to the development prime

yong hu (yhu6) wrote on 2019-11-20:

#8

collected-bash-history.png (390.2 KiB, image/png)

@peng, by specifying the updated password in commands explicitly, did your TCs work or not?

I tried this way as well on my side and the command worked.

In addition, in the bash history from the log tarball you attached, I saw the new password was "xxxxxx". Was it expected?

Peng Peng (ppeng) wrote on 2019-11-20:

#9

ALL_NODES_20191120.190415.tar (22.7 MiB, application/x-tar)

Reproduced on 2019-11-19_20-00-00 (wcp_63-66)

[sysadmin@controller-1 ~(keystone_admin)]$ openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin
[sysadmin@controller-1 ~(keystone_admin)]$ keyring get CGCS admin
!Li69nux*9
[sysadmin@controller-1 ~(keystone_admin)]$ sudo vi /var/log/bash.log
[sysadmin@controller-1 ~(keystone_admin)]$ openstack user list
The request you have made requires authentication. (HTTP 401) (Request-ID: req-59069eea-2bf5-43db-88f8-1bc6e08277a6)
[sysadmin@controller-1 ~(keystone_admin)]$ system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
The account is locked for user: 3480356374d4409bab26d72d1fdf4bee. (HTTP 401) (Request-ID: req-1ff0cc2c-c4ef-4c61-af75-f5bef496e62b)

I did not see "xxxxx" in bash.log:

2019-11-20T19:00:22.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin
2019-11-20T19:00:34.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 keyring get CGCS admin
2019-11-20T19:00:58.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 sudo vi /var/log/bash.log
2019-11-20T19:01:22.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 openstack user list
2019-11-20T19:01:37.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
2019-11-20T19:01:51.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 sudo vi /var/log/bash.log

Peng Peng (ppeng) on 2019-11-20

Changed in starlingx:
status:	Invalid → Confirmed

yong hu (yhu6) wrote on 2019-11-21:

#10

6_auth_failures.png (576.4 KiB, image/png)

It turned out this is a security enhancement done by this patch (merged on Sept 18):
https://review.opendev.org/#/c/682137

After more than 5 attempts with an incorrect (old) password, the account is locked for 1800 seconds.

+ keystone_config {
+ 'security_compliance/lockout_duration': value => 1800;
+ 'security_compliance/lockout_failure_attempts': value => 5;
+ }

Inside your log tarball, keystone-all.log indicated there were 6 authorization failures before the account locked. See the attachment.
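
To make the effect of these two settings concrete, here is a simplified model of the counting, written as a sketch and not taken from keystone's code. The `Account` class and its method names are hypothetical; only the two option values come from the patch above. It shows why even the correct new password is answered with "account is locked" once stale clients have used up the allowed failures.

```python
LOCKOUT_FAILURE_ATTEMPTS = 5  # security_compliance/lockout_failure_attempts
LOCKOUT_DURATION = 1800       # security_compliance/lockout_duration, in seconds


class Account:
    """Toy model of an account with failure counting (not keystone's implementation)."""

    def __init__(self, password):
        self.password = password
        self.failed_count = 0
        self.last_failure = None

    def is_locked(self, now):
        if self.failed_count < LOCKOUT_FAILURE_ATTEMPTS:
            return False
        if now - self.last_failure < LOCKOUT_DURATION:
            return True
        # The lockout window has expired: forget the earlier failures.
        self.failed_count = 0
        return False

    def authenticate(self, password, now):
        """Return 'locked', 'rejected' or 'ok' for an attempt at time `now` (seconds)."""
        if self.is_locked(now):
            return "locked"
        if password != self.password:
            self.failed_count += 1
            self.last_failure = now
            return "rejected"
        self.failed_count = 0
        return "ok"
```

In this model, five requests with the old password are rejected, and any request during the following 1800 seconds is refused as locked regardless of the password it carries.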

To avoid the issue, you can apply the new password in your TC right after the password is changed:
export OS_PASSWORD=`TERM=linux /opt/platform/.keyring/19.09/.CREDENTIAL 2>/dev/null`

Alternatively, pass the updated password explicitly in all subsequent test commands.

=================================================================================
BTW: the reason I could not reproduce the account lockout a few days ago is that I did not run commands more than 5 times with the obsolete password. At that time, I only tried 1-2 times.

=================================================================================

So in summary, this is not an issue, but an enhanced security feature.

ANIRUDH GUPTA (anyrude10) wrote on 2019-11-25:

#11

I am facing the account-locked issue on the StarlingX 2.0 release branch, even though I have not used an incorrect password.

Can someone please explain how to disable this feature?

My account is currently locked. How can I unlock it?

yong hu (yhu6) wrote on 2019-11-25:

#12

With root permission, you can remove these 2 lines from /etc/keystone/keystone.conf:

lockout_failure_attempts = 5
lockout_duration = 1800

After that, restart the keystone services by killing the first process listed by the following grep:
$ ps aux | grep keystone-public

Yang Liu (yliu12) wrote on 2019-11-25:

#13

Hi Yong,

The problem is that after the admin password change, the account got locked by itself, without any user operations.

Yes, something was trying to use the old password, which caused the account lockout; investigation is needed to find which stx component is doing that.

Jerry Sun (jerry-sun-u) wrote on 2019-11-25:

#14

Looking at the log tarball attached by Peng, it appears that after the password change recorded in bash.log there is no more activity from registry-token-server in daemon.log. This leads me to believe that something else must be triggering the locking of the account. There is some activity from the token server in daemon.log, but that was before the password change.

I also tried authenticating to the token server with incorrect credentials on a system without changing the password. This is to try and create an environment where the registry/token server holds incorrect keystone credentials. The admin account did not get locked which means token server does not spam requests at keystone with incorrect credentials until it locks.

Yang Liu (yliu12) wrote on 2019-11-27:

#15

Note that this issue seems to be only happening on the first admin password change.
Account will be locked for some time and then unlock itself.

Workaround is just to wait...

After that, the subsequent admin password changes are working as expected.

yong hu (yhu6) wrote on 2019-11-28:

#16

Thanks for the update, @Yang.
When the admin password was changed for the first time, had "system application-apply stx-openstack" already been run in the background?

In addition, the lock period should be 30 minutes, shouldn't it?

yong hu (yhu6) wrote on 2019-12-02:

#17

The issue was root-caused.
In short, the password for "admin" stored in 2 k8s secrets ("default-registry-key" and "registry-local-secret") was not updated after the operator "sysadmin" changed the password of the "admin" user with the "openstack" client.

Although the password was updated in keyring and keystone (:5000), these 2 secrets were never refreshed, and they kept using the default password set in the Ansible playbook (say, localhost.yml).
So, whenever a docker client pulls an image and authenticates via "registry-token-server", which in turn authenticates against keystone (:5000), the old/default password for "admin" triggers an authentication failure.

Attachment #1 is the packet I captured with tcpdump when the failures happened. "GopherCloud" inside "registry-token-server/keystone/access.go" failed to get auth from keystone because it was using the default (and obsolete) password "Local.123" (set from the Ansible playbook).

Attachment #2 shows the code in "~/containers/registry-token-server/src/keystone/access.go" that was using the obsolete password from the request (from the k8s secret "default-registry-key").

After updating the passwords in the 2 secrets above, authentication succeeded.
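
To check which credentials a secret currently holds, its ".dockerconfigjson" value can be decoded. The following is a minimal sketch under stated assumptions: `registry_credentials` is a hypothetical helper, the registry address registry.local:9001 is the one used elsewhere in this report, and the secret is assumed to use the standard docker config layout with a base64 "auth" entry of the form user:password.

```python
import base64
import json


def registry_credentials(dockerconfigjson_b64, registry="registry.local:9001"):
    """Return (username, password) stored in a base64-encoded .dockerconfigjson value."""
    config = json.loads(base64.b64decode(dockerconfigjson_b64))
    auth_b64 = config["auths"][registry]["auth"]
    # The auth entry is base64("username:password"); split on the first colon only.
    username, _, password = base64.b64decode(auth_b64).decode().partition(":")
    return username, password
```

The input value could be read with something like `kubectl -n kube-system get secret default-registry-key -o jsonpath='{.data.\.dockerconfigjson}'`. If the decoded password differs from what `keyring get CGCS admin` returns, the secret is stale.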

yong hu (yhu6) wrote on 2019-12-02:

#18

1. gopherCloud_access_keystone_with_old_password.png (43.2 KiB, image/png)

yong hu (yhu6) wrote on 2019-12-02:

#19

2. registry-token-server-access-keystone.png (140.1 KiB, image/png)

yong hu (yhu6) wrote on 2019-12-02:

#20

If the password for "admin" is changed, any deployment with "default-registry-key" secret or "registry-local-secret" will fail to authenticate.

For example, in "charts/ingress/charts/helm-toolkit/templates/snippets/_kubernetes_pod_rbac_serviceaccount.tpl", line 47:
imagePullSecrets:
- name: default-registry-key

Peng Peng (ppeng) wrote on 2019-12-02:

#21

ALL_NODES_20191202.155620.tar (111.3 MiB, application/x-tar)

Issue reproduced on DC labs with load 2019-11-21_20-00-00.
After the admin password was changed:

openstack user set --password '!Li69nux*9' admin
[sysadmin@controller-1 ~(keystone_admin)]$ keyring get CGCS admin
!Li69nux*9

[sysadmin@controller-1 ~(keystone_admin)]$ system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[fd01:1::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
The account is locked for user: 27596cc96f034c34b5632c8d8fa52837. (HTTP 401) (Request-ID: req-c70ebb25-8cf7-4756-94ac-8fbcf555c6e9)
[sysadmin@controller-1 ~(keystone_admin)]$ date
Sat Nov 30 00:22:10 UTC 2019

After more than 2 days, the account was still locked.
[sysadmin@controller-1 ~(keystone_admin)]$ system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[fd01:1::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
The account is locked for user: 27596cc96f034c34b5632c8d8fa52837. (HTTP 401) (Request-ID: req-afbc26ea-16ac-4123-8753-bd284e2fbdbf)
[sysadmin@controller-1 ~(keystone_admin)]$ date
Mon Dec 2 16:14:58 UTC 2019

yong hu (yhu6) on 2019-12-03

tags:

added: stx.config

Yang Liu (yliu12) wrote on 2019-12-05:

#22

In the Distributed Cloud environment mentioned in Peng's comments, the account was never unlocked. Perhaps it behaves differently from standalone systems.

To answer Yong's earlier question in #16: stx-openstack was not applied when this was seen.

yong hu (yhu6) wrote on 2019-12-09:

#23

As mentioned, the local registry key was not updated after the admin password was changed.
In this case, anything that tried to pull a docker image with "imagePullSecrets" would trigger the authentication failure.

In the attached log ~/var/log/keystone/keystone-all.log, there were indeed "subcloud"-related error messages, but I am not sure whether they were a consequence of the authentication failures (and the locked user account) or partly due to other causes.

@Yang and Peng, while we are working on the fix, you can take the following steps to update the k8s secrets for the local registry: default-registry-key and registry-local-secret.

#1. List the secrets for the local registry:

kubectl -n kube-system get secrets | grep registry

#2. Encode your user name and new password with the command below. For example, my new password is !Li69nux*9:

echo -n 'admin:!Li69nux*9' | base64

#3. To update default-registry-key, encode the whole auth data (JSON format). Here "YWRtaW46IUxpNjludXgqOQ==" is the output of step #2 above:

echo -n '{"auths": {"registry.local:9001": {"auth": "YWRtaW46IUxpNjludXgqOQ=="}}}' | base64

#4. Use the encoded auth data from step #3 to replace the value of ".dockerconfigjson:" in "default-registry-key": eyJhdXRocyI6IHsicmVnaXN0cnkubG9jYWw6OTAwMSI6IHsiYXV0aCI6ICJZV1J0YVc0NklVeHBOamx1ZFhncU9RPT0ifX19

kubectl -n kube-system edit secret default-registry-key

#5. To update registry-local-secret, encode the whole auth data (JSON format). Here "YWRtaW46IUxpNjludXgqOQ==" is the output of step #2 above:

echo -n '{"auths":{"registry.local:9001":{"username":"admin","password":"!Li69nux*9","auth":"YWRtaW46IUxpNjludXgqOQ=="}}}' | base64

#6. Use the encoded auth data from step #5 to replace the value of ".dockerconfigjson:" in "registry-local-secret": eyJhdXRocyI6eyJyZWdpc3RyeS5sb2NhbDo5MDAxIjp7InVzZXJuYW1lIjoiYWRtaW4iLCJwYXNzd29yZCI6IiFMaTY5bnV4KjkiLCJhdXRoIjoiWVdSdGFXNDZJVXhwTmpsdWRYZ3FPUT09In19fQ==

kubectl -n kube-system edit secret registry-local-secret
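
The encoding steps above can also be sketched in Python, which avoids shell quoting problems with characters such as "!" and "*". The helper names `b64`, `default_registry_key` and `registry_local_secret` are hypothetical; the JSON layouts mirror the two `echo` commands above, including their spacing.

```python
import base64
import json

REGISTRY = "registry.local:9001"


def b64(text):
    """Base64-encode a string and return the result as a string, without line wrapping."""
    return base64.b64encode(text.encode()).decode()


def default_registry_key(username, password):
    """Steps #2 and #3: docker config that carries only the auth token."""
    auth = b64(f"{username}:{password}")
    return b64(json.dumps({"auths": {REGISTRY: {"auth": auth}}}))


def registry_local_secret(username, password):
    """Steps #2 and #5: docker config that also carries username and password."""
    auth = b64(f"{username}:{password}")
    entry = {"username": username, "password": password, "auth": auth}
    # Compact separators match the JSON text used in step #5.
    return b64(json.dumps({"auths": {REGISTRY: entry}}, separators=(",", ":")))
```

The two return values are what would be pasted as the ".dockerconfigjson:" value in steps #4 and #6.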

yong hu (yhu6) on 2019-12-12

Changed in starlingx:
status:	Confirmed → In Progress

yong hu (yhu6) wrote on 2019-12-12:

#24

https://review.opendev.org/#/c/698442/

OpenStack Infra (hudson-openstack) wrote on 2019-12-18: Change abandoned on config (master)

#25

Change abandoned by Lin Shuicheng (<email address hidden>) on branch: master
Review: https://review.opendev.org/698442
Reason: New patch is uploaded: https://review.opendev.org/699547

OpenStack Infra (hudson-openstack) wrote on 2020-01-07: Fix merged to stx-puppet (master)

#26

Reviewed: https://review.opendev.org/700677
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=a36b4823b7dbacdc4a795e3e3978fbed6e952ced
Submitter: Zuul
Branch: master

commit a36b4823b7dbacdc4a795e3e3978fbed6e952ced
Author: Shuicheng Lin <email address hidden>
Date: Fri Dec 27 11:52:05 2019 +0800

    Enable keystone to send out event notification

    notification driver need be set for keystone, in order to send out
    notification. The driver value could be "messaging, messagingv2,
    routing, log, test, noop (multi valued)".
    This is in order to monitor admin password change in sysinv.

    Partial-Bug: 1853017

    Change-Id: Ie55a16723e92ea85a615477788ca922cca3bfe42
    Signed-off-by: Shuicheng Lin <email address hidden>

OpenStack Infra (hudson-openstack) wrote on 2020-02-04: Fix merged to upstream (master)

#27

Reviewed: https://review.opendev.org/699547
Committed: https://git.openstack.org/cgit/starlingx/upstream/commit/?id=d1294d7e679460661b42af64c87480b429a3366c
Submitter: Zuul
Branch: master

commit d1294d7e679460661b42af64c87480b429a3366c
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 18 12:47:23 2019 +0800

    Update Keyring password info before sending out notification

    Need update password before send out notification. Otherwise, any
    process which monitors the "updated" notification will still get old
    password from Keyring.

    Partial-Bug: 1853017

    Change-Id: Id1c94fedca41abe96c7b38880bf325d4a25a95eb
    Signed-off-by: Shuicheng Lin <email address hidden>

OpenStack Infra (hudson-openstack) wrote on 2020-02-04: Fix proposed to upstream (f/centos8)

#28

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/705854

OpenStack Infra (hudson-openstack) wrote on 2020-02-05: Fix merged to config (master)

#29

Reviewed: https://review.opendev.org/698442
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=8ab1e2d7c624f83d72efcbfcddcdffa567a26bad
Submitter: Zuul
Branch: master

commit 8ab1e2d7c624f83d72efcbfcddcdffa567a26bad
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 11 16:37:03 2019 +0800

    Audit local registry secret info when there is user update in keystone

    local registry uses admin's username&password for authentication.
    And admin's password could be changed by openstack client cmd. It will
    cause auth info in secrets obsolete, and lead to invalid authentication
    in keystone.
    To keep secrets info updated, keystone event notification is enabled.
    And event notification listener is added in sysinv. So when there is
    user password change, a user update event will be sent out by keystone.
    And sysinv will call function audit_local_registry_secrets to check
    whether kubernetes secret info need be updated or not.

    A periodic task is added also to ensure secrets are always synced, in
    case notification is missed or there is failure in handle notification.

    oslo_messaging is added to tox's requirements.txt to avoid tox failure.
    The version is based on global-requirements.txt from Openstack Train.

    Test:
    Pass deployment and secrets could be updated automatically with new auth
    info.
    Pass host-swact in duplex mode.

    Closes-Bug: 1853017
    Depends-On: https://review.opendev.org/700677
    Depends-On: https://review.opendev.org/699547
    Change-Id: I959b65288e0834b989aa87e40506e41d0bba0d59
    Signed-off-by: Shuicheng Lin <email address hidden>
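
The flow described in this commit message could be sketched roughly as follows. This is an illustration only and not the sysinv code: every function name here is hypothetical, the event type string "identity.user.updated" is an assumption about what the listener filters on, and the registry address is the one from the manual workaround in this report.

```python
import base64
import json

REGISTRY = "registry.local:9001"  # registry address from the manual workaround


def expected_auth(username, password):
    """Auth token a secret should hold for the current credentials."""
    return base64.b64encode(f"{username}:{password}".encode()).decode()


def secret_is_stale(dockerconfigjson_b64, username, password):
    """True if the secret's auth token no longer matches the current credentials."""
    config = json.loads(base64.b64decode(dockerconfigjson_b64))
    stored = config.get("auths", {}).get(REGISTRY, {}).get("auth")
    return stored != expected_auth(username, password)


def audit_secrets(secrets, username, password, on_stale):
    """Call on_stale(name) for each stale secret and return the names handled.

    A periodic task would call this directly as a safety net.
    """
    stale = [name for name, blob in secrets.items()
             if secret_is_stale(blob, username, password)]
    for name in stale:
        on_stale(name)
    return stale


def on_notification(event_type, secrets, username, password, on_stale):
    """Run the audit only for user-update events (assumed event type string)."""
    if event_type == "identity.user.updated":
        return audit_secrets(secrets, username, password, on_stale)
    return []
```

Here `secrets` maps a secret name to its base64 ".dockerconfigjson" value, and `on_stale` stands in for whatever rewrites the Kubernetes secret.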

Changed in starlingx:
status:	In Progress → Fix Released
tags:	added: in-f-centos8

OpenStack Infra (hudson-openstack) wrote on 2020-02-05: Fix merged to upstream (f/centos8)

#30

Reviewed: https://review.opendev.org/705854
Committed: https://git.openstack.org/cgit/starlingx/upstream/commit/?id=41f7bff21b83512640f148fa208485beec85eeeb
Submitter: Zuul
Branch: f/centos8

commit 333380daef7623eeb8eed16245d3700227d3003c
Author: Kristal Dale <email address hidden>
Date: Fri Jan 17 13:30:49 2020 -0800

    Update landing pages for docs and release notes:

    - Use updated project name in titles/text
    - Correct text for link to Storyboard (docs)
    - Correct capitalization in section headings
    - Correct formatting for section headings
    - Update project name in link to release notes, api-ref
    - Update project name in config for docs/releasenotes/api-ref

    Story:2007193
    Task:38347

    Change-Id: I52a53260042e6924673230486476c394001683ca
    Signed-off-by: Kristal Dale <email address hidden>

commit 8c7def7074be1a51fc9e01dcdafd8c99cb9115dd
Author: Don Penney <email address hidden>
Date: Wed Jan 1 18:38:19 2020 -0500

    Skip UT in python-keystoneclient build

    The python-keystoneclient unit test code uses a token expiry of Jan 1,
    2020, which causes a failure as of that date. Skip running the tests
    as part of the build to avoid this issue.

    Change-Id: I85e780c6f40beb19d1527282f30b38879ccfc512
    Closes-Bug: 1858049
    Signed-off-by: Don Penney <email address hidden>

commit d1294d7e679460661b42af64c87480b429a3366c
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 18 12:47:23 2019 +0800

    Update Keyring password info before sending out notification

    Need update password before send out notification. Otherwise, any
    process which monitors the "updated" notification will still get old
    password from Keyring.

    Partial-Bug: 1853017

    Change-Id: Id1c94fedca41abe96c7b38880bf325d4a25a95eb
    Signed-off-by: Shuicheng Lin <email address hidden>

OpenStack Infra (hudson-openstack) wrote on 2020-02-11: Fix proposed to stx-puppet (r/stx.3.0)

#31

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/707154

OpenStack Infra (hudson-openstack) wrote on 2020-02-11: Fix proposed to upstream (r/stx.3.0)

#32

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/707155

OpenStack Infra (hudson-openstack) wrote on 2020-02-11: Fix proposed to config (r/stx.3.0)

#33

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/707156

OpenStack Infra (hudson-openstack) wrote on 2020-02-11: Fix merged to stx-puppet (r/stx.3.0)

#34

Reviewed: https://review.opendev.org/707154
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=f26899071befc6330dc58fa77926425d0b67e228
Submitter: Zuul
Branch: r/stx.3.0

commit f26899071befc6330dc58fa77926425d0b67e228
Author: Shuicheng Lin <email address hidden>
Date: Fri Dec 27 11:52:05 2019 +0800

    Enable keystone to send out event notification

    notification driver need be set for keystone, in order to send out
    notification. The driver value could be "messaging, messagingv2,
    routing, log, test, noop (multi valued)".
    This is in order to monitor admin password change in sysinv.

    Partial-Bug: 1853017

    Change-Id: Ie55a16723e92ea85a615477788ca922cca3bfe42
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit a36b4823b7dbacdc4a795e3e3978fbed6e952ced)

OpenStack Infra (hudson-openstack) wrote on 2020-02-11: Fix merged to upstream (r/stx.3.0)

#35

Reviewed: https://review.opendev.org/707155
Committed: https://git.openstack.org/cgit/starlingx/upstream/commit/?id=52d7be2f5947d67918c3da0cf8bd2291d2c87232
Submitter: Zuul
Branch: r/stx.3.0

commit 52d7be2f5947d67918c3da0cf8bd2291d2c87232
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 18 12:47:23 2019 +0800

    Update Keyring password info before sending out notification

    Need update password before send out notification. Otherwise, any
    process which monitors the "updated" notification will still get old
    password from Keyring.

    Partial-Bug: 1853017

    Change-Id: Id1c94fedca41abe96c7b38880bf325d4a25a95eb
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit d1294d7e679460661b42af64c87480b429a3366c)

OpenStack Infra (hudson-openstack) wrote on 2020-02-12: Fix merged to config (r/stx.3.0)

#36

Reviewed: https://review.opendev.org/707156
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=1c3ba7706559eb99357773356230240adb3aa1ea
Submitter: Zuul
Branch: r/stx.3.0

commit 1c3ba7706559eb99357773356230240adb3aa1ea
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 11 16:37:03 2019 +0800

    Audit local registry secret info when there is user update in keystone

    local registry uses admin's username&password for authentication.
    And admin's password could be changed by openstack client cmd. It will
    cause auth info in secrets obsolete, and lead to invalid authentication
    in keystone.
    To keep secrets info updated, keystone event notification is enabled.
    And event notification listener is added in sysinv. So when there is
    user password change, a user update event will be sent out by keystone.
    And sysinv will call function audit_local_registry_secrets to check
    whether kubernetes secret info need be updated or not.

    A periodic task is added also to ensure secrets are always synced, in
    case notification is missed or there is failure in handle notification.

    oslo_messaging is added to tox's requirements.txt to avoid tox failure.
    The version is based on global-requirements.txt from Openstack Train.

    Test:
    Pass deployment and secrets could be updated automatically with new auth
    info.
    Pass host-swact in duplex mode.

    Closes-Bug: 1853017
    Depends-On: https://review.opendev.org/707154
    Depends-On: https://review.opendev.org/707155
    Change-Id: I959b65288e0834b989aa87e40506e41d0bba0d59
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit 8ab1e2d7c624f83d72efcbfcddcdffa567a26bad)

OpenStack Infra (hudson-openstack) wrote on 2020-02-13: Fix proposed to upstream (r/stx.2.0)

#37

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/707522

OpenStack Infra (hudson-openstack) wrote on 2020-02-13: Fix proposed to config (r/stx.2.0)

#38

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/707523

OpenStack Infra (hudson-openstack) wrote on 2020-02-13:

#39

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/707524

Ghada Khalil (gkhalil) on 2020-02-13

tags:

added: in-r-stx30 stx.4.0

Ghada Khalil (gkhalil) wrote on 2020-02-14:

#40

As recommended by Yong Hu in https://bugs.launchpad.net/starlingx/+bug/1853093 , I am tagging this bug for stx.2.0 as well since the same code issue exists in that release. The fix may also address #1853093 (not 100% confirmed).

tags:

added: stx.2.0

OpenStack Infra (hudson-openstack) wrote on 2020-02-20: Fix merged to config (r/stx.2.0)

#41

Reviewed: https://review.opendev.org/707523
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=35d8ccb8a7adc9b3b2b46373ca9f89d08c94cd6a
Submitter: Zuul
Branch: r/stx.2.0

commit 35d8ccb8a7adc9b3b2b46373ca9f89d08c94cd6a
Author: Shuicheng Lin <email address hidden>
Date: Thu Feb 13 10:58:21 2020 +0800

    Enable keystone to send out event notification

    notification driver need be set for keystone, in order to send out
    notification. The driver value could be "messaging, messagingv2,
    routing, log, test, noop (multi valued)".
    This is in order to monitor admin password change in sysinv.

    Partial-Bug: 1853017
    Partial-Bug: 1853093

    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit a36b4823b7dbacdc4a795e3e3978fbed6e952ced)

    Change-Id: Ia6661eaf294f97debca2cdb463455a23639892c1

OpenStack Infra (hudson-openstack) wrote on 2020-02-21: Fix merged to upstream (r/stx.2.0)

#42

Reviewed: https://review.opendev.org/707522
Committed: https://git.openstack.org/cgit/starlingx/upstream/commit/?id=dfe155136d3337a18bfbd19a7fb6f57614d455ba
Submitter: Zuul
Branch: r/stx.2.0

commit dfe155136d3337a18bfbd19a7fb6f57614d455ba
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 18 12:47:23 2019 +0800

    Update Keyring password info before sending out notification

    Need update password before send out notification. Otherwise, any
    process which monitors the "updated" notification will still get old
    password from Keyring.

    Partial-Bug: 1853017
    Partial-Bug: 1853093

    Change-Id: Id1c94fedca41abe96c7b38880bf325d4a25a95eb
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit d1294d7e679460661b42af64c87480b429a3366c)

Peng Peng (ppeng) wrote on 2020-02-21:

#43

Verified on
Lab: WCP_112
Load: 2020-02-20_20-00-00

[sysadmin@controller-0 ~(keystone_admin)]$ keyring get CGCS admin
Li69nux*
[sysadmin@controller-0 ~(keystone_admin)]$
[sysadmin@controller-0 ~(keystone_admin)]$
[sysadmin@controller-0 ~(keystone_admin)]$ openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin
[sysadmin@controller-0 ~(keystone_admin)]$
[sysadmin@controller-0 ~(keystone_admin)]$
[sysadmin@controller-0 ~(keystone_admin)]$ keyring get CGCS admin
!Li69nux*9
[sysadmin@controller-0 ~(keystone_admin)]$ system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
+--------------------------------------+-----------------------------+--------------+--------+
| uuid                                 | service_group_name          | hostname     | state  |
+--------------------------------------+-----------------------------+--------------+--------+
| d14f859a-f851-4598-8dd9-1c8d4fb42f0f | cloud-services              | controller-0 | active |
| 3cc235b9-f4b9-4b95-83af-c89684c73396 | controller-services         | controller-0 | active |
| 2c70019c-1eeb-4f55-8525-e0277ce1ed76 | directory-services          | controller-0 | active |
| 3ce317e2-894a-4cef-89c8-fa83bde0af91 | oam-services                | controller-0 | active |
| 76b5f3c0-639b-4b4f-984d-aabaf546cbe3 | patching-services           | controller-0 | active |
| 3d10d9ef-995c-4e08-acb4-4f1c6cf79f92 | storage-monitoring-services | controller-0 | active |
| 67f02d0c-6ff5-44ce-a657-fb73a241919e | storage-services            | controller-0 | active |
| ca8beaad-a905-4595-bc2c-3d810749064d | vim-services                | controller-0 | active |
| 1b893b0b-05b4-42e5-806b-3dfc72a526c5 | web-services                | controller-0 | active |
+--------------------------------------+-----------------------------+--------------+--------+
[sysadmin@controller-0 ~(keystone_admin)]$

tags:	removed: stx.retestneeded

OpenStack Infra (hudson-openstack) wrote on 2020-02-24: Fix merged to config (r/stx.2.0)

#44

Reviewed:  https://review.opendev.org/707524
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=7e5e887eb38042a0679ec100ca5d4016c6efe2bc
Submitter: Zuul
Branch:    r/stx.2.0

commit 7e5e887eb38042a0679ec100ca5d4016c6efe2bc
Author: Shuicheng Lin <shuicheng.lin@intel.com>
Date:   Wed Dec 11 16:37:03 2019 +0800

Audit local registry secret info when there is user update in keystone
    
    local registry uses admin's username&password for authentication.
    And admin's password could be changed by openstack client cmd. It will
    cause auth info in secrets obsolete, and lead to invalid authentication
    in keystone.
    To keep secrets info updated, keystone event notification is enabled.
    And event notification listener is added in sysinv. So when there is
    user password change, a user update event will be sent out by keystone.
    And sysinv will call function audit_local_registry_secrets to check
    whether kubernetes secret info need be updated or not.
    
    A periodic task is added also to ensure secrets are always synced, in
    case notification is missed or there is failure in handle notification.
    
    oslo_messaging is added to tox's requirements.txt to avoid tox failure.
    The version is based on global-requirements.txt from Openstack Train.
    
    Test:
    Pass deployment and secrets could be updated automatically with new auth
    info.
    Pass host-swact in duplex mode.
    
    We lack of info how LP1853093 was triggered by the user, but this patch
    can address the issue that local registry secrets are not updated
    accordingly after the password of "admin" is changed.
    And this fix will help technically.
    
    Closes-Bug: 1853017
    Closes-Bug: 1853093
    Depends-On: https://review.opendev.org/707522
    Depends-On: https://review.opendev.org/707523
    Change-Id: I959b65288e0834b989aa87e40506e41d0bba0d59
    Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
    (cherry picked from commit 8ab1e2d7c624f83d72efcbfcddcdffa567a26bad)
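
As a rough illustration of the two update paths described in this commit message (the keystone notification and the periodic task), here is a small Python sketch. It is a toy under stated assumptions, not sysinv code: `audit_secrets`, `on_user_updated` and `periodic_audit` are hypothetical names, the secrets are plain dicts rather than kubernetes objects, and only the two secret names are taken from the sysinv log lines quoted in this report.

```python
def audit_secrets(secrets, username, current_password):
    """Refresh stale credentials in place; return the names that changed."""
    updated = []
    for name, cred in secrets.items():
        if cred["username"] == username and cred["password"] != current_password:
            cred["password"] = current_password
            updated.append(name)
    return updated


def on_user_updated(secrets, username, read_password):
    """Path 1: called when a (hypothetical) user-update notification arrives."""
    return audit_secrets(secrets, username, read_password(username))


def periodic_audit(secrets, username, read_password):
    """Path 2: the same check on a timer, in case a notification was missed."""
    return audit_secrets(secrets, username, read_password(username))


def demo():
    passwords = {"admin": "new-secret"}  # stand-in for the keyring
    secrets = {
        "registry-local-secret": {"username": "admin", "password": "old-secret"},
        "default-registry-key": {"username": "admin", "password": "old-secret"},
    }
    first = on_user_updated(secrets, "admin", passwords.get)
    second = periodic_audit(secrets, "admin", passwords.get)
    return first, second
```

Both paths share one audit function, so the periodic run is a no-op when the notification has already refreshed the secrets.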

Peng Peng (ppeng) wrote on 2020-02-24:

#45

ALL_NODES_20200223.025201.tar (20.5 MiB, application/x-tar)

Issue reproduced on
Lab: WCP_3_6
Load: 2020-02-22_04-10-00

Log: attached

[2020-02-23 02:36:27,094] 314  DEBUG MainThread ssh.send    :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin'

[2020-02-23 02:39:29,512] 314  DEBUG MainThread ssh.send    :: Send 'keyring get CGCS admin'
[2020-02-23 02:39:30,118] 436  DEBUG MainThread ssh.expect  :: Output: 
!Li69nux*9

[2020-02-23 02:39:31,833] 314  DEBUG MainThread ssh.send    :: Send 'system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default  --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-0'

[2020-02-23 02:40:58,798] 314  DEBUG MainThread ssh.send    :: Send 'openstack --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password 'Li69nux*' admin'

[2020-02-23 02:44:01,811] 314  DEBUG MainThread ssh.send    :: Send 'keyring get CGCS admin'
[2020-02-23 02:44:02,399] 436  DEBUG MainThread ssh.expect  :: Output: 
Li69nux*

fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default  --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2020-02-23 02:44:03,214] 436  DEBUG MainThread ssh.expect  :: Output: 
Must provide Keystone credentials or user-defined endpoint and token, error was: The account is locked for user: c52f573e07d24a37b9b5627a8c82756d. (HTTP 401) (Request-ID: req-be5e9183-0cbf-44a7-9948-832943a06da9)

Changed in starlingx:
status:	Fix Released → Confirmed
tags:	added: stx.retestneeded

Lin Shuicheng (shuicheng) on 2020-02-25

Changed in starlingx:
assignee:	yong hu (yhu6) → Lin Shuicheng (shuicheng)

Lin Shuicheng (shuicheng) wrote on 2020-02-26:

#46

Hi Peng,
The log tarball contains only controller-1; controller-0 is missing.
Could you share the detailed steps to reproduce the issue?
When did you change the password, and what operations ran before and after the change?
From the log, the failure is still caused by an authentication failure in registry-token-server, but I don't know where the access request comes from.
I can see that the secrets were updated when the password was changed, and no application was in the applying stage.
I need to reproduce the issue to find out where the registry-token-server access comes from.

Here are some logs from controller-1:
Password change command at 2:40:58:
2020-02-23T02:40:58.000 controller-1 -sh: info HISTORY: PID=241256 UID=42425 openstack --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password xxxxxx admin

Secrets update at 2:41:01:
sysinv 2020-02-23 02:41:01.613 238507 INFO sysinv.conductor.kube_app [-] Secret registry-local-secret under Namespace kube-system is updated
sysinv 2020-02-23 02:41:01.645 238507 INFO sysinv.conductor.kube_app [-] Secret default-registry-key under Namespace kube-system is updated

Authentication failure at 2:41:04:
./var/log/daemon.log:38427:2020-02-23T02:41:04.547 controller-1 registry-token-server[235987]: info time="2020-02-23T02:41:04Z" level=error msg="error authenticating user \"admin\": Authentication failed" go.version=go1.12.10 http.request.host="128.224.151.227:9002" http.request.id=46c3b222-24ca-46bb-936c-0ef08fbf5141 http.request.method=GET http.request.remoteaddr="192.168.204.3:51416" http.request.uri="/token/?account=admin&scope=repository%3Adocker.io%2Fstarlingx%2Fmultus%3Apush%2Cpull&service=192.168.204.1%3A9001" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/3.10.0-1062.1.2.el7.2.tis.x86_64 os/linux arch/amd64 UpstreamClient(docker-sdk-python/3.3.0)" instance.id=46661299-76ed-4229-8aa6-45ac24c3f1c6

Then the account was locked at 2:41:06 after 5 invalid authentication attempts:
2020-02-23 02:41:06.091 239370 WARNING keystone.server.flask.application [req-cc73c0ca-5d97-4ae0-afa4-6159b677b0bb - - - - -] Authorization failed. The account is locked for user: c52f573e07d24a37b9b5627a8c82756d. from 192.168.204.3: AccountLocked: The account is locked for user: c52f573e07d24a37b9b5627a8c82756d.
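
To make the timeline easier to read, the timestamps quoted above can be turned into offsets from the password change with a short Python sketch. The event labels and the `offsets` helper are my own naming; the timestamps are copied from the log lines in this comment.

```python
from datetime import datetime

# (label, timestamp) pairs copied from the controller-1 log lines above.
EVENTS = [
    ("password change command", "2020-02-23 02:40:58.000"),
    ("registry-local-secret updated", "2020-02-23 02:41:01.613"),
    ("default-registry-key updated", "2020-02-23 02:41:01.645"),
    ("registry-token-server auth failure", "2020-02-23 02:41:04.547"),
    ("keystone reports account locked", "2020-02-23 02:41:06.091"),
]


def offsets(events):
    """Return (label, seconds since the first event) for each event."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    start = datetime.strptime(events[0][1], fmt)
    return [
        (label, round((datetime.strptime(ts, fmt) - start).total_seconds(), 3))
        for label, ts in events
    ]


if __name__ == "__main__":
    for label, seconds in offsets(EVENTS):
        print(f"+{seconds:6.3f}s  {label}")
```

The offsets show that the authentication failure (+6.547s) happened after both secrets had been updated (+3.613s and +3.645s), which is why the source of the request is still unclear.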

Hrishit Mazumder (hmazumde) wrote on 2020-03-10:

#47

ALL_NODES_20200310.185444.tar (34.9 MiB, application/x-tar)

Issue reproduced on lab wcp_76_77 with load StarlingX_Upstream_build/2020-03-10_04-10-00 after the admin password was changed.

Timestamp of password change:
2020-03-10T17:06:31.000 controller-0 -sh: info HISTORY: PID=1062703 UID=42425 openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin

Details: CLI 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid' failed to execute. Output: Must provide Keystone credentials or user-defined endpoint and token, error was: The account is locked for user: a7befe681ee64a82b71b63935b410cf7. (HTTP 401) (Request-ID: req-7ef8e9de-cb4c-43db-badc-ad3cb6aa9958)

I have attached logs for your perusal.

Best regards,
Hrishit Mazumder

OpenStack Infra (hudson-openstack) wrote on 2020-03-12: Fix proposed to config (master)

#48

Fix proposed to branch: master
Review: https://review.opendev.org/712614

Changed in starlingx:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2020-03-13: Fix proposed to ansible-playbooks (master)

#49

Fix proposed to branch: master
Review: https://review.opendev.org/712823

Lin Shuicheng (shuicheng) wrote on 2020-03-30:

#50

Hi Peng,
Could you update the test case so that it does not change the password immediately after host-swact? Please wait 3 minutes after host-swact before changing the password.
The issue is that, after a swact, sysinv on the active controller checks for a k8s networking upgrade and needs to pull images from registry.local:9001. If the password is changed at the same time, keystone authentication fails and the account is locked. To save time, sysinv pulls images with 5 threads in parallel, so keystone's limit of 5 failed attempts is easily hit.
I have submitted a patch to avoid the authentication failure caused by the cached password, but it cannot fix the issue completely. The issue will still occur if the password is changed just after sysinv reads the password but before keystone authentication.

Here are the sysinv/sm logs from ALL_NODES_20200310.185444.tar:
host-swact started at 17:09:53 and finished at 17:10:23:
2020-03-10T17:09:53.000 controller-1 sm: debug time[6314.840] log<450> INFO: sm[95215]: sm_service_domain_scheduler.c(1520): Swact from (controller-0) to (controller-1) start
2020-03-10T17:10:23.000 controller-1 sm: debug time[6344.397] log<781> INFO: sm[95215]: sm_node_swact_monitor.cpp(57): Swact has completed successfully.
sysinv tries to run the k8s networking upgrade at 17:10:25:
2020-03-10 17:10:25.854 681994 INFO sysinv.conductor.manager [-] _upgrade_downgrade_kube_networking executing playbook: /usr/share/ansible/stx-ansible/playbooks/upgrade-k8s-networking.yml for version v1.16.2
k8s secret is already updated with new password at 17:11:15
2020-03-10 17:11:15.367 681994 INFO sysinv.conductor.kube_app [-] Secret registry-local-secret under Namespace kube-system is updated
Keystone reports an authentication failure at 17:11:19 because it received the old password:
2020-03-10 17:11:19.438 682342 WARNING keystone.server.flask.application [req-3a36cda2-79ee-4a42-bbec-087ada62030e - - - - -] Authorization failed. The account is locked for user: a7befe681ee64a82b71b63935b410cf7. from 192.168.204.3: AccountLocked: The account is locked for user: a7befe681ee64a82b71b63935b410cf7.
sysinv reports an ansible failure at 17:11:19 because the image download failed:
"stderr": "time=\"2020-03-10T17:11:19Z\" level=fatal msg=\"pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.local:9001/quay.io/calico/node:v3.6.2\
sysinv 2020-03-10 17:17:07.929 681994 ERROR sysinv.conductor.manager [-] Failed to upgrade/downgrade kubernetes networking images: ansible-playbook returned an error: 2: Exception: ansible-playbook returned an error: 2
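
The race described above can be reproduced in a toy model. The Python sketch below is only an illustration under assumptions: `ToyIdentity` is a hypothetical stand-in for an identity service with failure lockout (it is not keystone code, and resetting the counter on success is an assumption of the model), and `pull_images` only mimics 5 parallel workers that all authenticate with a password read before the change.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

LOCKOUT_FAILURE_ATTEMPTS = 5  # the "5 times" failure count mentioned above


class ToyIdentity:
    """Toy identity service that locks the account after repeated failures."""

    def __init__(self, password):
        self._password = password
        self._failures = 0
        self._mutex = threading.Lock()

    def set_password(self, new_password):
        with self._mutex:
            self._password = new_password

    def authenticate(self, password):
        with self._mutex:
            if self._failures >= LOCKOUT_FAILURE_ATTEMPTS:
                return "locked"
            if password == self._password:
                self._failures = 0  # assumption of this toy model
                return "ok"
            self._failures += 1
            return "failed"


def pull_images(identity, cached_password, workers=5):
    """Mimic parallel image pulls that all reuse one cached password."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda _: identity.authenticate(cached_password), range(workers))
        )


def simulate_race():
    identity = ToyIdentity("old-password")
    cached = "old-password"                # password read before the change
    identity.set_password("new-password")  # change lands inside the window
    results = pull_images(identity, cached)
    after = identity.authenticate("new-password")  # correct, but too late
    return results, after
```

In this model a single stale read is enough to lock the account, because all 5 workers fail with the same cached value. That matches the `fm` failure in the test log, where the correct password is rejected with "The account is locked".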

OpenStack Infra (hudson-openstack) wrote on 2020-03-31: Fix proposed to config (f/centos8)

#51

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716137

OpenStack Infra (hudson-openstack) wrote on 2020-03-31: Fix merged to config (f/centos8)

#52

Reviewed:  https://review.opendev.org/716137
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=cb4cf4299c2ec10fb2eb03cdee3f6d78a6413089
Submitter: Zuul
Branch:    f/centos8

commit 16477935845e1c27b4c9d31743e359b0aa94a948
Author: Steven Webster <steven.webster@windriver.com>
Date:   Sat Mar 28 17:19:30 2020 -0400

Fix SR-IOV runtime manifest apply
    
    When an SR-IOV interface is configured, the platform's
    network runtime manifest is applied in order to apply the virtual
    function (VF) config and restart the interface.  This results in
    sysinv being able to determine and populate the puppet hieradata
    with the virtual function PCI addresses.
    
    A side effect of the network manifest apply is that potentially
    all platform interfaces may be brought down/up if it is determined
    that their configuration has changed.  This will likely be the case
    for a system which configures SR-IOV interfaces before initial
    unlock.
    
    A few issues have been encountered because of this, with some
    services not behaving well when the interface they are communicating
    over suddenly goes down.
    
    This commit makes the SR-IOV VF configuration much more targeted
    so that only the operation of setting the desired number of VFs
    is performed.
    
    Closes-Bug: #1868584
    Depends-On: https://review.opendev.org/715669
    Change-Id: Ie162380d3732eb1b6e9c553362fe68cbc313ae2b
    Signed-off-by: Steven Webster <steven.webster@windriver.com>

commit 45c9fe2d3571574b9e0503af108fe7c1567007db
Author: Zhipeng Liu <zhipengs.liu@intel.com>
Date:   Thu Mar 26 01:58:34 2020 +0800

Add ipv6 support for novncproxy_base_url.
    
    For ipv6 address, we need url with below format
    [ip]:port
    
    Partial-Bug: 1859641
    
    Change-Id: I01a5cd92deb9e88c2d31bd1e16e5bce1e849fcc7
    Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>

commit d119336b3a3b24d924e000277a37ab0b5f93aae1
Author: Andy Ning <andy.ning@windriver.com>
Date:   Mon Mar 23 16:26:21 2020 -0400

Fix timeout waiting for CA cert install during ansible replay
    
    During ansible bootstrap replay, the ssl_ca_complete_flag file is
    removed. It expects puppet platform::config::runtime manifest apply
    during system CA certificate install to re-generate it. So this commit
    updated conductor manager to run that puppet manifest even if the CA cert
    has already installed so that the ssl_ca_complete_flag file is created
    and makes ansible replay to continue.
    
    Change-Id: Ic9051fba9afe5d5a189e2be8c8c2960bdb0d20a4
    Closes-Bug: 1868585
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit 24a533d800b2c57b84f1086593fe5f04f95fe906
Author: Zhipeng Liu <zhipengs.liu@intel.com>
Date:   Fri Mar 20 23:10:31 2020 +0800

Fix rabbitmq could not bind port to ipv6 address issue
    
    When we use Armada to deploy openstack service for ipv6, rabbitmq
    pod could not start listen on [::]:5672 and [::]:15672.
    For ipv6, we need an override for configuration file.
    
    Upstream patch link is:
    https://review.opendev.org/#/c/714027/
    
    Test pass for deploying rabbitmq service on both ipv4 and ipv6 setup
    
    Partial-Bug: 1859641
    
    Change-Id: I6495c45fbd8cc1de3c9f5d9ef5003447079d91b8
    Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>

commit 08aa950393a7e3c5fd5299b88e134307800584aa
Author: Kevin Smith <kevin.smith@windriver.com>
Date:   Sun Mar 22 14:29:15 2020 -0400

application-apply error string too long
    
    During application-apply exception handling, str(e) is
    used as the input to the progress column of the kube_app
    table in the database, which may be longer than the 255
    character limit.  The result is an application stuck
    in 'applying' status.  This update adds a more readable
    error message to just check logs.
    
    There are other instances where str(e) is used as input to
    the database and could cause a similar problem which should
    also be looked at.
    
    Change-Id: I01a5e8f56a628726163e2cfffc58143ae8d5f845
    Closes-Bug: 1867019
    Signed-off-by: Kevin Smith <kevin.smith@windriver.com>

commit c1c18871d72cdcd877b95f593bd119b47b3ddbb6
Author: Andy Ning <andy.ning@windriver.com>
Date:   Tue Feb 18 14:52:06 2020 -0500

Support multiple CA certificates installation
    
    This update enhances the sysinv certificate install API to be able to
    install multiple CA certs from a file. The return from the API call
    indicates the certs actually installed in the call (i.e., excluding those
    that are already in the system). This is necessary especially for DC to
    support multiple CA certs synchronization.
    
    This update also adds a sysinv certificate uninstall API. The API is to
    be used to remove a particular CA certificate from the system, identified
    by its uuid. The API returns a json body with information about the
    certificate that has been removed. This is required by DC sysinv api
    proxy for certificate deletion synchronization, since DC tracks subcloud
    certificates resource by signature while the uninstall API request
    contains only uuid.
    
    The uninstall API only supports ssl_ca certificate.
    
    cgtsclient and system CLI are also updated to align with the updated
    and new APIs. User can use "system certificate-install ..." to install
    one or multiple CA certificates, and "system certificate-uninstall ..."
    to remove a particular CA certificate from the system.
    
    When multiple CA certificates are installed in the system,
    "system certificate-list" will display each of the individual
    certificates.
    
    The sysinv certificate configuration API reference is updated with the
    new uninstall API. Unit tests are added for CA certificate install and
    delete APIs.
    
    Change-Id: I7dba11e56792b7d198403c436c37f71d7b7193c9
    Depends-On: https://review.opendev.org/#/c/711633/
    Closes-Bug: 1861438
    Closes-Bug: 1860995
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit 241ea2871b15965bd694895f796660f7f1fddbf3
Author: Tee Ngo <tee.ngo@windriver.com>
Date:   Thu Mar 19 13:54:15 2020 -0400

Set time limit for filebeat open filehandlers
    
    In a large system, filebeat can harvest a large number of files,
    and with the default file closing policies, many deleted files are
    not freed. Over time, this leads to the /var/log partition running out
    of space, services not being able to flush their logs to disk, and
    the logmgmt process continuously rotating logs.
    
    This commit sets a default time limit for each open file harvester.
    This value can be adjusted as needed via user overrides.
    
    Closes-Bug: 1865924
    Change-Id: I9dbf9cb2128157834b937357dcc6c4945dc5d2f3
    Signed-off-by: Tee Ngo <tee.ngo@windriver.com>

commit d7c3822a52ecc3b4288106c4e544e67add80fbf5
Author: Jerry Sun <jerry.sun@windriver.com>
Date:   Fri Mar 13 12:37:39 2020 -0400

Remove usage of /etc/kubernetes/kubeadm.yaml
    
    /etc/kubernetes/kubeadm.yaml could contain stale data, for example, from
    changing kube-apiserver parameters. There are currently no system impacts
    from using the stale file, but as we change more parameters, there could
    be system impact. This commit makes the existing usage of kubeadm.yaml
    generate a temp copy of the file with current data first.
    
    Change-Id: I62391d184e3e5d6397a9af4f43c7c7ec19314afc
    Partial-bug: 1866695
    Signed-off-by: Jerry Sun <jerry.sun@windriver.com>

commit 8ecdcbbbcdc2807113c7b7004f92653acffa0b41
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Tue Mar 10 16:46:04 2020 -0400

Add platform network type for storage
    
    Added a new platform network type for optional backend storage.
    
    Story: 2007391
    Task: 39018
    
    Change-Id: I1a389b8aede49095e4f7f7d24ed8224504575d45
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit 2528dce84b5891038ca56c6959304ac4c1fc934a
Author: Thomas Gao <Thomas.Gao@windriver.com>
Date:   Thu Feb 13 18:52:15 2020 -0500

Allow VF type interface to detect underlying port
    
    Running `host-if-show` on a VF interface whose underlying port supports
    dpdk will now display accelerated [True]. Before this fix, only
    ethernet, vlan, and ae type interfaces supported detecting
    underlying ports that support dpdk.
    
    Closes-Bug: 1846260
    
    Change-Id: Ifdee31811824a38ebc7d3a8febde2341d39ba986
    Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>

commit 95d8bb436b625c82e78ebb2a2134e0e861bd5574
Author: Jerry Sun <jerry.sun@windriver.com>
Date:   Wed Mar 4 16:07:22 2020 -0500

Support post-bootstrap config of kube-apiserver parameters
    
    Add system service parameters for each of the kube-apiserver parameters
    for openid connect.
    
    Story: 2006711
    Task: 38944
    
    Depends-On: https://review.opendev.org/711336
    
    Change-Id: Ib4b9aee036447087f88f803548e3f982446ccda4
    Signed-off-by: Jerry Sun <jerry.sun@windriver.com>

commit 6f162c3422df6c11b0d9f548487bfb3b9e401ca5
Author: Thomas Gao <Thomas.Gao@windriver.com>
Date:   Fri Feb 7 15:28:42 2020 -0500

Fixed address interface foreign key inconsistency
    
    Foreign key in sysinv.object.address.Address is `interface_uuid`,
    which is inconsistent with the foreign key `interface_id` defined
    in the database schema. This fix corrected that.
    
    Added a unit test to verify that addresses associated with an interface
    could be deleted.
    
    Additionally wrote a set of TODO unit tests blocked by
    the bug: tested delete address for orphaned-routes case, unlocked
    host state, and the case where address is allocated from pool.
    
    Modified the interface querying mechanism to look up all interfaces.
    This modification is necessary because the current implementation of
    add_interface_filter only looks up those of type ethernet, ae and
    vlan. Attempting to get a virtual-type interface will raise an
    exception, causing Jenkins installation to fail.
    
    After a visual inspection of interface_uuid occurrences, fixed a few
    other occurrences of bad address.interface_uuid that are not caught
    by the unit test. Added new unit test suites in place to cover the
    code paths.
    
    Closes-Bug: 1861131
    
    Change-Id: I6f2449bbbb69d6f2353e521bfcd138d880ce878f
    Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>

commit 964a2b7c6238ce91d4ace34dcac790fa5a37d55c
Author: Kevin Smith <kevin.smith@windriver.com>
Date:   Tue Mar 3 14:17:42 2020 -0500

stx-monitor: only delete pvcs on app delete.
    
    It may be desired to keep the persistent volumes after removing the
    stx-monitor application.  This update will not remove the pvcs on
    application-remove, but remove them on application-delete
    
    Closes-Bug: 1865568
    
    Change-Id: I9b06008fe6b6033e5a1ce6808cc5d4fa6aabcd05
    Signed-off-by: Kevin Smith <kevin.smith@windriver.com>

commit c5d43da89e7fd2a12407bc4bebd14ab87d16c638
Author: Angie Wang <angie.wang@windriver.com>
Date:   Tue Feb 25 17:00:53 2020 -0500

Allow users to override a single image with a custom registry
    
    When the user overrides a single image with a custom
    registry that is not one of the registries known to
    Sysinv, the image download fails because the docker.io
    registry is prepended to the image reference, which
    generates an invalid image tag.
    
    The original purpose of that logic was to handle
    images that come from docker.io but do not have
    docker.io explicitly specified in their image name. This
    case is already handled in the class
    "AppImageParser".
    
    This commit removes the related logic that causes the
    issue.
    
    Tested:
     - system helm-override-update stx-openstack nova openstack \
         --set images.tags.nova_api=mycustomregistry.com/stx-nova:latest
     - system application-apply stx-openstack
    
    Change-Id: I07d1a658c3cf56a3e09e81e1f947f93de50b513d
    Closes-Bug: 1859881
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 347af170f9cf1fd49be2a52107f0594d9d4b8ba8
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Tue Feb 25 21:13:59 2020 -0500

Update PTP API ref and unit tests
    
    Add the PTP apply function to the API ref and the unit tests.
    
    Story: 2006759
    Task: 38848
    Change-Id: Iae3cc9e90b653fd92a83a0d9a216d87016cf4c6c
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit 8e2e5f7e82efde39407d34c1a26daffb97dbe26d
Author: Kevin Smith <kevin.smith@windriver.com>
Date:   Fri Feb 21 07:56:04 2020 -0500

Set elasticsearch pod java options according to ip config
    
    The "-Djava.net.preferIPv6Addresses=true" java option was set
    for both ipv4 and ipv6 configurations which worked fine in both
    configs.  At some point recently in ipv4 configurations, the
    stx-monitor application stopped applying successfully due to
    elasticsearch cluster discovery failure.  Why the ipv4 failures
    are only recently occurring is unknown, but removal of this
    unnecessary java option for ipv4 eliminates the failures.
    
    This update will set the above java option for elasticsearch
    pods only if the cluster service network is ipv6.
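
As a rough sketch of the rule described above, the java option can be derived from the IP version of the cluster service network. The function name `elasticsearch_java_opts` and its subnet argument are assumptions made for illustration; only the option string comes from the commit message.

```python
import ipaddress

IPV6_JAVA_OPT = "-Djava.net.preferIPv6Addresses=true"  # option named in the commit


def elasticsearch_java_opts(cluster_service_subnet):
    """Return the extra java options for the given cluster service subnet.

    Hypothetical helper: the IPv6 preference is added only for IPv6 subnets.
    """
    network = ipaddress.ip_network(cluster_service_subnet, strict=False)
    return [IPV6_JAVA_OPT] if network.version == 6 else []
```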
    
    Closes-Bug: 1864193
    
    Change-Id: I2952f1c799b121d0812314156162af7696ebd6b0
    Signed-off-by: Kevin Smith <kevin.smith@windriver.com>

commit 6065f1318af289001d2017111cc8633c3320efda
Author: Matt Peters <matt.peters@windriver.com>
Date:   Thu Feb 20 16:22:02 2020 -0500

Remove system name from default index naming
    
    Remove the system name from the default index naming,
    since it causes a large number of small independent
    indexes to be created, which does not scale well with
    the current daily index rotation.
    
    Change-Id: Ia880a1d8c48703a0741a72e999c0cdb93c229423
    Story: 2006990
    Task: 38834
    Signed-off-by: Matt Peters <matt.peters@windriver.com>

commit 73d407bdf44933673e8e975e2523828b9c43e25d
Author: Matt Peters <matt.peters@windriver.com>
Date:   Thu Feb 20 16:21:40 2020 -0500

Add normalized percentages to cpu metric collection
    
    CPU metric collection which has been normalized against the
    number of cores is not enabled.  This update adds the
    appropriate configuration option to enable these metrics.
    
    Change-Id: I1e2dcd0fac144236dab3718a917344c339444003
    Closes-Bug: 1864128
    Signed-off-by: Matt Peters <matt.peters@windriver.com>

commit bbb9a477c1cb33ca51a134d742073cc200f89fb0
Author: Angie Wang <angie.wang@windriver.com>
Date:   Thu Feb 20 11:56:01 2020 -0500

Reject the k8s first control plane upgrade after networking is upgraded
    
    The first upgraded control plane shouldn't be allowed to re-upgrade
    after the k8s networking upgrade is done. This commit adds a check
    to prevent this action.
    
    Change-Id: I01c6539fe89749663dff6159e56d14f9a510ebe0
    Story: 2006781
    Task: 38761
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit cb2b83365e823cd69a0e8e2a3c54b3e679f48776
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Thu Feb 20 11:37:03 2020 -0500

Support for https in OIDC client
    
    Changed OIDC client to use HTTPS by default.
    
    Story: 2006711
    Task: 38481
    
    Depends-On: https://review.opendev.org/#/c/708911
    Change-Id: I567b224030cfe2278cdca57f2d40ad36c98d7ff6
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit 4687ea36b5fadb7dad0cfe0a1ede4b488a0b5aeb
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Fri Feb 14 15:30:41 2020 -0500

Apply PTP configuration at runtime
    
    Allow PTP configuration to be applied at runtime. Previously this would
    have required a lock/unlock of the host. A new command 'system
    ptp-apply' has been added to apply the ptp configuration.
    
    Note we will not apply ptp configurations to hosts that have switched
    from ntp to ptp. That change will require a lock/unlock as before.
    
    Depends-On: https://review.opendev.org/707904
    Change-Id: I098bd12336f34324a77615a20a4e36b7620ab79b
    Story: 2006759
    Task: 38770
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit d93d5804c626955fb711897745dce4a61136183b
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Fri Feb 14 16:47:03 2020 -0500

Fixed error responses in controller-fs
    
    Error response given by controller-fs-modify erroneously mentions
    filesystem names which are not controller filesystems. To fix this,
    hard-coded filesystem names have been completely removed.
    
    Change-Id: Ic6f563dd0b347ac7ece628f6e716c952205c1687
    Closes-Bug: 1862416
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>

commit f6eebbd318f3c596c7d408696ce1558fd03a5497
Author: Bart Wensley <barton.wensley@windriver.com>
Date:   Wed Feb 19 12:56:21 2020 -0600

Disable keystone caching on subclouds
    
    The use of keystone caching on subclouds causes problems because
    the syncing of fernet keys to the subcloud results in stale
    cache entries. This causes authentication failures until the
    cache entries age out or new tokens are created.
    
    Since the keystone load in a subcloud is light, there is really
    no need for caching at this time - it is being disabled in
    subclouds.
    
    Change-Id: I777c57c46cf1bcd701fbbac73228a2cb81d8424b
    Closes-Bug: 1860372
    Signed-off-by: Bart Wensley <barton.wensley@windriver.com>

commit b330498aecb7068e8bfa65c41c71e974b2d674aa
Author: Mingyuan Qi <mingyuan.qi@intel.com>
Date:   Tue Feb 18 03:48:44 2020 +0000

Change docker client to crictl in cert rotation
    
    With the container runtime moving to containerd, the containers are
    created by containerd. Accordingly, the client tool is changed
    to crictl. In the kube cert rotation script, the containers will
    be stopped by crictl and automatically restarted by kubelet to
    pick up the renewed certificates within the container.
    
    Story: 2006145
    Task: 37619
    
    Change-Id: Ia8cf76c15811f8f9d88199158e83ccba31534e4e
    Signed-off-by: Mingyuan Qi <mingyuan.qi@intel.com>

commit 7afe5de64d0d23ec951620e0380fb65e2f49f4c3
Author: Angie Wang <angie.wang@windriver.com>
Date:   Tue Feb 11 17:25:11 2020 -0500

Add semantic checks for k8s upgrade
    
    Semantic checks added:
      - verify whether all installed applications are compatible with
        the new k8s version before starting k8s upgrade
      - prevent host-unlock if the host kubelet upgrade is in progress
        (allow --force to do force unlock).
      - prevent application-apply/update if the app is incompatible with
        the current k8s version.
    
    For an application that has a k8s version restriction, the following
    keys can optionally be specified in its metadata file,
    for example:
    supported_k8s_version:
      minimum: v1.16.1
      maximum: v1.16.3
    
    The k8s version related information in the metadata file will be used for
    the compatibility check. The metadata file is now copied over to the
    drbd fs during application-upload.
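
A minimal sketch of such a compatibility check, assuming the metadata has already been loaded into a dict with the optional `minimum` and `maximum` keys shown above. The function names are hypothetical and this is not the sysinv implementation.

```python
def parse_k8s_version(version):
    """Turn a string such as 'v1.16.3' into a comparable tuple (1, 16, 3)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_k8s_compatible(current, supported):
    """Check current against the optional minimum/maximum bounds.

    supported is the (possibly empty) supported_k8s_version mapping; a
    missing bound is treated as unrestricted.
    """
    cur = parse_k8s_version(current)
    minimum = supported.get("minimum")
    maximum = supported.get("maximum")
    if minimum and cur < parse_k8s_version(minimum):
        return False
    if maximum and cur > parse_k8s_version(maximum):
        return False
    return True
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating v1.9.0 as newer than v1.16.0.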
    
    Tests conducted:
      - "system kube-upgrade-start" rejected if any installed app's k8s
        version check failed
      - host-unlock rejected if the host is in upgrading-kubelet status
      - was able to forcibly unlock host even if it's upgrading kubelet
      - application-apply/update testing
    
    Change-Id: I1ef852cccddf7ae39eca4b4e25b80a7f4347d8a4
    Story: 2006781
    Task: 38761
    Signed-off-by: Angie Wang <angie.wang@windriver.com>

commit 2b49e9f3f93c9913961b437d4e51d1e7d46f1222
Author: Robert Church <robert.church@windriver.com>
Date:   Thu Feb 13 10:00:56 2020 -0600

Workaround for cleaning up MatchNodeSelector pods after host reboot
    
    Added a K8sPodOperator class to look for and remove Failed pods with a
    MatchNodeSelector reason.
    
    MatchNodeSelector pods related to applications will not be removed by
    K8S automatically. These pods may block subsequent application applies
    as tiller expects these pods to be in a non failed state.
    
    A check for this condition is added in two locations:
    - to the _k8s_application_audit() which is run immediately on
      sysinv-conductor startup and runs every minute. This runs 4 times in a
      5 minute window at startup on a simplex install. This should catch all
      cases unless there is a delay accessing the k8s API that lasts longer
      than 5 minutes at startup.
    - to the application-apply path. This would cover any case that occurs
      after the initial 5 minute conductor startup OR any occurrence on a
      non-simplex installation (so far only observed on AIO-SX)
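
The selection rule itself (Failed pods whose reason is MatchNodeSelector) can be sketched on plain dicts shaped like pod objects. This is an illustration only: the function name is hypothetical, and the real K8sPodOperator talks to the k8s API rather than working on dicts.

```python
def find_match_node_selector_pods(pods):
    """Return names of pods in phase 'Failed' with reason 'MatchNodeSelector'.

    pods is a list of dicts with 'metadata' and 'status' keys, mirroring the
    fields of a k8s pod object.
    """
    stale = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "MatchNodeSelector":
            stale.append(pod["metadata"]["name"])
    return stale
```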
    
    NOTE: This commit will be reverted once a proper upstream k8S fix is
    provided.
    
    Related upstream bugs:
    - https://github.com/kubernetes/kubernetes/issues/80745
    - https://github.com/kubernetes/kubernetes/issues/85334
    
    The following PR was tested and fixed this issue but has not landed
    upstream in a new k8s release:
    - https://github.com/kubernetes/kubernetes/pull/80976
    
    Change-Id: Ia5418794a44e7821933e8335d5c5db25b58a739f
    Closes-Bug: #1849688
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 34e410821b7b0699444b303fcdec1ab89d860cc6
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Thu Feb 13 15:38:51 2020 -0500

Fix inconsistent disk space calculation
    
    Dividing two integers with `/` truncates in Python 2 but returns a
    floating-point result in Python 3. The disk space calculation is
    changed so that it yields the same value under both.
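
A short illustration of the difference, run under Python 3. The function names and the MiB-to-GiB conversion are made up for the example and are not taken from the sysinv code.

```python
def mib_to_gib_floor(size_mib):
    """Floor division: the same integer result under Python 2 and Python 3."""
    return size_mib // 1024


def mib_to_gib_true(size_mib):
    """True division: a float under Python 3 (an int under Python 2 for int input)."""
    return size_mib / 1024
```

Using `//` wherever an integer result is intended keeps the calculation independent of the interpreter version.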
    
    Change-Id: I6a5905a4d97df5b9e73e165580801c865006f316
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>
    Closes-Bug: 1862668

commit e6e37c949a39e4ee3d4f4c9407a85089e7514345
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Mon Feb 10 16:26:13 2020 -0500

Added unit test cases for host file system.
    
    Test cases added for API endpoints used by:
     1. host-fs-list
     2. host-fs-modify
     3. host-fs-show
    
    This commit also fixes the issue of Host FS disk space calculations
    yielding different values in Python 2 and Python 3.
    
    Change-Id: I50a1ca43c43c3bba30730c616b3788664920d0c9
    Story: 2007082
    Task: 38013
    Partial-Bug: 1862668
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>

commit 227ddec6189fdabdc75d45162fc22b9af7118982
Author: Thomas Gao <Thomas.Gao@windriver.com>
Date:   Thu Feb 13 10:47:55 2020 -0500

Fix device plugin port handling for pci-passthrough
    
    While generating the SR-IOV device plugin configuration data,
    it is necessary to get the underlying port information.
    For SR-IOV ports there is special handling required to deal
    with the case of a 'VF' subinterface.  For PCI-Passthrough,
    the port can and should be accessed directly.
    
    Closes-Bug: 1856587
    
    Co-Authored-By: Steven Webster <steven.webster@windriver.com>
    
    Change-Id: I70f315669776a591e23e69c6653098e720815b99
    Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>

commit cab522030f79c0060b80050c6a560696d7db80d9
Author: Stefan Dinescu <stefan.dinescu@windriver.com>
Date:   Fri Jan 31 17:31:17 2020 +0200

Make Ceph storage backend optional
    
    Changes included in this commit:
    - change consistency checks to allow a system to
      be deployed without ceph configured
    - allow ceph to be provisioned before unlocking
      controller-0
    - add support for runtime provisioning of ceph
      on an already fully deployed system
    - move default cluster and storage tier config
      from conductor initialization to storage-backend
      creation
    - move CephOperator initialization from conductor
      initialization to a greenthread that waits for
      the ceph cluster to become responsive
    - make adding ceph storage-backend timing consistent
      across all setups: you can add it before unlocking
      controller-0 or only after all controller nodes
      have been unlocked.
    
    Tests run:
    - all tests were run on AIO-SX, AIO-DX, Standard
      and Storage configs
    - deploy system without ceph
    - configure ceph after running ansible bootstrap,
      but before unlocking controller-0
    - configure ceph at runtime on an already deployed
      system
    - swacting
    
    Change-Id: I05fbd494d9a22a535eae200a26c21b1702500194
    Depends-On: https://review.opendev.org/705234
    Story: 2007064
    Task: 37931
    Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>

commit f1605d465b5cb10a9d46803e88096951cdacc3a5
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Mon Feb 3 14:35:45 2020 -0500

PTP Configuration Enhancements
    
    Add PTP service parameters. Any service parameters in the global ptp
    section will be written to the ptp4l conf. phc2sys service parameters
    will be used to specify the command line options used with the phc2sys
    service.
    
    Values specified in the service parameters will take precedence over
    values specified by the PTP table.
    
    Story: 2006759
    Task: 38669
    Depends-On: https://review.opendev.org/#/c/706364
    Change-Id: I791ec251be44d963bfb5eb69268fbc7a8a75391a
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit 173eb3bea75e2a774976461a5caef482c20a814a
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Mon Feb 3 16:21:42 2020 -0500

Added unit test cases for controller file system
    
    Test cases added for API endpoints used by:
     1. controllerfs-list
     2. controllerfs-modify
     3. controllerfs-show
    
    Change-Id: Ifd525d2218a099b15139f17d6b4ae1b7279e8810
    Story: 2007082
    Task: 38003
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>

commit aead92341082065798ee4450d804f64d63ba35f1
Author: Thomas Gao <Thomas.Gao@windriver.com>
Date:   Tue Jan 21 18:12:46 2020 -0500

Enabled platform interfaces to add ip address(es)
    
    Removed network type check in api controller interface to allow platform
    interfaces to have static address mode in the database.
    
    Removed broken network type check in api controller address.
    
    Loosened interface-class and network-type restrictions in puppet
    controller to allow platform interfaces to have static ip address
    during system unlock.
    
    Added unit tests to test puppet interface's new restriction logic of
    get_interface_address_method for ipv4 static mode (valid), ipv6 static
    mode (valid), and ipv4 static mode with network type (invalid).
    
    Added unit test to ensure one can add an ip address to the static
    platform interface. Enabled DAD for ipv6 tests. Renamed get_post_object
    parameter interface_id to interface_uuid to eliminate usage
    inconsistency because the former is rejected in the POST request.
    
    Closes-Bug: 1855191
    
    Change-Id: I1f2bc92bb1a97dc4afb21966de4055b12855510a
    Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>

commit b27ae6b348fdd03d83859e7c1a21baf828859328
Author: Thomas Gao <Thomas.Gao@windriver.com>
Date:   Thu Jan 16 11:21:30 2020 -0500

Fixed semantic checks for SR-IOV VF parameters.
    
    Only interfaces of class pci-sriov may have numvfs and vf_driver.
    However, interfaces of class data that attempted to add numvfs and
    vf_driver via the cli were able to pass the semantic check.
    Moreover, when an interface class changed from pci-sriov to data,
    the numvfs and vf_driver fields were not cleared.
    
    This fix tackles the above issues by altering the condition
    check that resets the 2 fields before the semantic check, so
    that invalid input no longer passes the semantic check.
    This fix also ensures the 2 fields are permanently reset
    once the interface class is changed from pci-sriov to data.
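
The reset rule can be sketched as follows, assuming an interface is represented as a dict. The key names `ifclass`, `numvfs` and `vf_driver` and the reset values are assumptions for illustration; the actual sysinv field names and defaults may differ.

```python
def reset_sriov_fields(interface):
    """Return a copy of interface with the VF fields cleared unless it is pci-sriov.

    Hypothetical helper illustrating the rule; the input dict is not modified.
    """
    updated = dict(interface)
    if updated.get("ifclass") != "pci-sriov":
        updated["numvfs"] = 0
        updated["vf_driver"] = None
    return updated
```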
    
    Added several unit tests to verify all situations described
    above.
    
    Depends-On: https://review.opendev.org/#/c/705293
    
    Closes-Bug: 1855933
    
    Change-Id: I3c25c57edcdd50c5e76e17da658c7985821a3436
    Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>

commit 4598ca8d65417b7ac9f19f6fd3954639d230b46b
Author: Al Bailey <Al.Bailey@windriver.com>
Date:   Wed Feb 5 09:38:42 2020 -0600

Deprecate sysinv.openstack.common.db in favor of oslo_db
    
    openstack.common.db was not being used except by unit tests.
    The sysinv engine had previously been converted, so the
    changes are primarily in the unit test environment.
    
    Story: 2006796
    Task: 37426
    Change-Id: Ie638ee7e347fef0ada061ed4047decd0cbb919ef
    Signed-off-by: Al Bailey <Al.Bailey@windriver.com>

commit fb84bf9bdcb7844e6ac0ea192480a43ae4ac7480
Author: Thomas Gao <Thomas.Gao@windriver.com>
Date:   Fri Jan 31 10:06:25 2020 -0500

Forbid unlocked hosts to modify interfaces
    
    Simplified the convoluted logic that allowed certain unlocked hosts to
    modify interfaces. Now the logic simply rejects unlocked hosts.
    
    Fixed a series of unit tests that modified an unlocked test controller by
    transferring the modification operations to locked test workers. Moreover,
    the hardcoded test controller id is replaced with the worker id attribute.
    
    Fixed another set of tests that attempted to create ethernet, vlan, or
    bond interfaces on an unlocked test controller, even though those tests
    are intended for locked test workers. This redundant network configuration
    is removed, because keeping it would force the only active
    controller node to be locked.
    
    Closes-Bug: 1855187
    
    Change-Id: I7eacba9d064a4efb2c2032c3879d11460401ca08
    Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>

commit 29f38ce63725a829a165989bb134fd98ac8bea78
Author: Andy Ning <andy.ning@windriver.com>
Date:   Tue Feb 4 15:42:20 2020 -0500

Copy encryption provider config file to second controller
    
    kube-apiserver encryption provider config file is generated by ansible
    bootstrap on the first controller and stored in the shared fs. It is
    then copied over to the second controller. When kube-apiserver pod
    starts it will take this configuration file as its encryption provider
    configuration.
    
    Change-Id: Ibfcfb13c8a6685e38a1043acd7ec752ea116911c
    Story: 2007243
    Task: 38627
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit c4fa36214c444b34ae9c2b06f35758eb1ba8c987
Author: Thomas Gao <Thomas.Gao@windriver.com>
Date:   Mon Feb 3 15:41:28 2020 -0500

Forbid IPv4 DNS in an IPv6 OAM config
    
    Implemented IP version check in DNS controller api to reject patch
    operations with mismatched DNS server IP version.
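
A minimal sketch of such an IP version check using the standard `ipaddress` module. The function name and the idea of passing the OAM subnet as a string are assumptions for illustration, not the actual DNS controller code.

```python
import ipaddress


def mismatched_dns_servers(oam_subnet, dns_servers):
    """Return the DNS servers whose IP version differs from the OAM subnet's."""
    oam_version = ipaddress.ip_network(oam_subnet, strict=False).version
    return [
        server for server in dns_servers
        if ipaddress.ip_address(server).version != oam_version
    ]
```

An API handler could reject the patch whenever this list is non-empty.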
    
    Enabled and fixed relevant unit tests.
    
    Rearranged the unit test inheritance hierarchy to eliminate undesired test
    repetitions.
    
    Closes-Bug: 1860489
    
    Change-Id: Ief4a19eeea03086bb5816a13cb3a706a48bab51a
    Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>

commit 5df1f3a89a6e1ef699fc6030a18902faf45daf88
Author: Bin Qian <bin.qian@windriver.com>
Date:   Wed Feb 5 13:26:43 2020 -0500

Adding job to upload commits to GitHub
    
    Add job to publish config repo to GitHub
    Fix host_key
    
    Story: 2007252
    Task: 38657
    
    Change-Id: Id0c1fe7278cbddbf6082f452323537427fefe95f
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit 8ab1e2d7c624f83d72efcbfcddcdffa567a26bad
Author: Shuicheng Lin <shuicheng.lin@intel.com>
Date:   Wed Dec 11 16:37:03 2019 +0800

Audit local registry secret info when there is user update in keystone
    
    The local registry uses the admin username and password for authentication.
    The admin password can be changed with the openstack client, which
    leaves the auth info in the secrets obsolete and leads to failed
    authentication in keystone.
    To keep the secrets info up to date, keystone event notification is enabled
    and an event notification listener is added in sysinv. When a
    user password changes, keystone sends out a user update event,
    and sysinv calls the function audit_local_registry_secrets to check
    whether the kubernetes secret info needs to be updated.
    
    A periodic task is also added to ensure the secrets are always synced, in
    case a notification is missed or handling a notification fails.
    
    oslo_messaging is added to tox's requirements.txt to avoid tox failure.
    The version is based on global-requirements.txt from Openstack Train.
    
    Test:
    Pass deployment and secrets could be updated automatically with new auth
    info.
    Pass host-swact in duplex mode.
    
    Closes-Bug: 1853017
    Depends-On: https://review.opendev.org/700677
    Depends-On: https://review.opendev.org/699547
    Change-Id: I959b65288e0834b989aa87e40506e41d0bba0d59
    Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>

OpenStack Infra (hudson-openstack) wrote on 2020-04-01: Fix merged to ansible-playbooks (master)

#53

Reviewed: https://review.opendev.org/712823
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=d6cff0496dcf52655eba340e1e57b1d973040edf
Submitter: Zuul
Branch: master

commit d6cff0496dcf52655eba340e1e57b1d973040edf
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:34:09 2020 +0800

Refresh local registry auth info each time when access local registry

    The local registry uses the admin account password as authentication info,
    and this password may be changed by the openstack client at any time.
    When downloading images from the local registry, the auth info must not
    be cached; otherwise it may lead to authentication failures in keystone
    and, in the end, to the account being locked.
    In this specific case, a host-swact happens first, then the function
    "_upgrade_downgrade_kube_networking" in sysinv conductor is called
    and upgrade-k8s-networking.yml is executed, which tries to download
    the kube network images from the local registry. If the admin
    account password is changed during this period, the account is locked
    due to authentication failures in keystone.
    With this update, it is still possible for the password to be changed
    just after the get operation. Because the image downloads run in
    parallel in multiple threads, the account lock may still be hit. This
    change reduces how often the issue occurs, but cannot eliminate it.

Closes-Bug: 1853017

Change-Id: I686616937031a3f7ac6d65e5b118511dc549ab85
Signed-off-by: Shuicheng Lin <email address hidden>

Changed in starlingx:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2020-04-15: Fix merged to config (master)

#54

Reviewed: https://review.opendev.org/712614
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=423a475aff4f9ea1b60af6a9a2989027d1506f10
Submitter: Zuul
Branch: master

commit 423a475aff4f9ea1b60af6a9a2989027d1506f10
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:06:08 2020 +0800

Refresh local registry auth info each time when access local registry

    Local registry uses admin account password as authentication info.
    And this password may be changed by openstack client at any time.
    When sysinv tries to download images from local registry, it must not
    cache the auth info; otherwise it may lead to authentication failure
    in keystone, and the account being locked in the end.
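
The "do not cache" idea can be sketched by passing in a callable that reads the credentials right before each pull. All names here (`pull_images`, `get_auth`, `pull`) are hypothetical placeholders, not sysinv functions.

```python
def pull_images(images, get_auth, pull):
    """Pull each image, reading the credentials immediately before each pull.

    get_auth() returns a (username, password) tuple; pull(image, username,
    password) performs the download. Both are supplied by the caller.
    """
    for image in images:
        username, password = get_auth()  # fresh read, not cached across pulls
        pull(image, username, password)
```

This narrows the window in which a stale password can be used but does not close it, since the password can still change between the read and the pull.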

Partial-Bug: 1853017

Change-Id: I07f273a05a1bc3c08b48d13c94eb6df6aecdf7c3
Signed-off-by: Shuicheng Lin <email address hidden>

Ghada Khalil (gkhalil) wrote on 2020-04-27:

#55

Shuicheng, there are recent commits in master related to this fix that haven't been cherry-picked to the stx.2.0 & stx.3.0 branches. Are these commits applicable to those releases?

Ghada Khalil (gkhalil) on 2020-04-27

tags:

removed: in-r-stx30

OpenStack Infra (hudson-openstack) wrote on 2020-04-28: Fix proposed to config (r/stx.3.0)

#56

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/723766

OpenStack Infra (hudson-openstack) wrote on 2020-04-28: Fix proposed to ansible-playbooks (r/stx.3.0)

#57

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/723767

OpenStack Infra (hudson-openstack) wrote on 2020-04-28: Fix proposed to config (r/stx.2.0)

#58

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/723781

OpenStack Infra (hudson-openstack) wrote on 2020-04-28: Fix merged to config (r/stx.3.0)

#59

Reviewed: https://review.opendev.org/723766
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=9bcd1b066bff4b51a1ef82ccd476116bd4dd8ab5
Submitter: Zuul
Branch: r/stx.3.0

commit 9bcd1b066bff4b51a1ef82ccd476116bd4dd8ab5
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:06:08 2020 +0800

Refresh local registry auth info each time when access local registry

(cherry picked from commit 423a475aff4f9ea1b60af6a9a2989027d1506f10)

    The local registry uses the admin account password as its authentication
    info, and this password may be changed by the openstack client at any time.
    When sysinv downloads images from the local registry, it must not cache
    the auth info; otherwise a stale password may cause authentication
    failures in keystone and eventually lock the account.

Partial-Bug: 1853017

Change-Id: I07f273a05a1bc3c08b48d13c94eb6df6aecdf7c3
Signed-off-by: Shuicheng Lin <email address hidden>

OpenStack Infra (hudson-openstack) wrote on 2020-04-28: Fix merged to ansible-playbooks (r/stx.3.0)

#60

Reviewed: https://review.opendev.org/723767
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=75b5edfa6ce1ea32293889ec9da8d0e6ae2007f8
Submitter: Zuul
Branch: r/stx.3.0

commit 75b5edfa6ce1ea32293889ec9da8d0e6ae2007f8
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:34:09 2020 +0800

Refresh local registry auth info each time when access local registry

(cherry picked from commit d6cff0496dcf52655eba340e1e57b1d973040edf)
(cherry picked from commit 1b50022d55a9da2bbab284b1fdda2ddc78c30c79)

    The local registry uses the admin account password as its authentication
    info, and this password may be changed by the openstack client at any time.
    When images are downloaded from the local registry, the auth info must
    not be cached; otherwise a stale password may cause authentication
    failures in keystone and eventually lock the account.
    In this specific case, a host-swact occurs first, and then the function
    "_upgrade_downgrade_kube_networking" in the sysinv conductor is called.
    It executes upgrade-k8s-networking.yml, which tries to download the
    kube networking images from the local registry. If the admin account
    password is changed during this period, the account is locked due to
    authentication failures in keystone.
    With this update, it is still possible for the password to be changed
    just after the get operation. Because the image downloads run in
    parallel across multiple threads, the account lock may still be hit.
    This change reduces the issue rate but cannot eliminate it.

Closes-Bug: 1853017

Change-Id: I686616937031a3f7ac6d65e5b118511dc549ab85
Signed-off-by: Shuicheng Lin <email address hidden>
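
The residual race described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions: `FakeKeystone`, `simulate`, and the failure threshold are hypothetical, and the parallel workers are replayed sequentially so the outcome is deterministic.

```python
class FakeKeystone:
    """Toy identity service that locks an account after too many failures."""

    def __init__(self, password: str, max_failures: int) -> None:
        self.password = password
        self.max_failures = max_failures
        self.failures = 0

    @property
    def locked(self) -> bool:
        return self.failures >= self.max_failures

    def authenticate(self, password: str) -> bool:
        if self.locked:
            return False
        if password == self.password:
            self.failures = 0
            return True
        self.failures += 1
        return False


def simulate(workers: int, max_failures: int) -> bool:
    """Return True if the account ends up locked."""
    keystone = FakeKeystone("old-password", max_failures)
    # Each download worker reads the password just before the change...
    fetched = ["old-password" for _ in range(workers)]
    # ...then the admin password is changed...
    keystone.password = "new-password"
    # ...and every worker authenticates with the value it already holds.
    for password in fetched:
        keystone.authenticate(password)
    return keystone.locked


print(simulate(workers=1, max_failures=5))
print(simulate(workers=5, max_failures=5))
```

With a single worker, one stale attempt stays below the threshold; with as many workers as the threshold allows failures, the account is locked even though each worker fetched the password only once.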

Peng Peng (ppeng) wrote on 2020-04-29:

#61

The issue was reproduced on:
Lab: WCP_71_75
Load: 2020-04-28_20-00-00
Collect logs for all nodes have been added.

test log:
====================== Test Step 1: Changing admin password to !Li69nux*9

[2020-04-29 18:07:44,424] 314 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin'

====================== Test Step 2: Sleep for 180 seconds after admin password change

====================== Test Step 3: Check admin password is updated in keyring

[2020-04-29 18:10:46,870] 314 DEBUG MainThread ssh.send :: Send 'keyring get CGCS admin'
[2020-04-29 18:10:47,477] 436 DEBUG MainThread ssh.expect :: Output:
!Li69nux*9

====================== Test Step 4: Swact active controller

[2020-04-29 18:10:47,583] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list'
[2020-04-29 18:10:48,435] 436 DEBUG MainThread ssh.expect :: Output:
The account is locked for user: 7fb2fa710fca4ff0bb1cdce312d05fce. (HTTP 401) (Request-ID: req-33846962-49c1-4804-96a3-5ae633577987)
[sysadmin@controller-1 ~(keystone_admin)]$

Changed in starlingx:
status:	Fix Released → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2020-04-29: Fix merged to config (r/stx.2.0)

#62

Reviewed: https://review.opendev.org/723781
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=a70ecf4baa3809cbf60e6c1b835b07fe9ba2d2d4
Submitter: Zuul
Branch: r/stx.2.0

commit a70ecf4baa3809cbf60e6c1b835b07fe9ba2d2d4
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:06:08 2020 +0800

Refresh local registry auth info each time when access local registry

(cherry picked from commit 423a475aff4f9ea1b60af6a9a2989027d1506f10)

    The local registry uses the admin account password as its authentication
    info, and this password may be changed by the openstack client at any time.
    When sysinv downloads images from the local registry, it must not cache
    the auth info; otherwise a stale password may cause authentication
    failures in keystone and eventually lock the account.

Partial-Bug: 1853017

Change-Id: I07f273a05a1bc3c08b48d13c94eb6df6aecdf7c3
Signed-off-by: Shuicheng Lin <email address hidden>

Lin Shuicheng (shuicheng) wrote on 2020-04-29:

#63

Hi Peng,
Could you share the collected logs with me?
Thanks.

Peng Peng (ppeng) wrote on 2020-04-30:

#64

https://files.starlingx.kube.cengn.ca/launchpad/1853017

Lin Shuicheng (shuicheng) wrote on 2020-05-06:

#65

Hi Peng,
The cause is different from the previous one; it is not caused by registry-token-server authentication.

In the logs I found errors in the platform-deployment-manager pod, which appears to come from the image
"tis-lab-registry.cumulus.wrs.com:9001/wind-river/cloud-platform-deployment-manager WRCP_20.04"

controller-0_20200429.185943/var/log/containers/platform-deployment-manager-0_platform-deployment-manager_manager-dd039c6a54492b52244abd7e7ecb7dccf197d0fa595be92ddc3f9bd7f1f8d513.log
"
2020-04-29T18:10:37.070876069Z stderr F E0429 18:10:37.070686 1 common.go:242] controller/host "msg"="an unhandled error occurred" "error"="failed to get: a896665e-9d40-4342-9a07-92c56715e008: Unable to re-authenticate: Expected HTTP response code [] when accessing [GET http://[face::1]:6385/v1/ihosts/a896665e-9d40-4342-9a07-92c56715e008], but got 401 instead\n{\"error\": {\"message\": \"The request you have made requires authentication.\", \"code\": 401, \"title\": \"Unauthorized\"}}" "type"={}
2020-04-29T18:10:37.070916344Z stderr F E0429 18:10:37.070781 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to get: a896665e-9d40-4342-9a07-92c56715e008: Unable to re-authenticate: Expected HTTP response code [] when accessing [GET http://[face::1]:6385/v1/ihosts/a896665e-9d40-4342-9a07-92c56715e008], but got 401 instead\n{\"error\": {\"message\": \"The request you have made requires authentication.\", \"code\": 401, \"title\": \"Unauthorized\"}}" "controller"="host-controller" "request"={"Namespace":"deployment","Name":"controller-0"}
...
"

Could you please ask someone at WR to confirm whether the admin password is used in "cloud-platform-deployment-manager"?
Thanks.
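
A small script along these lines can pull such re-authentication failures out of a container log. It is a sketch only: the regular expression is fitted to the log lines quoted above, `find_auth_failures` is a hypothetical helper, and the sample line is truncated after "instead".

```python
import re
from typing import Iterable, List, NamedTuple


class AuthFailure(NamedTuple):
    timestamp: str
    method: str
    url: str
    status: int


# Matches the "when accessing [METHOD URL], but got NNN instead" fragment.
PATTERN = re.compile(
    r"^(?P<ts>\S+)\s.*when accessing \[(?P<method>[A-Z]+) (?P<url>\S+)\], "
    r"but got (?P<status>\d{3}) instead"
)


def find_auth_failures(lines: Iterable[str]) -> List[AuthFailure]:
    """Return one record per log line that reports an HTTP 401."""
    failures = []
    for line in lines:
        match = PATTERN.search(line)
        if match and match.group("status") == "401":
            failures.append(
                AuthFailure(
                    match.group("ts"),
                    match.group("method"),
                    match.group("url"),
                    int(match.group("status")),
                )
            )
    return failures


sample = (
    r'2020-04-29T18:10:37.070876069Z stderr F E0429 18:10:37.070686 1 '
    r'common.go:242] controller/host "msg"="an unhandled error occurred" '
    r'"error"="failed to get: a896665e-9d40-4342-9a07-92c56715e008: '
    r'Unable to re-authenticate: Expected HTTP response code [] when '
    r'accessing [GET http://[face::1]:6385/v1/ihosts/'
    r'a896665e-9d40-4342-9a07-92c56715e008], but got 401 instead"'
)

for failure in find_auth_failures([sample, "unrelated line"]):
    print(failure.timestamp, failure.method, failure.url, failure.status)
```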

Lin Shuicheng (shuicheng) wrote on 2020-05-06:

#66

@yong, please assign this issue to WR, since it is caused by a WR-specific image.

Ghada Khalil (gkhalil) wrote on 2020-05-06:

#67

@Lin Shuicheng, I followed up on this and you are correct. The most recent issue reported by Peng Peng is tied to a WR lab-specific pod that continues to use the old password to access the config REST API, which locks the admin account after a password change. Therefore, we should consider this Launchpad fixed. I'm putting it back to "Fix Released".

@Peng Peng, please do not re-open this Launchpad. Please also note that there are issues with admin password changes for Distributed Cloud. These are unrelated to the original issue and will be tracked separately. Please do not test admin password changes on Distributed Cloud.

Ghada Khalil (gkhalil) on 2020-05-06

Changed in starlingx:
status:	Confirmed → Fix Released

Peng Peng (ppeng) on 2020-05-07

tags:	removed: stx.retestneeded

Peng Peng (ppeng) wrote on 2020-05-20:

#68

Verified on
Lab: WP_8_12
Load: 2020-05-19_20-00-00

OpenStack Infra (hudson-openstack) wrote on 2020-05-21: Fix proposed to ansible-playbooks (f/centos8)

#69

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729809

OpenStack Infra (hudson-openstack) wrote on 2020-05-21: Fix proposed to config (f/centos8)

#70

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729812

OpenStack Infra (hudson-openstack) wrote on 2020-05-22: Fix merged to config (f/centos8)

#71

Reviewed:  https://review.opendev.org/729812
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=539d476456277c22d0dcbc3cbbc832e623242264
Submitter: Zuul
Branch:    f/centos8

commit 320cc40de8518787c2be234d7fdf88ec0a462df2
Author: Don Penney <don.penney@windriver.com>
Date:   Wed May 13 13:06:11 2020 -0400

Add auto-versioning to starlingx/config packages
    
    This update makes use of the PKG_GITREVCOUNT variable to auto-version
    the packages in this repo.
    
    Change-Id: I3a2c8caeb4b4647608978b1f2ccfcf0661508803
    Depends-On: https://review.opendev.org/727837
    Story: 2006166
    Task: 39766
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit d9f2aea0fb228ed69eb9c9262e29041eedabc15d
Author: Sharath Kumar K <sharath.kumar@intel.com>
Date:   Wed Apr 22 16:22:22 2020 +0200

De-branding in starlingx/config: CGCS -> StarlingX
    
    1. Rename CGCS to StarlingX for .spec files
    
    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.
    
    Please note, doing de-brand changes in batches, this is batch9 changes.
    
    Story: 2006387
    Task: 39524
    
    Change-Id: Ia1fe0f2baafb78c974551100f16e6a7d99882f15
    Signed-off-by: Sharath Kumar K <sharath.kumar@intel.com>
    
    De-branding in starlingx/config: CGCS -> StarlingX
    
    1. Rename CGCS to StarlingX for .spec file
    2. Rename TIS to StarlingX for .service files
    
    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.
    
    Please note, doing de-brand changes in batches, this is batch10 changes.
    
    Story: 2006387
    Task: 36202
    
    Change-Id: I404ce0da2621495175ad31489e9ad6f7b0211e26
    Signed-off-by: Sharath Kumar K <sharath.kumar@intel.com>

commit d141e954fa6bbf688929ec90d1b6604a97792c43
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Tue Mar 31 10:08:57 2020 -0400

Sysinv extensions for FPGA support
    
    This update adds cli and restapi to support FPGA device
    programming.
    
    CLI commands:
    system device-image-apply
    system device-image-create
    system device-image-delete
    system device-image-list
    system device-image-remove
    system device-image-show
    system device-image-state-list
    system device-label-list
    system host-device-image-update
    system host-device-image-update-abort
    system host-device-label-assign
    system host-device-label-list
    system host-device-label-remove
    
    Story: 2006740
    Task: 39498
    
    Change-Id: I556c2e7a51b3931b5a66ab27b67f51e3a8aebd9f
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit 491cca42ed854d2cb3ee3646b93c56a4f45f563c
Author: Elena Taivan <elena.taivan@windriver.com>
Date:   Wed Apr 29 11:25:26 2020 +0000

Qcow2 conversion to raw can be done using 'image-conversion' filesystem
    
    1. Conversion filesystem can be added before/after
       stx-openstack is applied
    2. If conversion filesystem is added after stx-openstack
       is applied, changes to stx-openstack will only take effect
       once the application is re-applied
    
    3. It is not allowed to delete image-conversion filesystem
       when stx-openstack is in applying/applied/removing state
    4. Raise alarms for image-conversion
    
    Change-Id: Ie205329b694525509b0820497186fcd9ec2e45c9
    Closes-bug: 1819688
    Depends-On: https://review.opendev.org/#/c/724270/
    Depends-On: https://review.opendev.org/724288/
    Signed-off-by: Elena Taivan <elena.taivan@windriver.com>

commit bc9cde71a0bbcd099427b8808e0bdb1b78cb9725
Author: albailey <Al.Bailey@windriver.com>
Date:   Tue May 12 14:24:17 2020 -0500

Specify an upper limit for flake8 and pycodestyle
    
    Both flake8 and pycodestyle were updated on May 11
    which caused zuul jobs to start failing.
    
    The copyrights were updated as a way of triggering
    zuul to run the flake8 jobs associated with the
    test-requirements.txt
    
    Similar solution as:
    https://review.opendev.org/#/c/727133/
    
    Change-Id: Ia2b97203e7ab767586ee7393ac08fcf781af7609
    Closes-Bug: 1878276
    Signed-off-by: albailey <Al.Bailey@windriver.com>

commit c317fb0324c93cbaeab1b635c745b806c04dc613
Author: Don Penney <don.penney@windriver.com>
Date:   Fri May 8 11:40:03 2020 -0400

Add support to sysinv-conductor to update static images
    
    As part of the sysinv-conductor init, apply the
    upgrade-static-images.yml playbook to download updated images to the
    local registry as needed.
    
    Change-Id: I726a244ae226588327ebe2f69d4131b57cebab85
    Depends-On: https://review.opendev.org/726420
    Story: 2006781
    Task: 39705
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit dac06a7a57efca8c6eeeb1021a768df4842eecb2
Author: John Kung <john.kung@windriver.com>
Date:   Wed May 6 17:56:19 2020 +0000

Revert "Update conditions for oam config change and manifest apply"
    
    Investigation into requirement for
      openstack::keystone::endpoint::runtime
    for configuring admin-ep is required.
    
    This reverts commit c1112ad2c5d6a6ee3a34bb345055d21fcd08a6d9.
    
    Change-Id: Icfe6bbcd0c0a0489aede56552ec15712f314c1c5

commit 09dc3cbcded99900feb0fca5f65542c3fa673231
Author: Robert Church <robert.church@windriver.com>
Date:   Tue May 5 15:29:33 2020 -0400

Provide an update strategy for Tiller deployment
    
    In the case of a simplex controller configuration the current patching
    strategy for the Tiller environment will fail as the tiller ports will
    be in use when the new deployment is attempted to be applied. The
    resulting tiller pod will be stuck in a Pending state.
    
    The deployment strategy provided by 'helm init' is unspecified. This
    change will allow one additional pod (current + new) and one unavailable
    pod (current) during an update. The maxUnavailable setting allows the
    tiller pod to be deleted which will release its ports, thus allowing the
    patch deployment to spin up a new pod to a Running state.
    
    This patching ensures that on an installed system where tiller has been
    manually removed and re-applied via 'helm init', it is patched
    appropriately.
    
    Change-Id: I356545d05a585f7cbbbd5ca5071aa834fb086c31
    Depends-On: https://review.opendev.org/#/c/725705/
    Closes-Bug: #1876396
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 40463adf9aba476ba44b0dd89d4c30c9343b43b4
Author: John Kung <john.kung@windriver.com>
Date:   Mon May 4 17:13:59 2020 -0400

Fix certificate-key to 64 characters
    
    Update get_secure_static_config() to fix the
    kubernetes::kubeadm::certificate-key to the 64 characters
    expected by kubernetes.
    
    Change-Id: I366e6eb1dc4e764425ef2a82a493db47a080f49a
    Closes-bug: 1876755
    Signed-off-by: John Kung <john.kung@windriver.com>
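
For context on the 64-character length: kubeadm documents the certificate key as a hex-encoded 32-byte AES key, which yields 64 hexadecimal characters. The sketch below shows one way to generate and validate such a key; it is not the `get_secure_static_config()` implementation, and both helper names are hypothetical.

```python
import re
import secrets


def new_certificate_key() -> str:
    """Generate 32 random bytes, hex-encoded to 64 characters."""
    return secrets.token_hex(32)


def is_valid_certificate_key(key: str) -> bool:
    """Check that the key is exactly 64 hexadecimal characters."""
    return re.fullmatch(r"[0-9a-fA-F]{64}", key) is not None


key = new_certificate_key()
print(len(key), is_valid_certificate_key(key))
```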

commit ee72ac30762d5182ff5fa8051cd0f86a1a18efba
Author: Ovidiu Poncea <ovidiu.poncea@windriver.com>
Date:   Thu Apr 30 20:10:53 2020 +0300

Copy RBD provisioner secret to k8s namespaces only when Ceph is enabled
    
    When an application is started the secret to access kube RBD pool is
    needed in the application namespace to allow PVC creation.
    
    This commit adds a semantic check to verify that Ceph is enabled before
    attempting the copy operation.
    
    Change-Id: If890e53414df183337b563902d3566285ab27213
    Story: 2007391
    Task: 39604
    Signed-off-by: Ovidiu Poncea <ovidiu.poncea@windriver.com>

commit fbcdbf63ea3ac192a8e6dbd8588ca34399444008
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Thu Apr 30 18:19:29 2020 -0400

Use persistent backup during upgrade
    
    Use the persistent backup to store the upgrade data during simplex
    upgrades.
    
    Change-Id: I83280fdc5b2c702045a6a51b1c379758dd50baa2
    Story: 2007403
    Task: 39606
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit c1112ad2c5d6a6ee3a34bb345055d21fcd08a6d9
Author: John Kung <john.kung@windriver.com>
Date:   Thu Apr 30 11:33:52 2020 -0400

Update conditions for oam config change and manifest apply
    
    The runtime manifest apply for an oam config change was being
    triggered on host-swact to the target controller after startup.
    Thus, the config runtime manifest was being triggered even when
    there was not an oam config change.
    
    Update the runtime manifest apply for oam config to be triggered
    on active controller startup after an oam configuration change.
    
    During upgrades, disallow oam network changes as the configuration
    affects the platform and kubernetes components dependent on the
    OAM network.
    
    Tests Performed:
    bootstrap and enable duplex controllers
    bootstrap and enable AIO-SX
    host-swact after initial install and reinstall
    oam-modify and host-swact and verify oam access
    
    Change-Id: I4777891eaec05a6a39322325cec3c2ed006446da
    Story: 2007403
    Task: 39605
    Partial-Bug: 1874136
    Signed-off-by: John Kung <john.kung@windriver.com>

commit 88f2f7dc1a12327e12b006ef437b919bcef29108
Author: Paul Vaduva <Paul.Vaduva@windriver.com>
Date:   Wed Apr 22 03:19:29 2020 +0300

Fix race condition during certificate key regeneration
    
    When a monitor is created on compute-1, hiera data, including the
    certificate-key, is regenerated during the controller-1 reboot that is
    part of the unlock. When controller-1 boots up, the join command fails
    because the certificate key is no longer valid.
    
    Change-Id: I99057fa1afc3648c7aa3910f95067bde7b51b033
    Closes-bug: 1873916
    Signed-off-by: Paul Vaduva <Paul.Vaduva@windriver.com>

commit 426f034c14ea6f3d292c8a3a8b8de50efe0a2171
Author: Mihnea Saracin <Mihnea.Saracin@windriver.com>
Date:   Thu Apr 23 19:31:34 2020 +0300

Persistent backup partition comments
    
    Add some information about the persistent partition
    in sysinv where the partition sizes are computed
    
    Depends-On: https://review.opendev.org/#/c/720256/
    Change-Id: Id07e38c1c8cf68c83ba393bf3e809bf892f430f5
    Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>

commit 8099bbbbcf6e67190dc2ede949c47da081317e2d
Author: Elena Taivan <elena.taivan@windriver.com>
Date:   Wed Mar 25 12:33:42 2020 +0000

Add a new filesystem for image conversion
    
    Create the new host_fs CLI commands and the APIs
        system host-fs-add
        system host-fs-delete
    
    These commands will be used only for adding/removing 'image-conversion'
    filesystem dedicated only for qcow2 image conversion.
    'image-conversion' filesystem is optional.
    It is not allowed to add/remove any other filesystem.
    
    Change-Id: I87c876371e123ec1ba946170258401d220260e31
    Partial-bug: 1819688
    Depends-On: https://review.opendev.org/#/c/714936/
    Signed-off-by: Elena Taivan <stefan.dinescu@windriver.com>

commit e0d751f79060c788526ad4f3af56abe1e2308f8f
Author: Matt Peters <matt.peters@windriver.com>
Date:   Tue Apr 28 12:38:19 2020 -0500

Remove storage class backend from helm overrides
    
    Remove the storageClass parameter from the stx-monitor
    helm system overrides.  With support for different
    storage classes, the specific request for the storage
    class of "general" should not be configured so that the
    default storage class is used when not specified.
    
    NOTE: The old parameter had no effect since it should
    have been storageClassName.  However, it is being removed
    since it is confusing to the end user.
    
    Story: 2007391
    Task: 39589
    
    Change-Id: Ie690e53404df183337b563902d3566285db27313
    Signed-off-by: Matt Peters <matt.peters@windriver.com>

commit dbc41d03626d4f963f24cd83b7b417bba361a969
Author: Teresa Ho <teresa.ho@windriver.com>
Date:   Mon Apr 27 22:32:49 2020 -0400

Fix db error in creating route for dc host
    
    In creating a host route for DC, the interface id is
    required instead of the interface uuid.
    This update fixed the database error.
    
    Tested in vbox with system controller and subcloud.
    
    Closes-Bug: 1875461
    
    Change-Id: Ica81d0cd237ada1232f3fb3b3518a8d74df9ba99
    Signed-off-by: Teresa Ho <teresa.ho@windriver.com>

commit f0b1f8b604f9cb908213113648dabc63e268aaa8
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Fri Apr 24 14:12:58 2020 -0400

Rename the existing /opt/patch-vault filesystem to /opt/dc-vault
    
    The filesystem /opt/patch-vault is created on the system controller.
    In order to re-use this filesystem to store FPGA images and software
    loads, it is renamed to /opt/dc-vault. Additionally, the default size
    of the dc-vault-lv is increased from 8G to 15G.
    
    Story: 2006740
    Task: 39550
    Change-Id: Id8cda76759da6e6c73fd24357f79658894c95a64
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>

commit 5d04b37e9074c2beedc678a98e54a6d27e5d35c7
Author: Simon Cousineau <Simon.Cousineau@windriver.com>
Date:   Mon Apr 27 12:49:02 2020 -0400

Data Collection Reduction
    
    This changeset aims to reduce the amount of data collected by
    stx-monitor. This is achieved by:
    - Dropping the load, process_summary and fsstat metricsets from the
      system module
    - Dropping the system metricset from the kubernetes module
    - Dropping percentage metrics from the cpu metricset
    - Increasing daemonset kubernetes module period from 10s to 60s
    
    Story: 2007221
    Task: 39567
    
    Change-Id: I01899ac5af8dc48313d801c3d16bff209286000b
    Signed-off-by: Simon Cousineau <Simon.Cousineau@windriver.com>

commit 382491ffde5bbafd43154fcd69f8345df9ea9bc7
Author: John Kung <john.kung@windriver.com>
Date:   Mon Apr 27 17:45:40 2020 -0400

Disallow host-lock controller-1 during upgrade-starting
    
    Add a semantic check to prevent host-lock controller-1
    when the upgrade state is 'starting'.  This is to ensure the
    database is not snapshot with N+1 controller administratively locked,
    as that is to become the N+1 active controller.
    
    Change-Id: Ia34cbe40d58920fb26be0901bce6a6966a3ec27c
    Story: 2007403
    Task: 39574
    Signed-off-by: John Kung <john.kung@windriver.com>

commit 600f0a678541368d8c973850fbabcd6b55eacf3f
Author: Robert Church <robert.church@windriver.com>
Date:   Mon Apr 27 15:04:15 2020 -0400

Include app isolated CPUs when checking for minimum app cores
    
    Add total_isolated_cores when computing the total number of reserved
    cores. This will ensure that at least one unassigned core is available
    for general applications and all CPUs will not be consumed by all
    reservations.
    
    Change-Id: Ic5b493741dbd5d626906f686c002eb4e6f5775a4
    Story: 2006999
    Task: 39573
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 2d30ca7673acddefd22baf1d25641f3ebbf1a42a
Author: Matt Peters <matt.peters@windriver.com>
Date:   Fri Apr 24 15:01:09 2020 -0500

Remove helm plugin version checks
    
    Until the full application decoupling is completed, the helm
    plugin version enforcement is being removed since applications
    may still want to upversion the application without a change
    to the platform plugins.
    
    Full platform application compatibility will be enforced once
    the application decoupling story is completed.
    
    Story: 2006537
    Task: 39551
    
    Change-Id: Ia86fcfc2d100bad6fce5763bd2ab21a6bc3611b2
    Signed-off-by: Matt Peters <matt.peters@windriver.com>

commit ff66f652d5b5108e19030852cb30c7f395517779
Author: Matt Peters <matt.peters@windriver.com>
Date:   Fri Apr 24 11:20:01 2020 -0500

Update Logstash to use NodePort Ingress
    
    Logstash should not be using a custom port for collectd
    input from the K8s NodePort range since it might cause a
    conflict if the port is allocated to another service.
    Therefore, logstash will use a proper NodePort value
    reserved by the nginx-ingress service.
    
    Do not disable the nginx-ingress on the subcloud since it
    is required for collectd to send events to logstash.
    
    Story: 2007221
    Task: 39549
    
    Change-Id: Ibdcbcf1b217ddd17197c0e8fb6cc069a573d10a5
    Depends-On: https://review.opendev.org/#/c/722674
    Signed-off-by: Matt Peters <matt.peters@windriver.com>

commit 92828038b4cfa720c6dfc74fbdcb2e463ac5996d
Author: Robert Church <robert.church@windriver.com>
Date:   Wed Apr 22 02:50:11 2020 -0400

Enable --reserved-cpus option in k8s v1.18.1
    
    The option was introduced in k8s v1.17 and will now be used to define
    the explicit set of CPUs that are reserved for specific cpu functions in
    StarlingX.
    
    This retires setting the number of CPUs reserved in the --kube-reserved
    and --system-reserved options.
    
    Instead of calculating the number of CPUs related to reservations,
    provide the specific list of CPUs in a comma separated range format.
    This will be used by puppet to set the --reserved-cpus option based on
    cpu manager policy.
    
    Remove restrictions around CPU assignments:
    - Allow platform cores to be reserved on any processor
    - Allow application isolated cores to be reserved on any processor
    
    Change-Id: I1a3d4e4cca7b6940682a787c2e7348e56a047a06
    Depends-On: https://review.opendev.org/#/c/722189
    Story: 2006999
    Task: 39528
    Signed-off-by: Robert Church <robert.church@windriver.com>
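
The "comma separated range format" mentioned in this commit message is the usual cpuset list notation, for example `0-1,4,6-8`. A minimal sketch of producing it from a set of CPU ids follows; `format_cpu_list` is a hypothetical helper, not the sysinv or puppet code.

```python
from typing import Iterable, List


def format_cpu_list(cpus: Iterable[int]) -> str:
    """Collapse CPU ids into a comma-separated list of ranges."""
    ordered = sorted(set(cpus))
    ranges: List[str] = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


print(format_cpu_list([0, 1, 4, 6, 7, 8]))
```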

commit fd3a279c83de163face3cc69f551cc6f65d1cace
Author: Kevin Smith <kevin.smith@windriver.com>
Date:   Thu Apr 23 17:32:33 2020 -0400

Fix application-update reuse-user-overrides
    
    The 'maintain_user_overrides' flag in the application tarball
    metadata.yaml file is meant to indicate whether to preserve
    user overrides over application update.  The --reuse-user-overrides
    flag of the application-update command can override the setting
    in the metadata.yaml file, but the current logic means the
    'maintain_user_overrides' flag will never be checked even if the
    --reuse-user-overrides flag is not set.
    
    This update allows the maintain_user_overrides to be checked when
    the --reuse-user-overrides flag is not set.
    
    Closes-Bug: 1874552
    Change-Id: I38e009f72c432f43b1ad8744771ce32de1269736
    Signed-off-by: Kevin Smith <kevin.smith@windriver.com>
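
The precedence this commit message describes can be sketched as below. It assumes the command-line flag is represented as `None` when it was not given, and that a missing metadata key means "do not maintain overrides"; both are assumptions made for illustration, and `should_reuse_overrides` is a hypothetical name.

```python
from typing import Optional


def should_reuse_overrides(cli_flag: Optional[bool], metadata: dict) -> bool:
    """Decide whether user overrides are preserved across an application update."""
    # An explicit --reuse-user-overrides value always wins.
    if cli_flag is not None:
        return cli_flag
    # Otherwise fall back to the tarball's metadata.yaml setting.
    return bool(metadata.get("maintain_user_overrides", False))


print(should_reuse_overrides(None, {"maintain_user_overrides": True}))
print(should_reuse_overrides(False, {"maintain_user_overrides": True}))
```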

commit 1c77d6664264814e37ccf998fd1aea896235e7e6
Author: Bin Qian <bin.qian@windriver.com>
Date:   Tue Apr 7 23:58:08 2020 -0400

Set dc adminep cert and root ca cert to secure system config
    
    Extract admin endpoint cert and key pair from cert-manager to secure
    system config, for puppet to pick up and install.
    The cert and key are used by haproxy to provide ssl termination
    on admin endpoints.
    
    Performed tests:
    Install DC, unlocked system controller 0 and 1
    Unlocked SX subcloud controller 0.
    
    Story: 2007347
    Task: 39429
    
    Depends-on: https://review.opendev.org/#/c/720270
    Depends-on: https://review.opendev.org/#/c/720224
    
    Change-Id: Idb302fffe2b4c4ae36a901377d5089a91d26a3ba
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit 0333ccbb4216300eb451004790ce8b4c7e492e6f
Author: Simon Cousineau <Simon.Cousineau@windriver.com>
Date:   Thu Apr 23 09:56:38 2020 -0400

Fix Filebeat readiness probe exceeding timeout
    
    The 7.6.0 chart upgrade added a readiness probe to the beats. The
    Filebeat readiness probe will occasionally fail, causing
    application-apply to fail. This fix addresses this issue by increasing
    Filebeat's resource limits to match those allotted to Metricbeat.
    
    Closes-Bug: 1874328
    
    Change-Id: Ie2e23bbe063fd837999ceb48cc97071034526f35
    Signed-off-by: Simon Cousineau <Simon.Cousineau@windriver.com>

commit f20970adcff43bfc1f410fd7efa211920ac33e2e
Author: John Kung <john.kung@windriver.com>
Date:   Wed Apr 22 17:02:59 2020 -0400

Fix application-update to reference inst_path
    
    The directory path to metadata_file was set incorrectly; this is
    fixed by setting it to path.inst_path.
    
    Tests Performed:
    - Verified application-update passes
    - Verified updated application stx-monitor metadata
    - Verified updated application stx-openstack metadata
    
    Change-Id: I084bf34c6e19d9c05766639160af5dbe39aa4499
    Closes-Bug: 1874284
    Signed-off-by: John Kung <john.kung@windriver.com>

commit 24a0284e3d182faac2b613ddb9f9f36c5ba3995a
Author: Robert Church <robert.church@windriver.com>
Date:   Sun Apr 19 06:22:50 2020 -0400

Patch Tiller deployment to ensure self-recovery
    
    On node startup, there appears to be a race condition between when
    kubelet sees a pod and when kubelet sees a service. Due to this race,
    environment variables that tiller requires to function properly are
    missing.
    
    See the comment at
    https://github.com/kubernetes/kubernetes/blob/v1.18.1/pkg/kubelet/kubelet_pods.go#L566
    
    This change patches the tiller deployment to make sure the four classes
    of environment variables are present prior to starting tiller. If any
    class of variables is not present in the environment, then exit. This
    will recreate the pod and will populate the correct environment for
    tiller to function.
    
    Since the upgrade to v1.18.1, this has been seen in simplex and duplex
    controller configurations.
    
    Review https://review.opendev.org/#/c/699307/ will cover patching during
    initial provisioning via ansible. This change will check that tiller is
    patched every time the conductor starts as part of the tiller upgrade
    logic. This will cover scenarios where tiller is manually removed from
    the cluster and reinstalled via helm.
    
    This change should be reverted once StarlingX moves to helm v3.
    
    Also removed dead code: get_k8s_secret()
    
    Change-Id: Icd199ec1b1e10840094c0eae59d53838f32ffd6f
    Closes-Bug: #1856078
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 7e10e2091497a70bb39583c0678968171790bfdf
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Tue Apr 21 22:37:50 2020 -0400

Correct alarm calculations in health check
    
    The health check would incorrectly report all alarms as management
    affecting. This was a result of moving to the fm API instead of directly
    querying the database. As we are querying the API, a tuple is never
    returned and the mgmt_affecting property is calculated to "True" or
    "False".
    
    Same root cause as this bug/change:
    https://review.opendev.org/#/c/664274/
    
    Change-Id: Ia0b8a1df9526daa5052bf977f2c8812416b7e3b9
    Story: 2007403
    Task: 39517
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>
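
A minimal sketch of counting alarms whose `mgmt_affecting` value arrives as the string "True" or "False", as the commit message above describes. The dict shape and function name are assumptions, not the sysinv or fm API. In Python a non-empty string such as "False" is truthy, so a plain truthiness test would count every alarm; whether that was the exact defect here is an assumption.

```python
def count_alarms(alarms):
    """Return (total, mgmt_affecting) counts for a list of alarm dicts.

    Each alarm is assumed to carry 'mgmt_affecting' as the string
    "True" or "False" (hypothetical shape, for illustration only).
    """
    affecting = sum(
        1 for alarm in alarms
        # Compare the text explicitly; bool("False") would be True.
        if str(alarm.get("mgmt_affecting")).lower() == "true"
    )
    return len(alarms), affecting
```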

commit 4ccb11cb4019734e424362d677afb00dd6ecc4b6
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Tue Apr 21 11:47:37 2020 +0300

Improve host-overrides
    
    Add missing variables for DC.
    
    Central+Subcloud:
    system_mode
    location
    description
    
    Subcloud:
    region_config
    region_name
    system_controller_oam_subnet
    system_controller_oam_floating_address
    system_controller_subnet
    system_controller_floating_address
    
    Partial-Bug: 1870389
    Closes-Bug: 1873617
    Change-Id: Ieb12ffc0ad769dd6ca22eb4c15f9d6d55778fd4b
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit 7b8ab9ff532dca6f1bf9e1a37deef9650e790167
Author: Stefan Dinescu <stefan.dinescu@windriver.com>
Date:   Tue Apr 21 13:47:50 2020 +0000

Support host routes for storage networks
    
    Add storage network type in the list of interfaces
    that support routes
    
    Change-Id: I6fd5117006159c6622649a563d5268bbd49d05d3
    Story: 2007391
    Task: 39511
    Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>

commit b0e76a69277441b6becec6533214bdbbb38e6058
Author: Stefan Dinescu <stefan.dinescu@windriver.com>
Date:   Thu Dec 19 15:47:00 2019 +0200

Allow yaml formatting for controllerfs-list
    
    In order to be easily parsed by ansible, the controllerfs-list
    command should support yaml output format.
    
    Change-Id: Ic766980645d618d54d34bd04d82339fd2cd36562
    Depends-On: https://review.opendev.org/#/c/719782/
    Partial-bug: 1854169
    Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>

commit e169d1caea71b63034dbe1a008616df0f7a52639
Author: Andy Ning <andy.ning@windriver.com>
Date:   Mon Apr 6 10:47:09 2020 -0400

Generate admin_url to enable https for admin endpoints
    
    This commit updated platform services' sysinv puppet plugins to
    generate proper admin_url hiera data to enable https for these endpoints
    during controller unlock.
    
    This commit also updated controller_config to copy and install dc admin
    endpoint CA cert and haproxy cert for the second controller.
    
    Change-Id: I21345a96f8a0ffb416069ff28dbcfa51b9e12359
    Story: 2007347
    Task: 39314
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit 4e0b2acdfed437e95abf789748969f26880a53a5
Author: John Kung <john.kung@windriver.com>
Date:   Thu Apr 2 10:46:56 2020 -0400

Enable duplex platform upgrades
    
    Enable the mechanism to upgrade the platform components on
    a running StarlingX system with duplex controllers.
    
    This includes upgrade updates for:
      o generation of kubernetes join_cmd to enable the N+1 controller
        to join the cluster
      o migration of kubernetes config
      o migration of etcd on host-swact
      o migration of DistributedCloud dcmanager and dcorch databases
    
    A maintenance release for stx3.x is required to upgrade to stx4.0
    
    Tests Performed with duplex controller: AIO-DX and Standard
    - system load-import
    - system health-query-upgrade
    - system upgrade-start
    - system host-upgrade controller-0
    - system host-lock/unlock host N while controller N, N+1
    - system host-lock/unlock controller-0 while controller N+1
    - system host-upgrade controller-1
    - system host-upgrade storage
    - system host-upgrade worker
    - system upgrade-activate
    - system upgrade-abort
    - system host-downgrade
    - system upgrade-complete
    - verified application (e.g. stx-monitor) over upgrade
    
    Change-Id: I4267c7b32b2e7b59b5ffdd8146288698962da1e0
    Story: 2007403
    Task: 39243
    Task: 39244
    Task: 39245
    Signed-off-by: John Kung <john.kung@windriver.com>

commit 4247ed2fde53aa17b51feba93421090c432084e4
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Wed Apr 15 16:40:52 2020 -0400

Update verify-license call in sysinv
    
    This commit updates the parameters required to call verify-license in
    sysinv to bring it in sync with its most recent implementation.
    
    Story: 2007403
    Task: 39433
    
    Depends-on: https://review.opendev.org/#/c/720615/
    
    Change-Id: Ie35e5bb3f1237887dfff66f4ed8d71a24f95ebdb
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit 5c1361b0e81f53349d0d6715f7b627b4456147a0
Author: Robert Church <robert.church@windriver.com>
Date:   Sun Apr 19 06:23:44 2020 -0400

Update MatchNodeSelector recovery logic for NodeAffinity status
    
    NodeAffinity pods related to applications will not be removed by
    K8S automatically. These pods may block subsequent application applies
    as tiller expects these pods to be in a non failed state.
    
    This update now will look for NodeAffinity pods when the sysinv
    conductor starts. This is no longer limited to simplex nodes. This
    behavior is now observed on simplex and duplex controller configurations
    as of the upversion to k8s v1.18.1.
    
    Change-Id: I6384ffd1d14ac105e26b83c02aaa8f090e1fdde1
    Story: 2006999
    Task: 39475
    Related-Bug: #1849688
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit b1a290f0ccfa0b44af6fd7247be92f361d919467
Author: Simon Cousineau <Simon.Cousineau@windriver.com>
Date:   Fri Apr 17 10:48:09 2020 -0400

Fix beat fails to parse kubernetes.pod.labels.app
    
    Metricbeat and Filebeat fail to parse labels that are used as both
    objects and keywords in the Elasticsearch document hierarchy. This
    change addresses this issue by enabling the 'labels.dedot' and
    'annotations.dedot' options on Metricbeat kubernetes modules and
    Filebeat's kubernetes metadata processor, which automatically escapes
    conflicting labels and annotations.
    
    Story: 2007221
    Task: 39463
    
    Change-Id: Id7f6cd6fc499ea4644e16c80b68ebde19c6f59ad
    Signed-off-by: Simon Cousineau <Simon.Cousineau@windriver.com>

commit e6dd6fee38f1f180c4b611db4570021eb7c85bae
Author: Zhipeng Liu <zhipengs.liu@intel.com>
Date:   Thu Apr 2 01:03:34 2020 +0800

Add mariadb database config override to support ipv6
    
    Override "config_override" in helm/mariadb.py according to ip version.
    Test Pass on both ipv4 and ipv6 simplex.
    
    Closes-Bug: 1859641
    
    Change-Id: Ic15865105f305a8d7b93187eb51ef5aaf3d7d96e
    Signed-off-by: Zhipeng Liu <zhipengs.liu@intel.com>

commit d4c2f23d4fd24fa60c79c2fe0ac7e0c6ab97887b
Author: Thomas Gao <Thomas.Gao@windriver.com>
Date:   Thu Apr 16 16:42:41 2020 -0400

Fixed invalid lldp tlv update by sysinv conductor
    
    Sysinv conductor reads tlv packets for a list of vlan names, and attempts
    to shove it into DB without checking its string size. Since in DB,
    'dot1_vlan_names' field only permits 255 char, the DB update can fail.
    
    This fix truncates the list of vlan names to ensure it is under 255 char.
    Unit tests are added to verify the conductor behavior.
    
    Closes-Bug: 1866230
    
    Change-Id: Ibe0f06bc5c6a96573a338ebbb991bfc88cde6fb4
    Signed-off-by: Thomas Gao <Thomas.Gao@windriver.com>
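
A minimal sketch of the truncation idea in the commit message above, assuming the VLAN names are joined with a comma and that whole names are dropped rather than cut mid-name. The commit message does not say which approach the real fix takes; the function name and separator are hypothetical.

```python
MAX_LEN = 255  # column limit quoted in the commit message


def truncate_vlan_names(names, max_len=MAX_LEN, sep=","):
    """Join names with sep, dropping trailing names that would not fit."""
    kept = []
    length = 0
    for name in names:
        # A separator is only needed once at least one name is kept.
        extra = len(name) + (len(sep) if kept else 0)
        if length + extra > max_len:
            break
        kept.append(name)
        length += extra
    return sep.join(kept)
```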

commit 2e40c98ed07abad6cc84b32b129cac52baea794f
Author: David Sullivan <david.sullivan@windriver.com>
Date:   Wed Apr 15 10:55:41 2020 -0400

Use ansible for simplex upgrade start
    
    Use the ansible backup playbook for simplex upgrade start. Pass the
    backup location and filename to the playbook.
    
    Change-Id: I624e38adfb5a7d4c1193da0dfe29991492f41d6a
    Story: 2007403
    Task: 39427
    Signed-off-by: David Sullivan <david.sullivan@windriver.com>

commit f64ae62e4dfede86ad821aa8282a783f3c406c8d
Author: Tao Liu <tao.liu@windriver.com>
Date:   Thu Apr 16 10:04:52 2020 -0400

Support subcloud deploy upload the common files
    
    Define a constant for /opt/platform/deploy/<version>
    
    Partial-Bug: 1864508
    
    Change-Id: Ide43993992aeae830631a0c1bb8ee377990a6974
    Signed-off-by: Tao Liu <tao.liu@windriver.com>

commit 49f93b5d6d4d30d5717753efe499485ea15cca8f
Author: Simon Cousineau <Simon.Cousineau@windriver.com>
Date:   Tue Apr 14 13:34:03 2020 -0400

Add system fields to container logs
    
    These changes move the system fields out from Filebeat's 'log' input
    config so that they are added to all log inputs. System fields are now
    added to autodiscovered container logs as well.
    
    Change-Id: I4810df8c79f69029347554124849ee44068f5e5f
    Signed-off-by: Simon Cousineau <Simon.Cousineau@windriver.com>

commit 994068cbd88f9eb3df99a4bff016df73493285e8
Author: Simon Cousineau <Simon.Cousineau@windriver.com>
Date:   Mon Apr 13 11:46:07 2020 -0400

Container logs collected without Kubernetes metadata
    
    Container logs are now being collected using Filebeat's 'container'
    input.
    This change excludes container logs from being collected by the 'log'
    input, so that the logs can be enriched with Kubernetes metadata.
    
    Depends-On: https://review.opendev.org/#/c/719585/
    Change-Id: Ia7ed274975bfe4c4a5bd0dc78f256fa3fae23d5f
    Signed-off-by: Simon Cousineau <Simon.Cousineau@windriver.com>

commit b1ca87c7cfca4ac493fe8ef6e57de4d425effba2
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Fri Apr 10 16:48:49 2020 +0300

Change ceph manager port
    
    Free port 5001 to be used by keystone.
    
    Story: 2007347
    Task: 39391
    
    Depends-On: I45ee810c9b4686d98c246c3a73f21f0de4ba76a1
    Change-Id: Ie2901a5affc803e0c86af6a94ed27bfa9cd9d458
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit a6f0615860742b4323a6967a0f9a0059aabb1550
Author: Robert Church <robert.church@windriver.com>
Date:   Mon Mar 23 20:35:46 2020 -0400

Update get_kube_versions to align with v1.18.1
    
    Change-Id: Ib5b2cb2849a2865b8e31bc37a84d35bb9736f131
    Story: 2006999
    Task: 39341
    Depends-On: https://review.opendev.org/#/c/718568/
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 362d905dad25369bf116bb1e34a659f33b7260af
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Fri Apr 10 11:31:06 2020 +0300

Improve host-overrides
    
    Add distributed cloud role information in the host overrides.
    The restore playbook needs this information.
    
    Partial-Bug: 1870389
    Change-Id: I278f19be32d1fe87687feb75e26b2898237de86f
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit 55ce64cc58c3548e66b0e2aee454087f5d17c23d
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Tue Apr 7 15:11:35 2020 -0400

Refactor obsolete versions usage in sysinv
    
    This commit removes obsolete version checks from sysinv code.
    
    Story: 2007403
    Task: 39226
    
    Change-Id: Ibc5ba1d65c16971926dfd3aae05564fbb314aa1b
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit a68e15140886a8ed31a40ce8186012b25de77b87
Author: Jerry Sun <jerry.sun@windriver.com>
Date:   Fri Mar 27 14:15:52 2020 -0400

Support adding admission plugin post bootstrap
    
    This commit adds a system service parameter for admission plugins of
    kube-apiserver. We need this for pod security plugin. Starting pod
    security plugin without any policies will result in all pods being
    denied. This means pod security plugin must be started by service
    parameter after bootstrap.
    
    Story: 2007351
    Task: 38897
    Depends-On:  https://review.opendev.org/#/c/717374
    
    Change-Id: I1a7e19f85a4be609112765c975bb81a248217168
    Signed-off-by: Jerry Sun <jerry.sun@windriver.com>

commit fb8ae2dbae2e6d441579b04a9629439e2cced3c8
Author: Sharath Kumar K <sharath.kumar@intel.com>
Date:   Mon Apr 6 09:53:28 2020 +0200

De-branding in starlingx/config: Titanium Cloud -> StarlingX
    
    1. Rename Titanium Cloud to StarlingX for .spec files
    2. Rename Titanium Cloud to StarlingX for .service file
    
    Test:
    After the de-brand change, bootimage.iso has built in the flock layer
    and installed on the dev machine to validate the changes.
    
    Please note, doing de-brand changes in batches, this is batch4 changes.
    
    Story: 2006387
    Task: 36202
    
    Change-Id: I708a1edb07dcd21a623fa484bb3b935c5180d089
    Signed-off-by: Sharath Kumar K <sharath.kumar@intel.com>

commit b101cc1719e356baac24b7eda3f7ff2bdd5e984d
Author: Ambarish Das <ambarish.das@intel.com>
Date:   Fri Apr 3 21:38:17 2020 -0500

Clean up: python libvirt removed from test requirement of sysinv
    
    This patch removes the dependency of libvirt-python from
    test_requirements.txt file of sysinv. This package is no longer
    used by sysinv and generates an error in "tox" execution.
    
    Closes-Bug:#1869318
               libvirt-python in test requirement throws error in tox build
               for py27 for config module
    
    Change-Id: I6f662159d5d71465079746755dabc8c063d9a158
    Signed-off-by: Ambarish Das <ambarish.das@intel.com>

commit 01c8b191d19ed6dd7a0d6475aa3a439890e43379
Author: Carmen Rata <carmen.rata@windriver.com>
Date:   Fri Mar 27 14:03:28 2020 -0400

Config updates for stx3.0 upgrades
    
    Update controllerconfig to remove non-platform openstack components
    and fix db barbican migration.
    Create an RPC call to touch /etc/platform/.upgrade_controller_1
    Remove unneeded upgrade scripts.
    Obsolete software version related fixes.
    
    Story: 2007403
    Task: 39086
    Task: 39087
    Task: 39182
    Task: 39183
    Task: 39226
    
    Change-Id: I28e746f3d267c322f59402beaf25c271138a124d
    Signed-off-by: Carmen Rata <carmen.rata@windriver.com>

commit 898d48afe5ee894277246642e3533113771d1672
Author: Simon Cousineau <Simon.Cousineau@windriver.com>
Date:   Mon Mar 2 16:23:19 2020 -0500

Update helm overrides for elastic helm charts 7.6.0
    
    Update filebeat overrides to use "filebeatConfig" parameter for config
    files.
    Update logstash "replicas" and "elasticseachHosts" overrides.
    Update metricbeat module overrides to conform to metricbeat's
    configuration format.
    
    Story: 2007221
    Task: 38473
    Task: 38476
    Task: 38477
    Task: 38478
    
    Change-Id: Ie27916c1e26c4c1ada25c15277daa0598f7599b5
    Depends-On: https://review.opendev.org/#/c/708730/

commit d7ba6775212401f2bfc0bee04febe661152e504d
Author: Kevin Smith <kevin.smith@windriver.com>
Date:   Mon Mar 23 19:06:49 2020 -0400

Wait for pod termination on stx-monitor remove
    
    On removal of the stx-monitor application, wait for all pods
    to have terminated before moving to 'uploaded' status.
    This will prevent the user from issuing an application-delete
    command which could possibly timeout.
    
    Change-Id: I116a98bdc60a4a7fe05e50eb9b4ddd4e6ef2e24f
    Closes-Bug: 1868567
    Signed-off-by: Kevin Smith <kevin.smith@windriver.com>

commit 423a475aff4f9ea1b60af6a9a2989027d1506f10
Author: Shuicheng Lin <shuicheng.lin@intel.com>
Date:   Thu Mar 12 14:06:08 2020 +0800

Refresh local registry auth info each time when access local registry
    
    The local registry uses the admin account password as authentication
    info, and this password may be changed by the openstack client at any
    time. When sysinv downloads images from the local registry, it must
    not cache the auth info; otherwise a stale password may cause
    authentication failures in keystone and eventually lock the account.
    
    Partial-Bug: 1853017
    
    Change-Id: I07f273a05a1bc3c08b48d13c94eb6df6aecdf7c3
    Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
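
The "do not cache the auth info" idea above can be sketched as follows. This is not the sysinv implementation: the class name and the injected `get_password` callable are hypothetical, standing in for whatever lookup (for example a keyring query) returns the current admin password.

```python
class RegistryClient:
    """Looks up credentials on every request instead of caching them."""

    def __init__(self, get_password, username="admin"):
        # get_password is a callable returning the current password
        # (hypothetical wiring, e.g. a keyring lookup).
        self._get_password = get_password
        self._username = username

    def auth(self):
        # Read the password at call time so a change is picked up at once.
        return {"username": self._username, "password": self._get_password()}
```

Because the password is read on each call, a password change made between two image downloads is used by the second download without restarting the service.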

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-06-03: Fix merged to ansible-playbooks (f/centos8)

#72

Reviewed:  https://review.opendev.org/729809
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=73027425d4501a6b7785e91024c9e8ddbc03115d
Submitter: Zuul
Branch:    f/centos8

commit 55c9afd075194f7669fa2a87e546f61034679b04
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Wed May 13 14:19:52 2020 +0300

Restore: disconnect etcd from ceph
    
    At the moment etcd is restored only if ceph data is kept.
    Etcd should be restored regardless if ceph data is kept or wiped.
    
    Story: 2006770
    Task 39751
    Change-Id: I9dfb1be0a83c3fdc5f1b29cbb974c5e0e2236ad3
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit 003ddff574c74adf11cf8e4758e93ba0eed45a6a
Author: Don Penney <don.penney@windriver.com>
Date:   Fri May 8 11:35:58 2020 -0400

Add playbook for updating static images
    
    This commit introduces a new playbook, upgrade-static-images.yml, used
    for downloading updated images and pushing them to the local registry.
    
    Change-Id: I8884440261a5a4e27b40398e5a75c9d03b09d4ba
    Story: 2006781
    Task: 39706
    Signed-off-by: Don Penney <don.penney@windriver.com>

commit 26fd273cf5175ba4bdd31d6b6b777814f1a6c860
Author: Matt Peters <matt.peters@windriver.com>
Date:   Thu May 7 14:29:02 2020 -0500

Add kube-apiserver port to calico failsafe rules
    
    An invalid GlobalNetworkPolicy or NetworkPolicy may prevent
    calico-node from communicating with the kube-apiserver.
    Once the communication is broken, calico-node is no longer
    able to update the policies since it cannot communicate to
    read the updated policies.  It can also prevent the pod
    from starting since the policies will prevent it from
    reading the configuration.
    
    To ensure that this scenario does not happen, the kube-apiserver
    port is being added to the failsafe rules to ensure communication
    is always possible, regardless of the network policy configuration.
    
    Change-Id: I1b065a74e7ad0ba9b1fdba4b63136b97efbe98ce
    Closes-Bug: 1877166
    Related-Bug: 1877383
    Signed-off-by: Matt Peters <matt.peters@windriver.com>

commit bd0f14a7dfb206ccaa3ce0f5e7d9034703b3403c
Author: Robert Church <robert.church@windriver.com>
Date:   Tue May 5 15:11:15 2020 -0400

Provide an update strategy for Tiller deployment
    
    In the case of a simplex controller configuration the current patching
    strategy for the Tiller environment will fail as the tiller ports will
    be in use when the new deployment is attempted to be applied. The
    resulting tiller pod will be stuck in a Pending state.
    
    This will be observed if the node becomes ready after 'helm init'
    installs the initial deployment and before the deployment is patched for
    environment checks.
    
    The deployment strategy provided by 'helm init' is unspecified. This
    change will allow one additional pod (current + new) and one unavailable
    pod (current) during an update. The maxUnavailable setting allows the
    tiller pod to be deleted which will release its ports, thus allowing the
    patch deployment to spin up a new pod to a Running state.
    
    Change-Id: I83c43c52a77bce9f085bfb6c6a2c4171f2ba8f97
    Partial-Bug: #1876396
    Signed-off-by: Robert Church <robert.church@windriver.com>
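
A minimal sketch of a Deployment patch body expressing the strategy described above (one additional pod, one unavailable pod). `maxSurge` and `maxUnavailable` are standard Kubernetes Deployment fields; the function name is hypothetical, and the exact patch used by the playbook is not shown in this thread.

```python
def tiller_strategy_patch(max_surge=1, max_unavailable=1):
    """Build a Deployment patch body that sets a rolling update strategy."""
    return {
        "spec": {
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {
                    # Allow one extra pod (current + new) during the update.
                    "maxSurge": max_surge,
                    # Allow the current pod to go away, releasing its ports.
                    "maxUnavailable": max_unavailable,
                },
            }
        }
    }
```

The resulting dict could be serialized with `json.dumps` and handed to whatever patch mechanism is in use.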

commit 0dc9e173855792c38bec90360c0c4c066c36d66b
Author: Robert Church <robert.church@windriver.com>
Date:   Mon May 4 12:59:49 2020 -0400

Ensure containerd binds to the loopback interface
    
    Set the stream_server_address to bind to the loopback interface with a
    value of "127.0.0.1" for IPv4 and "::1" for IPv6.
    
    This will explicitly update the containerd configuration to use the IP
    address of the loopback interface based on the system's network
    configuration.
    
    Change-Id: I76a4ad1c123b8b701cb1fa74b16609b50cdf9bd2
    Partial-Bug: #1875891
    Signed-off-by: Robert Church <robert.church@windriver.com>
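
A minimal sketch of choosing the loopback address by IP version, as the commit message above describes. The helper name and the idea of deriving the version from a sample address are assumptions; the playbook may determine the IP version differently.

```python
import ipaddress


def loopback_for(address):
    """Return the loopback address matching the IP version of address."""
    version = ipaddress.ip_address(address).version
    return "::1" if version == 6 else "127.0.0.1"
```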

commit 2ea3ce6a7fdff5c2079acd76bd8eee7001b4127c
Author: Andy Ning <andy.ning@windriver.com>
Date:   Thu Apr 30 13:41:33 2020 -0400

Increase wait time for certificate during subcloud bootstrap
    
    Currently during subcloud ansible bootstrap, it waits up to 15s for
    certificate secret to be ready after the yaml file applies. For some
    slow hosts (VBox for example) 15s appears not long enough so the
    extracted certificate is partial, which in turn fails haproxy.
    
    This commit updates to use the better "kubectl wait" mechanism to wait
    for the certificate to be ready, with a timeout of 30s.
    
    Change-Id: Ibd8cab9339c6d532353b45b49cc4d141f0cf5ace
    Closes-Bug: 1876099
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit d05785ffd9add6553662fcab43f30bf8d9f6d2e3
Author: Stefan Dinescu <stefan.dinescu@windriver.com>
Date:   Fri Apr 24 10:48:20 2020 +0000

Upversion Netapp application
    
    Changes included in this commit:
    - updated netapp required docker images
    - add support for PVC snapshots (beta feature since K8s
      1.17);
    - create new ansible role for enabling PVC snapshot
      support and start required pod
    - import role for bootstrap as well, so any backend
      added in the future will also have support enabled
      by default
    - also use snapshot role for the netapp backend
      configuration (for upgrade considerations)
    - change netapp backend configuration of mapping backends
      and storage classes from 1-to-1 mapping to many-to-many
      mapping; instead of one backend configured for each
      storage-class, now any number of backends can be
      configured for any number of storage classes
    - add a new VolumeSnapshotClass configuration option for
      PVC snapshot support
    
    Change-Id: Ib1cf5a5b46f24a6864ac6d894e37db8732e0c6fb
    Depends-On: https://review.opendev.org/#/c/724237/
    Story: 2007391
    Task: 39566
    Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>

commit 204641a5b3082c9873109169f93ae1845eb79813
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Wed Apr 15 15:54:58 2020 +0300

DC subcloud restore registry.central certs
    
    During restore a certificate is missing.
    Docker needs the certificate to connect to registry.central.
    Extract it from backup archive.
    
    Closes-Bug: 1870389
    
    Depends-On: I64c8b38a51bf04714931d70e126e0f63782deb20
    Depends-On: Ieb12ffc0ad769dd6ca22eb4c15f9d6d55778fd4b
    Depends-On: I86166da31491736d6695e04fa287f79871975b55
    Depends-On: Iebab8dc059435c7e2b0f19947fedce88bd71bb65
    Depends-On: I278f19be32d1fe87687feb75e26b2898237de86f
    
    Change-Id: Ief65a8963b81ef489171c264964d472a66fec282
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit acd84841d201f1d5777edd2996086732cb3a3104
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Thu Apr 23 17:37:23 2020 +0300

Fix SystemController filesystem at restore
    
    The filesystem `dc-vault` is created at unlock.
    It doesn't exist at restore time to be resized.
    It will be correctly sized during unlock.
    
    It is not mounted into /dev/cgts-vg/dc-vault-lv.
    
    Closes-Bug: 1873617
    Change-Id: Ia2748756eaa8109065af1848374cc058c447910e
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit 885cfe61269a43c7cff7e56732baefc2190d5be1
Author: Bin Qian <bin.qian@windriver.com>
Date:   Wed Apr 29 11:58:14 2020 -0400

Set root certification duration
    
    Set the root certificate duration to 5 years, with renewal 30 days ahead.
    
    Change-Id: I780edaab0c041a0db1e9faf47bcd473e20068247
    Story: 3007347
    Task: 39428

commit 54e9b94773f3ae9c6be7eb14e141537cad373915
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Wed Apr 22 15:44:15 2020 +0300

Fix restore without ceph backend
    
    When ceph backend is not configured there is no ceph crushmap to be
    restored, nor ceph monitors data. Skip restoring those.
    
    The rest of the logic regarding ceph osds can be treated as if osds were
    wiped.
    
    Closes-Bug: 1873974
    Depends-On: Ic2b7a77f4a54d3d30aedd6c00747fc4586428997
    Change-Id: I2776d7c2d5801ce6e81c487da263075b6f6873c8
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit dd89ba118d21027da28f860f2da47e6794d0453b
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Wed Apr 22 13:32:21 2020 +0300

Fix backup without ceph backend
    
    When ceph backend is not configured there is no ceph crushmap to be
    backed up. Skip the crushmap backup step.
    
    Partial-Bug: 1873974
    Change-Id: Ic2b7a77f4a54d3d30aedd6c00747fc4586428997
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit 3bb26d81d51f0590dba2a19caf9cc430673f6018
Author: Andy Ning <andy.ning@windriver.com>
Date:   Wed Apr 8 09:42:10 2020 -0400

Setup https admin endpoint certificates for subcloud
    
    This commit updated ansible bootstrap to generate, install and
    configure certificates for https enabled admin endpoints. This change
    applies to subcloud of a DC system only.
    
    The subcloud admin endpoint certificate is valid for 180 days and is
    renewed 30 days before expiry.
    
    Tests:
      - Successfully deploy subcloud by "dcmanager subcloud add"
      - Verify haproxy admin endpoint certificate is generated and
        installed properly in subcloud.
      - Verify DC admin endpoint root CA certificate is installed in
        subcloud's trusted CA cert list in subcloud.
      - Verify the haproxy admin endpoint certificate can be validated by
        the DC endpoint root CA certificate successfully in subcloud.
    
    Change-Id: Ib24d27ac4cafe345fb57ba906ea5baf0930af892
    Story: 2007347
    Task: 39465
    Depends-On: https://review.opendev.org/#/c/720224/
    Signed-off-by: Andy Ning <andy.ning@windriver.com>

commit 2b287b1050fa2b1a7b5f5d983eaa634a055b8ec2
Author: Bin Qian <bin.qian@windriver.com>
Date:   Tue Apr 7 23:48:11 2020 -0400

Install dc root cert
    
    This is to create a distributed cloud specific root CA issuer with
    cert-manager.
    
    The root CA issuer is to authorize intermediate issuers for each
    subcloud, the latter then to issue certificate for admin endpoints.
    
    Test cases:
    Bootstrap systemcontroller from local/remote
    Replay systemcontroller bootstrap playbook
    
    Story: 3007347
    Task: 39428
    
    Change-Id: I7546d6562f0bc072c3cf76f422a258a2c32b4a34
    Signed-off-by: Bin Qian <bin.qian@windriver.com>

commit 36a01e8ba38f3e0d1e2ea7a2bce31edbedfde04e
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Tue Apr 21 17:54:53 2020 +0300

B&R: Do keystone db backup for subcloud
    
    Keystone db backup file is missing for subclouds.
    Create the keystone db backup file when running the backup playbook on
    subcloud.
    
    Partial-Bug: 1870389
    Change-Id: I64c8b38a51bf04714931d70e126e0f63782deb20
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit df25466798d2487c933f7d2fc1d04ec968f4bcd2
Author: Jessica Castelino <jessica.castelino@windriver.com>
Date:   Fri Apr 24 15:23:37 2020 -0400

Rename the existing /opt/patch-vault filesystem to /opt/dc-vault
    
    The filesystem /opt/patch-vault is renamed to /opt/dc-vault so that
    it can be re-used to store FPGA images and software loads. Thus,
    necessary changes have been made to the ansible playbook files.
    
    Change-Id: I3358fe2d87c79785a8803815b1bbd2727ae80a24
    Story: 2006740
    Task: 39550
    Depends-On: https://review.opendev.org/#/c/723007/
    Signed-off-by: Jessica Castelino <jessica.castelino@windriver.com>

commit d3341102189031551e8d4d194e42d86d8878920f
Author: Jerry Sun <jerry.sun@windriver.com>
Date:   Sun Apr 19 21:30:57 2020 -0400

Enable applying applications after bootstrap
    
    This commit adds the ability to specify applications to be applied
    directly after bootstrap, before controller-0 has been unlocked.
    This is needed for cert manager.
    
    Currently, nginx and cert manager will be applied by default, with
    no overrides. The user can optionally specify overrides if they wish.
    
    NOTE: This aligns with long term direction for platform applications
    to:
    - move away from the existing platform application framework in sysinv
      due to wanting to decouple application behaviour from sysinv code
      in order to support such things as independent upgrades of these
      platform applications.
    - support auto-upload/apply of platform applications in either:
         a) bootstrap playbook, if app required for supporting bootstrap
            functions, or
         b) a post-bootstrap deployment-type playbook.
    In the case of cert-manager, in near future, it will be required at
    bootstrap to support initial configuration around generating
    certificates for kubernetes and https connections.
    
    Story: 2007360
    Task: 39471
    
    Change-Id: I91ee31c7c2d35c2a101b156ef8633fc69139938d
    Signed-off-by: Jerry Sun <jerry.sun@windriver.com>

commit 0a1c06a66bc286b306bfdf4ada7cf823787b7a94
Author: Tao Liu <tao.liu@windriver.com>
Date:   Tue Apr 21 15:36:29 2020 -0400

Increase wait timeout for service endpoints reconfig
    
    Install/bootstrap HP EL8000 as subcloud timed out, while
    waiting for endpoints reconfiguration to complete
    during bootstrapping.
    
    This server has a single processor and takes around 9 minutes
    to apply the runtime manifest, which exceeds the existing timeout
    of 450 seconds. In general, everything is slower on this
    particular hardware, e.g. install is slower and CLI commands
    take almost twice as long to complete as on other servers.
    
    This update increases the endpoints reconfiguration wait
    timeout to 720 seconds which provides a safety margin.
    
    Testcases:
    Install/bootstrap HP EL8000 as a subcloud.
    
    Closes-Bug: 1871699
    
    Change-Id: If284281aa13e79cc14d0369e44e8cacebb24f415
    Signed-off-by: Tao Liu <tao.liu@windriver.com>
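
The effect of the timeout change above can be illustrated with a generic poll-until-deadline loop. This is a minimal sketch, not the playbook's actual task: the function name `wait_for` and its parameters are hypothetical, and only the 450-second, 720-second and roughly 9-minute (540-second) figures come from the commit message.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool],
             timeout: float = 720.0,
             interval: float = 10.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False  # gave up: the operation took longer than the timeout
        sleep(interval)
```

With an operation that needs about 540 seconds, a 450-second limit gives up early, while a 720-second limit leaves roughly 180 seconds of margin.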

commit abbf21f7fcef00e90e75d393f638a73d58b41adb
Author: Robert Church <robert.church@windriver.com>
Date:   Mon Dec 16 12:53:10 2019 -0500

Patch tiller deployment to provide environment validation
    
    There appears to be a race condition between when kubelet sees a pod and
    when kubelet sees a service. Due to this race, the environment variables
    that tiller requires to function properly are missing.
    
    See the comment at
    https://github.com/kubernetes/kubernetes/blob/v1.18.1/pkg/kubelet/kubelet_pods.go#L566
    
    This change patches the tiller deployment to make sure the four classes
    of environment variables are present prior to starting tiller. If any
    class of variables is not present in the environment, then exit. This
    will recreate the pod and will populate the correct environment for
    tiller to function.
    
    Since the upgrade to v1.18.1, this has been seen in simplex and duplex
    controller configurations.
    
    This will cover patching during initial provisioning via ansible and
    will be reverted once StarlingX moves to helm v3.
    
    Change-Id: I78e43459fedab611a67b8d9b6b3121b78ef048a6
    Partial-Bug: #1856078
    Signed-off-by: Robert Church <robert.church@windriver.com>
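
The check-then-exit idea in the commit above can be sketched as follows. This is an illustration only: the commit does not name the four classes of variables, so `REQUIRED_VARS` below is a hypothetical stand-in that uses two of the service variables kubelet normally injects, and the actual patch to the tiller deployment is not written in Python.

```python
import os
import sys
from typing import Iterable, List, Mapping

# Hypothetical list; the commit message does not name the variables it checks.
REQUIRED_VARS = ("KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT")


def missing_vars(environ: Mapping[str, str],
                 required: Iterable[str]) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def validate_environment() -> int:
    """Return a non-zero exit status when the environment is incomplete."""
    missing = missing_vars(os.environ, REQUIRED_VARS)
    if missing:
        print("missing environment variables: " + ", ".join(missing),
              file=sys.stderr)
        return 1
    return 0
```

An entrypoint would call `sys.exit(validate_environment())` before starting the real process, so that a container with an incomplete environment exits and gets recreated.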

commit 9a8136b5b11a874da9a5b67519a59b27530b4aad
Author: Tao Liu <tao.liu@windriver.com>
Date:   Sat Apr 18 13:54:45 2020 -0400

Backup & restore: subcloud deploy files
    
    Backup the subcloud deploy files if available on the system.
    Restore the subcloud deploy files if included in the archive.
    
    Testcases:
    Backup & restore System Controller with the subcloud deploy
    files.
    Backup & restore a regular system without the subcloud
    deploy files
    
    Partial-Bug: 1864508
    
    Change-Id: Ic14f6c02dd187a082b03458b0a766c690400e317
    Signed-off-by: Tao Liu <tao.liu@windriver.com>

commit 40cfef7c417709c234e50a1a034fb4a11dbf180a
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Tue Apr 14 14:18:29 2020 +0300

Remove subcloud task from restore mode
    
    A task supposed to run only during bootstrap is running during restore.
    
    Keystone dc variables (dc_admin_user_id and dc_admin_project_id) are
    added during bootstrap to hieradata static.yaml file.
    When doing the restore the information is already present in the file in
    the backup archive.
    
    Partial-Bug: 1870389
    Change-Id: Iebab8dc059435c7e2b0f19947fedce88bd71bb65
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit 5cdd394cb10c2c2d94174fdc32beb989290c6de9
Author: Stefan Dinescu <stefan.dinescu@windriver.com>
Date:   Thu Dec 19 15:23:23 2019 +0200

Resize DRBD resources when doing a restore
    
    In cases where we do a backup of a system that has non-default
    sizes for drbd-backed partitions, the restore fails when first
    unlocking controller-0.
    
    The normal resize procedure requires all controller nodes to
    be unlocked and available because the puppet manifest does
    not support resizing at unlock.
    
    To prevent the issue from occurring, as part of the restore
    procedure, we should resize the partitions on controller-0
    with the proper sizes found in sysinv. Controller-1 will
    automatically create the partitions with the proper sizes
    from the very start, so it will not need any resizes.
    
    Change-Id: Ia73452ce721514d393b486a659730d0ca7c0d7e5
    Closes-bug: 1854169
    Depends-on: https://review.opendev.org/#/c/699990
    Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>

commit a027bcf50a037166f84d897e22535c8dedf2590f
Author: Robert Church <robert.church@windriver.com>
Date:   Mon Mar 23 20:32:08 2020 -0400

Support for upversioning of k8s to v1.18.1
    
    Changes include:
    - Renamed the v1.16.2 versioned directories to v1.18.1.
    - Updated kubeadm.yaml to align the kubernetesVersion and enable the
      featureGate for multiple hugepage support
    
    Change-Id: I7241164f0185496093c0c8b5cb541fd09926b2ed
    Story: 2006999
    Task: 39334
    Depends-On: https://review.opendev.org/#/c/718568/
    Signed-off-by: Robert Church <robert.church@windriver.com>

commit 1b50022d55a9da2bbab284b1fdda2ddc78c30c79
Author: Shuicheng Lin <shuicheng.lin@intel.com>
Date:   Wed Apr 8 10:57:50 2020 +0800

Fix account be locked due to access registry without password
    
    Correct the code so that an exception is raised when the password
    cannot be retrieved from keyring. The account was locked because no
    exception was raised and the client tried to access the registry
    with a None password, which is incorrect.
    
    Closes-Bug: #1871141
    Change-Id: Ia68b4a4f25756fdad7a198a31d5870245ff9dc1a
    Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
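
A minimal sketch of the behaviour the commit above describes: fail loudly when the password lookup returns nothing, instead of passing None on to the registry client. The names `PasswordUnavailable` and `get_registry_password` are hypothetical. The lookup is injected as a callable so the sketch runs without any keyring backend; in the Python `keyring` package, `keyring.get_password(service, username)` returns None when no entry exists.

```python
from typing import Callable, Optional


class PasswordUnavailable(Exception):
    """Raised when no password can be retrieved for the given user."""


def get_registry_password(lookup: Callable[[str, str], Optional[str]],
                          service: str, user: str) -> str:
    """Return the stored password, or raise instead of returning None."""
    password = lookup(service, user)
    if password is None:
        # Do not let callers attempt authentication with a None password:
        # repeated failed logins are what lock the account.
        raise PasswordUnavailable(
            "no password found for %s/%s" % (service, user))
    return password
```

A caller would pass something like `keyring.get_password` as `lookup`, with the service and user names used earlier in this report ("CGCS" and "admin").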

commit 9080db419d559d3d5d33c0a6459e9f5e8b7700e5
Author: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Date:   Thu Apr 9 16:07:30 2020 +0300

Add registry.central host for DC subcloud restore
    
    During bootstrap management network is temporarily assigned on lo
    interface. Backup archive contains /etc/resolv.conf and /etc/hosts
    of an already unlocked controller. Before backup registry.central is
    resolved through dns (nameserver `floating central management`).
    
    During restore a temporary host for registry.central must be created.
    Since there is no reference of a backup/shadow management network that
    provides connectivity for such use cases the `floating central oam`
    can be used.
    
    Partial-Bug: 1870389
    
    Change-Id: I86166da31491736d6695e04fa287f79871975b55
    Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>

commit 46e9c405cb13972a3bf08cbfcdfe4181c12b3cfc
Author: Jerry Sun <jerry.sun@windriver.com>
Date:   Fri Mar 27 14:09:45 2020 -0400

Add default pod security policies
    
    This commit adds default pod security policies. We need this
    pod security plugin. Starting pod security plugin without any
    policies will result in all pods being denied. These default
    policies prevent the user from putting the system into an
    unusable state if they accidentally enable pod security
    policies without adding policies first.
    
    Story: 2007351
    Task: 38897
    
    Change-Id: Iac49f81ef44e6cb82ff884717888dfc1a7cd2a45
    Signed-off-by: Jerry Sun <jerry.sun@windriver.com>

commit f3340a3b5379f8c33de42aeaf11e96cc886df020
Author: Stefan Dinescu <stefan.dinescu@windriver.com>
Date:   Tue Apr 7 11:36:19 2020 +0300

Backup & restore: Restore license files
    
    STX offers support for installing license files through the
    "system license-install" command.
    
    While these licenses are not enforced, they are part of the
    backups created, but they are not restored when doing a full
    backup & restore.
    
    Since license is optional, it is not expected to always be
    present in the backup archive, so we only restore it if it
    is present in the archive.
    
    Change-Id: Ibd4cdcb53d1d55409d947c1f3af45659ed21a7ae
    Closes-bug: 1871034
    Signed-off-by: Stefan Dinescu <stefan.dinescu@windriver.com>
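
The "restore only if present" rule in the commit above can be sketched with the standard `tarfile` module. This is an illustration under assumptions: the function name `extract_if_present` is hypothetical, the member path is supplied by the caller because the commit does not state it, and the real restore is done by the ansible playbook rather than by this code.

```python
import io
import tarfile
from typing import Optional


def extract_if_present(archive: tarfile.TarFile,
                       member: str) -> Optional[bytes]:
    """Return the member's contents, or None when it is not in the archive."""
    try:
        info = archive.getmember(member)  # raises KeyError when absent
    except KeyError:
        return None                       # optional file: skip the restore step
    fileobj = archive.extractfile(info)   # None for non-regular members
    return fileobj.read() if fileobj is not None else None
```

Returning None rather than raising keeps a missing optional file from failing the whole restore.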

commit 5c542524e4cd9fb65da698c1d4cba4d50f56bdab
Author: Shuicheng Lin <shuicheng.lin@intel.com>
Date:   Wed Apr 1 15:58:07 2020 +0800

Add kubelet_vol_plugin_dir definition to fix ansible failure
    
    When host-swact is performed, upgrade-k8s-networking.yml is called to
    check for a calico upgrade. kubelet_vol_plugin_dir is missing from the
    definitions, which causes ansible to fail. Add the definition from
    main.yml to fix it.
    
    Closes-Bug: 1870038
    Change-Id: I30287ebca7f0d4a1d3c5ee656136375a7b1c182f
    Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>

commit d6cff0496dcf52655eba340e1e57b1d973040edf
Author: Shuicheng Lin <shuicheng.lin@intel.com>
Date:   Thu Mar 12 14:34:09 2020 +0800

Refresh local registry auth info each time when access local registry
    
    The local registry uses the admin account password as authentication
    info, and this password may be changed by the openstack client at any
    time. When downloading images from the local registry, the auth info
    must not be cached; otherwise a stale password may lead to
    authentication failures in keystone and, eventually, to the account
    being locked.
    In this specific case, a host-swact happens first, then the function
    "_upgrade_downgrade_kube_networking" in sysinv conductor is called and
    upgrade-k8s-networking.yml is executed, which tries to download the
    kube network images from the local registry. During this period the
    admin account password is changed, which leads to the account being
    locked due to authentication failures in keystone.
    With this update, it is still possible for the password to be changed
    just after the get operation. Because the image downloads run in
    parallel in multiple threads, the account lock may still be hit. This
    change reduces the occurrence rate but cannot eliminate the issue.
    
    Closes-Bug: 1853017
    
    Change-Id: I686616937031a3f7ac6d65e5b118511dc549ab85
    Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
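
The "do not cache the auth info" approach from the commit above can be sketched as below. The function `download_images` and its callable parameters are hypothetical; the actual change lives in the ansible image download code and is not reproduced here.

```python
from typing import Callable, Iterable


def download_images(images: Iterable[str],
                    get_password: Callable[[], str],
                    pull: Callable[[str, str], None]) -> None:
    """Pull each image, re-reading the password right before every request."""
    for image in images:
        # Fetch the password per image instead of once up front, so a
        # password change between downloads is picked up.
        password = get_password()
        pull(image, password)
```

As the commit notes, this narrows the window but does not close it: the password can still change between `get_password()` and `pull()`.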

Ghada Khalil (gkhalil) on 2020-06-28

tags:

added: in-r-stx30
