After keystone admin password changed, user account locked

Bug #1853017 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Lin Shuicheng

Bug Description

Brief Description
-----------------
After changing the keystone admin password and waiting 180 secs, all system CMDs failed because the user account was locked.

Severity
--------
Major

Steps to Reproduce
------------------
As described above.

TC-name: security/test_keystone_admin_psswd_change.py::test_admin_password

Expected Behavior
------------------
system CMD all working with new password

Actual Behavior
----------------
user account locked

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Multi-node system

Lab-name: WCP_71-75

Branch/Pull Time/Commit
-----------------------
2019-11-15_20-00-00

Last Pass
---------
2019-11-08_20-00-00

Timestamp/Logs
--------------
[2019-11-17 04:38:28,728] 311 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin'

[2019-11-17 04:41:31,564] 311 DEBUG MainThread ssh.send :: Send 'keyring get CGCS admin'
[2019-11-17 04:41:32,142] 433 DEBUG MainThread ssh.expect :: Output:
!Li69nux*9

[2019-11-17 04:41:32,246] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list'
[2019-11-17 04:41:33,067] 433 DEBUG MainThread ssh.expect :: Output:
The account is locked for user: dcd90f0c830f467b92ff4cf3e6c4bb5a. (HTTP 401) (Request-ID: req-1b6dc762-f417-4801-9900-040b0e5e39e7)

Test Activity
-------------
Regression Testing

Revision history for this message
Peng Peng (ppeng) wrote :
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / high priority - appears to have been broken in the last week.

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
tags: added: stx.3.0 stx.security
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → yong hu (yhu6)
tags: added: stx.distro.openstack
Yang Liu (yliu12)
tags: added: stx.retestneeded
Revision history for this message
yong hu (yhu6) wrote :

The issue was reproduced at:

The password for admin was indeed changed by the following command: openstack user set --password 'newpassword' admin, and it was also updated in "keyring". However, the change was not reflected in sysinv in a timely manner, so auth for "system" commands would fail if the user name and password were not *explicitly* set by:
--os-username 'admin' --os-password 'newpassword'

Will look into the cause and what recent change led to this issue.

Revision history for this message
yong hu (yhu6) wrote :

The root cause was found:
After changing the password for "admin", it took effect in keyring. That's why "keyring get CGCS admin" returns the correct password.
However, the local environment OS_PASSWORD (which was set by "source /etc/platform/openrc") still held the old password.

The solution is to re-apply "source /etc/platform/openrc", which should update OS_PASSWORD by
```
export OS_PASSWORD=`TERM=linux /opt/platform/.keyring/19.09/.CREDENTIAL 2>/dev/null`
```

In addition, I checked STX.2.0, and it shows the same behavior we are seeing now.

So, this won't be an issue.
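
To illustrate why a stale OS_PASSWORD only matters when no password is passed on the command line, here is a minimal sketch. It assumes the client takes the default for --os-password from the environment and lets an explicit option override it, which is how OpenStack-style CLIs commonly behave; `resolve_password` is a hypothetical helper written for this illustration and is not part of any StarlingX tool.

```python
import argparse


def resolve_password(argv, env):
    """Return the password a CLI would use (hypothetical helper).

    Assumption: the option default comes from OS_PASSWORD in the
    environment, and an explicit --os-password overrides it.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--os-password", default=env.get("OS_PASSWORD"))
    args, _unknown = parser.parse_known_args(argv)
    return args.os_password


# Environment captured before the password change (stale value).
stale_env = {"OS_PASSWORD": "Li69nux*"}

# No explicit option: the stale password from the environment is used.
print(resolve_password(["servicegroup-list"], stale_env))

# Explicit option: the new password is used regardless of the environment.
print(resolve_password(["--os-password", "!Li69nux*9", "servicegroup-list"], stale_env))
```

Under this assumption, re-sourcing openrc and passing the password explicitly both end up sending the new password.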

Changed in starlingx:
assignee: yong hu (yhu6) → Peng Peng (ppeng)
Revision history for this message
Peng Peng (ppeng) wrote :

The TC does not run "source /etc/platform/openrc" before running the system cmd, and it used the new password, as the log shows:
system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as Invalid based on Yong's investigation.

Changed in starlingx:
status: Triaged → Invalid
assignee: Peng Peng (ppeng) → yong hu (yhu6)
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Assigning back to Yong since our policy is to keep the bug assigned to the development prime

Revision history for this message
yong hu (yhu6) wrote :

@peng, by specifying the updated password in commands explicitly, did your TCs work or not?

I tried this way as well on my side and the command worked.

In addition, in the bash history from the log tarball you attached, I saw the new password was "xxxxxx". Was it expected?

Revision history for this message
Peng Peng (ppeng) wrote :

Reproduced on 2019-11-19_20-00-00 (wcp_63-66)

[sysadmin@controller-1 ~(keystone_admin)]$ openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin
[sysadmin@controller-1 ~(keystone_admin)]$ keyring get CGCS admin
!Li69nux*9
[sysadmin@controller-1 ~(keystone_admin)]$ sudo vi /var/log/bash.log
[sysadmin@controller-1 ~(keystone_admin)]$ openstack user list
The request you have made requires authentication. (HTTP 401) (Request-ID: req-59069eea-2bf5-43db-88f8-1bc6e08277a6)
[sysadmin@controller-1 ~(keystone_admin)]$ system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
The account is locked for user: 3480356374d4409bab26d72d1fdf4bee. (HTTP 401) (Request-ID: req-1ff0cc2c-c4ef-4c61-af75-f5bef496e62b)

And did not see "xxxxx" in bash.log

2019-11-20T19:00:22.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin
2019-11-20T19:00:34.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 keyring get CGCS admin
2019-11-20T19:00:58.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 sudo vi /var/log/bash.log
2019-11-20T19:01:22.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 openstack user list
2019-11-20T19:01:37.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
2019-11-20T19:01:51.000 controller-1 -sh: info HISTORY: PID=231945 UID=42425 sudo vi /var/log/bash.log

Peng Peng (ppeng)
Changed in starlingx:
status: Invalid → Confirmed
Revision history for this message
yong hu (yhu6) wrote :

It turned out this is a security enhancement done by this patch (merged on Sept 18):
https://review.opendev.org/#/c/682137

After more than 5 attempts with an incorrect (old) password, the account will be locked for 1800 seconds.

+ keystone_config {
+ 'security_compliance/lockout_duration': value => 1800;
+ 'security_compliance/lockout_failure_attempts': value => 5;
+ }

Inside your log tarball, keystone-all.log indicated there were 6 authorization failures before the account locked. See the attachment.

To avoid the issue, right after the password is changed, you can apply the new password in your TC by:
export OS_PASSWORD=`TERM=linux /opt/platform/.keyring/19.09/.CREDENTIAL 2>/dev/null`

or explicitly put the updated password in all following test commands.

=================================================================================
BTW: the reason I didn't reproduce this behavior (account locked) a few days ago was that I did not run commands more than 5 times with the obsolete password. At that time, I only tried 1~2 times.

=================================================================================

So in summary, this is not an issue, but an enhanced security feature.
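
The lockout settings quoted above can be pictured with a small model. This is only a sketch under stated assumptions: the `Account` class and its counting rules are hypothetical and written for illustration, and keystone's real implementation may count and expire failures differently. Only the two values (5 attempts, 1800 seconds) come from the patch.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the keystone_config snippet above.
LOCKOUT_FAILURE_ATTEMPTS = 5
LOCKOUT_DURATION = 1800  # seconds


@dataclass
class Account:
    """Toy model of an account under a lockout policy (not keystone code)."""
    password: str
    failures: int = 0
    locked_at: Optional[float] = None

    def authenticate(self, password: str, now: float) -> str:
        # While locked, even the correct password is rejected.
        if self.locked_at is not None:
            if now - self.locked_at < LOCKOUT_DURATION:
                return "locked"
            # Lockout expired: reset the state.
            self.locked_at = None
            self.failures = 0
        if password == self.password:
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures >= LOCKOUT_FAILURE_ATTEMPTS:
            self.locked_at = now
        return "unauthorized"


acct = Account(password="!Li69nux*9")
for t in range(5):
    # Five attempts with the old password, one second apart.
    print(acct.authenticate("Li69nux*", now=t))
# The sixth attempt is rejected even though the password is correct.
print(acct.authenticate("!Li69nux*9", now=10))
```

In this model, any client that keeps retrying an obsolete password is enough to lock out a user who is typing the correct one.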

Revision history for this message
ANIRUDH GUPTA (anyrude10) wrote :

I am facing the Account locked Issue on StarlingX 2.0 Release Branch, even if I have not used any Incorrect Password.

Can someone please update how to disable this feature?

Currently my account is locked, how can I unlock it?

Revision history for this message
yong hu (yhu6) wrote :

with root permission, you can remove these 2 lines in /etc/keystone/keystone.conf:

lockout_failure_attempts = 5
lockout_duration = 1800

After that, restart the keystone services by killing the first process found by the following grep.
$ ps aux | grep keystone-public

Revision history for this message
Yang Liu (yliu12) wrote :

Hi Yong,

The problem is that after the admin password change, the account got locked by itself, without any user operations.

Yes, something was trying to use the old password, which caused the account lockout, and investigation is needed into which stx component is doing that.

Revision history for this message
Jerry Sun (jerry-sun-u) wrote :

Looking at the tarball for the logs attached by Peng, it looks like after the password was changed in bash.log, there is no more activity from registry-token-server in daemon.log. This leads me to believe that something else must be triggering the locking of the account. There is some activity from the token server in daemon.log, but that was before the password change.

I also tried authenticating to the token server with incorrect credentials on a system without changing the password. This is to try and create an environment where the registry/token server holds incorrect keystone credentials. The admin account did not get locked which means token server does not spam requests at keystone with incorrect credentials until it locks.

Revision history for this message
Yang Liu (yliu12) wrote :

Note that this issue seems to be only happening on the first admin password change.
Account will be locked for some time and then unlock itself.

Workaround is just to wait...

After that, the subsequent admin password changes are working as expected.

Revision history for this message
yong hu (yhu6) wrote :

Thanks for the update, @Yang.
When the admin password was changed for the first time, had "system application-apply stx-openstack" already been run in the background?

In addition, the lock period should be 30 mins, shouldn't it?

Revision history for this message
yong hu (yhu6) wrote :

The issue was root-caused.
In short, the password for "admin" in 2 k8s secrets ("default-registry-key" and "registry-local-secret") was not updated after the operator "sysadmin" changed the password for the "admin" user with the "openstack" client.

Though the password was updated in keyring and keystone (:5000), there was never a chance to refresh these 2 secrets, and they kept using the default password set in the ansible playbook (say, localhost.yml).
So, whenever the docker client pulls an image and requires authentication via "registry-token-server", which in turn goes to keystone (:5000), the old/default password for "admin" triggers an authentication failure.

Attachment #1 is the packet I captured with TCPDUMP when the failures happened. "GopherCloud" inside "registry-token-server/keystone/access.go" failed to get auth from keystone because it was using the default (and obsolete) password "Local.123" (set from the Ansible playbook).

Attachment #2 is the code in "~/containers/registry-token-server/src/keystone/access.go", which was using the obsolete password from the request (from the k8s secret "default-registry-key").

After updating the passwords in the 2 secrets above, the authentication went through correctly.

Revision history for this message
yong hu (yhu6) wrote :

If the password for "admin" is changed, any deployment with "default-registry-key" secret or "registry-local-secret" will fail to authenticate.

for example, in "charts/ingress/charts/helm-toolkit/templates/snippets/_kubernetes_pod_rbac_serviceaccount.tpl", line 47:
imagePullSecrets:
  - name: default-registry-key

Revision history for this message
Peng Peng (ppeng) wrote :

Issue reproduced on DC labs at load: 2019-11-21_20-00-00
After admin pw changed,

openstack user set --password '!Li69nux*9' admin
[sysadmin@controller-1 ~(keystone_admin)]$ keyring get CGCS admin
!Li69nux*9

[sysadmin@controller-1 ~(keystone_admin)]$ system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[fd01:1::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
The account is locked for user: 27596cc96f034c34b5632c8d8fa52837. (HTTP 401) (Request-ID: req-c70ebb25-8cf7-4756-94ac-8fbcf555c6e9)
[sysadmin@controller-1 ~(keystone_admin)]$ date
Sat Nov 30 00:22:10 UTC 2019

After more than 2 days, the account still showed as locked.
[sysadmin@controller-1 ~(keystone_admin)]$ system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[fd01:1::2]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
The account is locked for user: 27596cc96f034c34b5632c8d8fa52837. (HTTP 401) (Request-ID: req-afbc26ea-16ac-4123-8753-bd284e2fbdbf)
[sysadmin@controller-1 ~(keystone_admin)]$ date
Mon Dec 2 16:14:58 UTC 2019

yong hu (yhu6)
tags: added: stx.config
Revision history for this message
Yang Liu (yliu12) wrote :

In Distributed Cloud environment mentioned in Peng's comments, the account was never unlocked. Perhaps it's doing something differently than standalone systems.

To answer previous question from Yong in #16, stx-openstack was not applied when this was seen.

Revision history for this message
yong hu (yhu6) wrote :

As mentioned, local registry key was not updated after admin's password was changed.
In this case, whoever tried to pull docker image with "imagePullSecrets" would trigger the authentication failure.

In the attached log ~/var/log/keystone/keystone-all.log, there were indeed "subcloud" related error messages, but not sure if they were the consequence of authentication failures (and user account locked) or other causes partially.

@Yang and Peng, while we are working on the fix, if you want, you can take the following steps to update the k8s secrets for the local registry: default-registry-key and registry-local-secret.

#1. List the secrets for the local registry.

kubectl -n kube-system get secrets | grep registry

#2. Encode your new user and password with the cmd below. For example, my new password is !Li69nux*9

echo -n 'admin:!Li69nux*9' | base64

#3. To update default-registry-key, encode the whole auth data (json format). Here "YWRtaW46IUxpNjludXgqOQ==" is the output from step #2 above:

echo -n '{"auths": {"registry.local:9001": {"auth": "YWRtaW46IUxpNjludXgqOQ=="}}}' | base64

#4. Use the encoded auth_data from step #3 to replace the value of ".dockerconfigjson:" in "default-registry-key": eyJhdXRocyI6IHsicmVnaXN0cnkubG9jYWw6OTAwMSI6IHsiYXV0aCI6ICJZV1J0YVc0NklVeHBOamx1ZFhncU9RPT0ifX19

kubectl -n kube-system edit secret default-registry-key

#5. To update registry-local-secret, encode the whole auth data (json format). Here "YWRtaW46IUxpNjludXgqOQ==" is the output from step #2 above:

echo -n '{"auths":{"registry.local:9001":{"username":"admin","password":"!Li69nux*9","auth":"YWRtaW46IUxpNjludXgqOQ=="}}}' | base64

#6. Use the encoded auth_data from step #5 to replace the value of ".dockerconfigjson:" in "registry-local-secret": eyJhdXRocyI6eyJyZWdpc3RyeS5sb2NhbDo5MDAxIjp7InVzZXJuYW1lIjoiYWRtaW4iLCJwYXNzd29yZCI6IiFMaTY5bnV4KjkiLCJhdXRoIjoiWVdSdGFXNDZJVXhwTmpsdWRYZ3FPUT09In19fQ==

kubectl -n kube-system edit secret registry-local-secret
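
The encoding steps above can also be sketched in Python, which avoids shell quoting mistakes with characters such as "!" and "*". The function names are hypothetical and only mirror the manual steps; the registry address and the two JSON layouts are copied from the commands above, and writing the result into the secret is still done with kubectl.

```python
import base64
import json

REGISTRY = "registry.local:9001"  # address used in the steps above


def b64(text: str) -> str:
    """Base64-encode a string, like `echo -n ... | base64`."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def default_registry_key(user: str, password: str) -> str:
    """Value for .dockerconfigjson in default-registry-key (steps #2 to #4)."""
    auth = b64(f"{user}:{password}")
    # Default json.dumps separators reproduce the spacing used in step #3.
    return b64(json.dumps({"auths": {REGISTRY: {"auth": auth}}}))


def registry_local_secret(user: str, password: str) -> str:
    """Value for .dockerconfigjson in registry-local-secret (steps #5 and #6)."""
    auth = b64(f"{user}:{password}")
    payload = {"auths": {REGISTRY: {"username": user,
                                    "password": password,
                                    "auth": auth}}}
    # Compact separators reproduce the spacing used in step #5.
    return b64(json.dumps(payload, separators=(",", ":")))


print(b64("admin:!Li69nux*9"))
print(default_registry_key("admin", "!Li69nux*9"))
print(registry_local_secret("admin", "!Li69nux*9"))
```

The printed values can then be pasted into the secrets with the same "kubectl -n kube-system edit secret" commands.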

yong hu (yhu6)
Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
yong hu (yhu6) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by Lin Shuicheng (<email address hidden>) on branch: master
Review: https://review.opendev.org/698442
Reason: New patch is uploaded: https://review.opendev.org/699547

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/700677
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=a36b4823b7dbacdc4a795e3e3978fbed6e952ced
Submitter: Zuul
Branch: master

commit a36b4823b7dbacdc4a795e3e3978fbed6e952ced
Author: Shuicheng Lin <email address hidden>
Date: Fri Dec 27 11:52:05 2019 +0800

    Enable keystone to send out event notification

    notification driver need be set for keystone, in order to send out
    notification. The driver value could be "messaging, messagingv2,
    routing, log, test, noop (multi valued)".
    This is in order to monitor admin password change in sysinv.

    Partial-Bug: 1853017

    Change-Id: Ie55a16723e92ea85a615477788ca922cca3bfe42
    Signed-off-by: Shuicheng Lin <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to upstream (master)

Reviewed: https://review.opendev.org/699547
Committed: https://git.openstack.org/cgit/starlingx/upstream/commit/?id=d1294d7e679460661b42af64c87480b429a3366c
Submitter: Zuul
Branch: master

commit d1294d7e679460661b42af64c87480b429a3366c
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 18 12:47:23 2019 +0800

    Update Keyring password info before sending out notification

    Need update password before send out notification. Otherwise, any
    process which monitors the "updated" notification will still get old
    password from Keyring.

    Partial-Bug: 1853017

    Change-Id: Id1c94fedca41abe96c7b38880bf325d4a25a95eb
    Signed-off-by: Shuicheng Lin <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to upstream (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/705854

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/698442
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=8ab1e2d7c624f83d72efcbfcddcdffa567a26bad
Submitter: Zuul
Branch: master

commit 8ab1e2d7c624f83d72efcbfcddcdffa567a26bad
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 11 16:37:03 2019 +0800

    Audit local registry secret info when there is user update in keystone

    local registry uses admin's username&password for authentication.
    And admin's password could be changed by openstack client cmd. It will
    cause auth info in secrets obsolete, and lead to invalid authentication
    in keystone.
    To keep secrets info updated, keystone event notification is enabled.
    And event notification listener is added in sysinv. So when there is
    user password change, a user update event will be sent out by keystone.
    And sysinv will call function audit_local_registry_secrets to check
    whether kubernetes secret info need be updated or not.

    A periodic task is added also to ensure secrets are always synced, in
    case notification is missed or there is failure in handle notification.

    oslo_messaging is added to tox's requirements.txt to avoid tox failure.
    The version is based on global-requirements.txt from Openstack Train.

    Test:
    Pass deployment and secrets could be updated automatically with new auth
    info.
    Pass host-swact in duplex mode.

    Closes-Bug: 1853017
    Depends-On: https://review.opendev.org/700677
    Depends-On: https://review.opendev.org/699547
    Change-Id: I959b65288e0834b989aa87e40506e41d0bba0d59
    Signed-off-by: Shuicheng Lin <email address hidden>
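
The audit idea described in this commit message can be sketched as a comparison between the auth data stored in a secret and the current credentials. This is not the sysinv code: `secret_needs_update` is a hypothetical helper, and it assumes the secret's .dockerconfigjson layout shown earlier in this report and that the caller supplies the current password (for example from keyring).

```python
import base64
import json


def secret_needs_update(dockerconfigjson_b64: str, username: str,
                        password: str,
                        registry: str = "registry.local:9001") -> bool:
    """Return True when the secret's auth entry no longer matches.

    Hypothetical illustration of an audit check; the real function
    audit_local_registry_secrets in sysinv may work differently.
    """
    config = json.loads(base64.b64decode(dockerconfigjson_b64))
    entry = config.get("auths", {}).get(registry, {})
    expected = base64.b64encode(
        f"{username}:{password}".encode("utf-8")).decode("ascii")
    return entry.get("auth") != expected


# Secret value taken from the manual workaround earlier in this report.
stored = ("eyJhdXRocyI6IHsicmVnaXN0cnkubG9jYWw6OTAwMSI6IHsiYXV0aCI6"
          "ICJZV1J0YVc0NklVeHBOamx1ZFhncU9RPT0ifX19")

print(secret_needs_update(stored, "admin", "!Li69nux*9"))  # secret is current
print(secret_needs_update(stored, "admin", "Li69nux*"))    # secret is stale
```

A check of this kind could run both from a notification handler and from a periodic task, matching the two triggers the commit message describes.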

Changed in starlingx:
status: In Progress → Fix Released
tags: added: in-f-centos8
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to upstream (f/centos8)

Reviewed: https://review.opendev.org/705854
Committed: https://git.openstack.org/cgit/starlingx/upstream/commit/?id=41f7bff21b83512640f148fa208485beec85eeeb
Submitter: Zuul
Branch: f/centos8

commit 333380daef7623eeb8eed16245d3700227d3003c
Author: Kristal Dale <email address hidden>
Date: Fri Jan 17 13:30:49 2020 -0800

    Update landing pages for docs and release notes:

    - Use updated project name in titles/text
    - Correct text for link to Storyboard (docs)
    - Correct capitalization in section headings
    - Correct formatting for section headings
    - Update project name in link to release notes, api-ref
    - Update project name in config for docs/releasenotes/api-ref

    Story:2007193
    Task:38347

    Change-Id: I52a53260042e6924673230486476c394001683ca
    Signed-off-by: Kristal Dale <email address hidden>

commit 8c7def7074be1a51fc9e01dcdafd8c99cb9115dd
Author: Don Penney <email address hidden>
Date: Wed Jan 1 18:38:19 2020 -0500

    Skip UT in python-keystoneclient build

    The python-keystoneclient unit test code uses a token expiry of Jan 1,
    2020, which causes a failure as of that date. Skip running the tests
    as part of the build to avoid this issue.

    Change-Id: I85e780c6f40beb19d1527282f30b38879ccfc512
    Closes-Bug: 1858049
    Signed-off-by: Don Penney <email address hidden>

commit d1294d7e679460661b42af64c87480b429a3366c
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 18 12:47:23 2019 +0800

    Update Keyring password info before sending out notification

    Need update password before send out notification. Otherwise, any
    process which monitors the "updated" notification will still get old
    password from Keyring.

    Partial-Bug: 1853017

    Change-Id: Id1c94fedca41abe96c7b38880bf325d4a25a95eb
    Signed-off-by: Shuicheng Lin <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/707154

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to upstream (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/707155

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/707156

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (r/stx.3.0)

Reviewed: https://review.opendev.org/707154
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=f26899071befc6330dc58fa77926425d0b67e228
Submitter: Zuul
Branch: r/stx.3.0

commit f26899071befc6330dc58fa77926425d0b67e228
Author: Shuicheng Lin <email address hidden>
Date: Fri Dec 27 11:52:05 2019 +0800

    Enable keystone to send out event notification

    notification driver need be set for keystone, in order to send out
    notification. The driver value could be "messaging, messagingv2,
    routing, log, test, noop (multi valued)".
    This is in order to monitor admin password change in sysinv.

    Partial-Bug: 1853017

    Change-Id: Ie55a16723e92ea85a615477788ca922cca3bfe42
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit a36b4823b7dbacdc4a795e3e3978fbed6e952ced)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to upstream (r/stx.3.0)

Reviewed: https://review.opendev.org/707155
Committed: https://git.openstack.org/cgit/starlingx/upstream/commit/?id=52d7be2f5947d67918c3da0cf8bd2291d2c87232
Submitter: Zuul
Branch: r/stx.3.0

commit 52d7be2f5947d67918c3da0cf8bd2291d2c87232
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 18 12:47:23 2019 +0800

    Update Keyring password info before sending out notification

    Need update password before send out notification. Otherwise, any
    process which monitors the "updated" notification will still get old
    password from Keyring.

    Partial-Bug: 1853017

    Change-Id: Id1c94fedca41abe96c7b38880bf325d4a25a95eb
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit d1294d7e679460661b42af64c87480b429a3366c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (r/stx.3.0)

Reviewed: https://review.opendev.org/707156
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=1c3ba7706559eb99357773356230240adb3aa1ea
Submitter: Zuul
Branch: r/stx.3.0

commit 1c3ba7706559eb99357773356230240adb3aa1ea
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 11 16:37:03 2019 +0800

    Audit local registry secret info when there is user update in keystone

    local registry uses admin's username&password for authentication.
    And admin's password could be changed by openstack client cmd. It will
    cause auth info in secrets obsolete, and lead to invalid authentication
    in keystone.
    To keep secrets info updated, keystone event notification is enabled.
    And event notification listener is added in sysinv. So when there is
    user password change, a user update event will be sent out by keystone.
    And sysinv will call function audit_local_registry_secrets to check
    whether kubernetes secret info need be updated or not.

    A periodic task is added also to ensure secrets are always synced, in
    case notification is missed or there is failure in handle notification.

    oslo_messaging is added to tox's requirements.txt to avoid tox failure.
    The version is based on global-requirements.txt from Openstack Train.

    Test:
    Pass deployment and secrets could be updated automatically with new auth
    info.
    Pass host-swact in duplex mode.

    Closes-Bug: 1853017
    Depends-On: https://review.opendev.org/707154
    Depends-On: https://review.opendev.org/707155
    Change-Id: I959b65288e0834b989aa87e40506e41d0bba0d59
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit 8ab1e2d7c624f83d72efcbfcddcdffa567a26bad)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to upstream (r/stx.2.0)

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/707522

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (r/stx.2.0)

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/707523

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/707524

Ghada Khalil (gkhalil)
tags: added: in-r-stx30 stx.4.0
Revision history for this message
Ghada Khalil (gkhalil) wrote :

As recommended by Yong Hu in https://bugs.launchpad.net/starlingx/+bug/1853093 , I am tagging this bug for stx.2.0 as well since the same code issue exists in that release. The fix may also address #1853093 (not 100% confirmed).

tags: added: stx.2.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (r/stx.2.0)

Reviewed: https://review.opendev.org/707523
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=35d8ccb8a7adc9b3b2b46373ca9f89d08c94cd6a
Submitter: Zuul
Branch: r/stx.2.0

commit 35d8ccb8a7adc9b3b2b46373ca9f89d08c94cd6a
Author: Shuicheng Lin <email address hidden>
Date: Thu Feb 13 10:58:21 2020 +0800

    Enable keystone to send out event notification

    notification driver need be set for keystone, in order to send out
    notification. The driver value could be "messaging, messagingv2,
    routing, log, test, noop (multi valued)".
    This is in order to monitor admin password change in sysinv.

    Partial-Bug: 1853017
    Partial-Bug: 1853093

    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit a36b4823b7dbacdc4a795e3e3978fbed6e952ced)

    Change-Id: Ia6661eaf294f97debca2cdb463455a23639892c1

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to upstream (r/stx.2.0)

Reviewed: https://review.opendev.org/707522
Committed: https://git.openstack.org/cgit/starlingx/upstream/commit/?id=dfe155136d3337a18bfbd19a7fb6f57614d455ba
Submitter: Zuul
Branch: r/stx.2.0

commit dfe155136d3337a18bfbd19a7fb6f57614d455ba
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 18 12:47:23 2019 +0800

    Update Keyring password info before sending out notification

    Need update password before send out notification. Otherwise, any
    process which monitors the "updated" notification will still get old
    password from Keyring.

    Partial-Bug: 1853017
    Partial-Bug: 1853093

    Change-Id: Id1c94fedca41abe96c7b38880bf325d4a25a95eb
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit d1294d7e679460661b42af64c87480b429a3366c)

Revision history for this message
Peng Peng (ppeng) wrote :

Verified on
Lab: WCP_112
Load: 2020-02-20_20-00-00

[sysadmin@controller-0 ~(keystone_admin)]$ keyring get CGCS admin
Li69nux*
[sysadmin@controller-0 ~(keystone_admin)]$
[sysadmin@controller-0 ~(keystone_admin)]$
[sysadmin@controller-0 ~(keystone_admin)]$ openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin
[sysadmin@controller-0 ~(keystone_admin)]$
[sysadmin@controller-0 ~(keystone_admin)]$
[sysadmin@controller-0 ~(keystone_admin)]$ keyring get CGCS admin
!Li69nux*9
[sysadmin@controller-0 ~(keystone_admin)]$ system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[abcd:204::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list
+--------------------------------------+-----------------------------+--------------+--------+
| uuid | service_group_name | hostname | state |
+--------------------------------------+-----------------------------+--------------+--------+
| d14f859a-f851-4598-8dd9-1c8d4fb42f0f | cloud-services | controller-0 | active |
| 3cc235b9-f4b9-4b95-83af-c89684c73396 | controller-services | controller-0 | active |
| 2c70019c-1eeb-4f55-8525-e0277ce1ed76 | directory-services | controller-0 | active |
| 3ce317e2-894a-4cef-89c8-fa83bde0af91 | oam-services | controller-0 | active |
| 76b5f3c0-639b-4b4f-984d-aabaf546cbe3 | patching-services | controller-0 | active |
| 3d10d9ef-995c-4e08-acb4-4f1c6cf79f92 | storage-monitoring-services | controller-0 | active |
| 67f02d0c-6ff5-44ce-a657-fb73a241919e | storage-services | controller-0 | active |
| ca8beaad-a905-4595-bc2c-3d810749064d | vim-services | controller-0 | active |
| 1b893b0b-05b4-42e5-806b-3dfc72a526c5 | web-services | controller-0 | active |
+--------------------------------------+-----------------------------+--------------+--------+
[sysadmin@controller-0 ~(keystone_admin)]$

tags: removed: stx.retestneeded
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (r/stx.2.0)

Reviewed: https://review.opendev.org/707524
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=7e5e887eb38042a0679ec100ca5d4016c6efe2bc
Submitter: Zuul
Branch: r/stx.2.0

commit 7e5e887eb38042a0679ec100ca5d4016c6efe2bc
Author: Shuicheng Lin <email address hidden>
Date: Wed Dec 11 16:37:03 2019 +0800

    Audit local registry secret info when there is user update in keystone

    local registry uses admin's username&password for authentication.
    And admin's password could be changed by openstack client cmd. It will
    cause auth info in secrets obsolete, and lead to invalid authentication
    in keystone.
    To keep secrets info updated, keystone event notification is enabled.
    And event notification listener is added in sysinv. So when there is
    user password change, a user update event will be sent out by keystone.
    And sysinv will call function audit_local_registry_secrets to check
    whether kubernetes secret info need be updated or not.

    A periodic task is added also to ensure secrets are always synced, in
    case notification is missed or there is failure in handle notification.

    oslo_messaging is added to tox's requirements.txt to avoid tox failure.
    The version is based on global-requirements.txt from Openstack Train.

    Test:
    Pass deployment and secrets could be updated automatically with new auth
    info.
    Pass host-swact in duplex mode.

    We lack of info how LP1853093 was triggered by the user, but this patch
    can address the issue that local registry secrets are not updated
    accordingly after the password of "admin" is changed.
    And this fix will help technically.

    Closes-Bug: 1853017
    Closes-Bug: 1853093
    Depends-On: https://review.opendev.org/707522
    Depends-On: https://review.opendev.org/707523
    Change-Id: I959b65288e0834b989aa87e40506e41d0bba0d59
    Signed-off-by: Shuicheng Lin <email address hidden>
    (cherry picked from commit 8ab1e2d7c624f83d72efcbfcddcdffa567a26bad)
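
The approach in the commit message above (audit the secret when a keystone user-update event arrives, plus a periodic audit as a fallback) can be sketched roughly as follows. This is only an illustration using the Python standard library, not the sysinv code. The class `SecretAuditor`, its other methods, and the in-memory password store are hypothetical. The function name `audit_local_registry_secrets` is taken from the commit message. The event type string `identity.user.updated` is assumed to be the name keystone uses for a user update notification.

```python
import threading


class SecretAuditor:
    """Hypothetical stand-in for the logic that keeps the local registry
    secret in sync with the keystone admin password."""

    def __init__(self, get_admin_password):
        # get_admin_password: callable returning the current admin password
        self._get_admin_password = get_admin_password
        self._secret_password = get_admin_password()
        self._lock = threading.Lock()
        self.updates = 0

    def audit_local_registry_secrets(self):
        """Rewrite the stored secret only if it is out of date."""
        current = self._get_admin_password()
        with self._lock:
            if self._secret_password != current:
                self._secret_password = current
                self.updates += 1
                return True
            return False

    def on_notification(self, event_type, payload):
        """Run the audit when a user-update event arrives (assumed name)."""
        if event_type == "identity.user.updated":
            return self.audit_local_registry_secrets()
        return False

    def secret_password(self):
        with self._lock:
            return self._secret_password


# Simulated keystone state (illustrative only).
keystone = {"admin": "old-password"}
auditor = SecretAuditor(lambda: keystone["admin"])

# Case 1: the notification arrives and triggers the audit.
keystone["admin"] = "new-password-1"
auditor.on_notification("identity.user.updated", {"resource_info": "admin"})

# Case 2: the notification is missed; the periodic audit catches up.
keystone["admin"] = "new-password-2"
auditor.audit_local_registry_secrets()  # what a periodic task would call

print(auditor.secret_password(), auditor.updates)
```

In this sketch the periodic call is what makes the design tolerant of a lost notification: the audit is idempotent, so running it on both paths is harmless.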

Peng Peng (ppeng) wrote :

Issue reproduced on
Lab: WCP_3_6
Load: 2020-02-22_04-10-00

Log: attached

[2020-02-23 02:36:27,094] 314 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin'

[2020-02-23 02:39:29,512] 314 DEBUG MainThread ssh.send :: Send 'keyring get CGCS admin'
[2020-02-23 02:39:30,118] 436 DEBUG MainThread ssh.expect :: Output:
!Li69nux*9

[2020-02-23 02:39:31,833] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-0'

[2020-02-23 02:40:58,798] 314 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password 'Li69nux*' admin'

[2020-02-23 02:44:01,811] 314 DEBUG MainThread ssh.send :: Send 'keyring get CGCS admin'
[2020-02-23 02:44:02,399] 436 DEBUG MainThread ssh.expect :: Output:
Li69nux*

fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2020-02-23 02:44:03,214] 436 DEBUG MainThread ssh.expect :: Output:
Must provide Keystone credentials or user-defined endpoint and token, error was: The account is locked for user: c52f573e07d24a37b9b5627a8c82756d. (HTTP 401) (Request-ID: req-be5e9183-0cbf-44a7-9948-832943a06da9)

Changed in starlingx:
status: Fix Released → Confirmed
tags: added: stx.retestneeded
Changed in starlingx:
assignee: yong hu (yhu6) → Lin Shuicheng (shuicheng)
Lin Shuicheng (shuicheng) wrote :

Hi Peng,
The log tarball contains only controller-1; controller-0 is missing.
Could you share the detailed steps to reproduce the issue?
When did you change the password, and which operations ran before and after the password change?
From the log, the failure is still caused by an authentication failure with registry-token-server, but I don't know where the access request comes from.
I can see that the secrets were also updated when the password was changed, and no application was in the applying stage.
I need to reproduce the issue to find out where the registry-token-server access comes from.

Here are some log excerpts from controller-1:
Password change command at 2:40:58:
2020-02-23T02:40:58.000 controller-1 -sh: info HISTORY: PID=241256 UID=42425 openstack --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password xxxxxx admin

Secrets update at 2:41:01:
sysinv 2020-02-23 02:41:01.613 238507 INFO sysinv.conductor.kube_app [-] Secret registry-local-secret under Namespace kube-system is updated
sysinv 2020-02-23 02:41:01.645 238507 INFO sysinv.conductor.kube_app [-] Secret default-registry-key under Namespace kube-system is updated

Authentication failure at 2:41:04:
./var/log/daemon.log:38427:2020-02-23T02:41:04.547 controller-1 registry-token-server[235987]: info time="2020-02-23T02:41:04Z" level=error msg="error authenticating user \"admin\": Authentication failed" go.version=go1.12.10 http.request.host="128.224.151.227:9002" http.request.id=46c3b222-24ca-46bb-936c-0ef08fbf5141 http.request.method=GET http.request.remoteaddr="192.168.204.3:51416" http.request.uri="/token/?account=admin&scope=repository%3Adocker.io%2Fstarlingx%2Fmultus%3Apush%2Cpull&service=192.168.204.1%3A9001" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/3.10.0-1062.1.2.el7.2.tis.x86_64 os/linux arch/amd64 UpstreamClient(docker-sdk-python/3.3.0)" instance.id=46661299-76ed-4229-8aa6-45ac24c3f1c6

The account was then locked at 2:41:06, after 5 failed authentication attempts:
2020-02-23 02:41:06.091 239370 WARNING keystone.server.flask.application [req-cc73c0ca-5d97-4ae0-afa4-6159b677b0bb - - - - -] Authorization failed. The account is locked for user: c52f573e07d24a37b9b5627a8c82756d. from 192.168.204.3: AccountLocked: The account is locked for user: c52f573e07d24a37b9b5627a8c82756d.

Hrishit Mazumder (hmazumde) wrote :

Issue reproduced on lab wcp_76_77 at load StarlingX_Upstream_build/2020-03-10_04-10-00, after the admin password was changed.

Timestamp of password change:
2020-03-10T17:06:31.000 controller-0 -sh: info HISTORY: PID=1062703 UID=42425 openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin

Details: CLI 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid' failed to execute. Output: Must provide Keystone credentials or user-defined endpoint and token, error was: The account is locked for user: a7befe681ee64a82b71b63935b410cf7. (HTTP 401) (Request-ID: req-7ef8e9de-cb4c-43db-badc-ad3cb6aa9958)

I have attached logs for your perusal.

Best regards,
Hrishit Mazumder

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/712614

Changed in starlingx:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/712823

Lin Shuicheng (shuicheng) wrote :

Hi Peng,
Could you help update test case to avoid password update immediately after host-swact? Please try to wait 3 minutes before password change after host-swact.
The issue is that, after swact, sysinv in active controller will try to check k8s network upgrade, and need pull image from registry.local:9001. Then password is changed in the same time, and lead to keystone authentication failure, then account is locked. When sysinv do the image pulling, it will start 5 threads in parallel in order to save the image pull time, so the keystone 5 times failure count is hit easily.
I have submitted patch to avoid authentication failure caused by password cache, but cannot fix the issue totally. The issue will still occur if password change is happened just after sysinv get password, but before keystone authentication.

Here are the sysinv/sm logs from ALL_NODES_20200310.185444.tar:
host-swact starts at 17:09:53 and finishes at 17:10:23:
2020-03-10T17:09:53.000 controller-1 sm: debug time[6314.840] log<450> INFO: sm[95215]: sm_service_domain_scheduler.c(1520): Swact from (controller-0) to (controller-1) start
2020-03-10T17:10:23.000 controller-1 sm: debug time[6344.397] log<781> INFO: sm[95215]: sm_node_swact_monitor.cpp(57): Swact has completed successfully.
sysinv tries to do the k8s network upgrade at 17:10:25:
2020-03-10 17:10:25.854 681994 INFO sysinv.conductor.manager [-] _upgrade_downgrade_kube_networking executing playbook: /usr/share/ansible/stx-ansible/playbooks/upgrade-k8s-networking.yml for version v1.16.2
The k8s secret is already updated with the new password at 17:11:15:
2020-03-10 17:11:15.367 681994 INFO sysinv.conductor.kube_app [-] Secret registry-local-secret under Namespace kube-system is updated
Keystone reports an authentication failure at 17:11:19 because it received the old password:
2020-03-10 17:11:19.438 682342 WARNING keystone.server.flask.application [req-3a36cda2-79ee-4a42-bbec-087ada62030e - - - - -] Authorization failed. The account is locked for user: a7befe681ee64a82b71b63935b410cf7. from 192.168.204.3: AccountLocked: The account is locked for user: a7befe681ee64a82b71b63935b410cf7.
sysinv reports an ansible failure at 17:11:19 because the image download failed:
"stderr": "time=\"2020-03-10T17:11:19Z\" level=fatal msg=\"pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.local:9001/quay.io/calico/node:v3.6.2\
sysinv 2020-03-10 17:17:07.929 681994 ERROR sysinv.conductor.manager [-] Failed to upgrade/downgrade kubernetes networking images: ansible-playbook returned an error: 2: Exception: ansible-playbook returned an error: 2
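
The failure mode described in this comment (5 parallel pulls all presenting a stale password, which reaches the 5-failure lockout limit) can be reproduced with a toy model. The sketch below is an assumption-laden simulation, not keystone or sysinv code: `FakeKeystone`, `pull_images`, and the reset-on-success behaviour are hypothetical, and the threshold of 5 is simply the value reported in this thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

LOCKOUT_THRESHOLD = 5  # failed attempts before lock, as reported in this bug


class FakeKeystone:
    """Toy model of password auth with lockout; not the keystone API."""

    def __init__(self, password):
        self.password = password
        self.failures = 0
        self.locked = False
        self._mutex = threading.Lock()

    def authenticate(self, password):
        with self._mutex:
            if self.locked:
                return "locked"
            if password == self.password:
                self.failures = 0
                return "ok"
            self.failures += 1
            if self.failures >= LOCKOUT_THRESHOLD:
                self.locked = True
            return "failed"


def pull_images(keystone, get_password, workers=5):
    """Run one authentication per worker thread, like parallel image pulls."""
    def attempt():
        return keystone.authenticate(get_password())

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(attempt) for _ in range(workers)]
        return [f.result() for f in futures]


# Cached password: read once before the change, reused by every thread.
ks_cached = FakeKeystone("old")
cached = ks_cached.password
ks_cached.password = "new"  # the admin password is changed
cached_results = pull_images(ks_cached, lambda: cached)

# Fresh password: read again at each access.
ks_fresh = FakeKeystone("old")
ks_fresh.password = "new"
fresh_results = pull_images(ks_fresh, lambda: ks_fresh.password)

print(ks_cached.locked, ks_fresh.locked)
```

With the cached password every worker fails and the account ends up locked; with a fresh read per access none of them fail. The model does not cover a password change that lands between the read and the authentication.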

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716137

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/716137
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=cb4cf4299c2ec10fb2eb03cdee3f6d78a6413089
Submitter: Zuul
Branch: f/centos8

commit 16477935845e1c27b4c9d31743e359b0aa94a948
Author: Steven Webster <email address hidden>
Date: Sat Mar 28 17:19:30 2020 -0400

    Fix SR-IOV runtime manifest apply

    When an SR-IOV interface is configured, the platform's
    network runtime manifest is applied in order to apply the virtual
    function (VF) config and restart the interface. This results in
    sysinv being able to determine and populate the puppet hieradata
    with the virtual function PCI addresses.

    A side effect of the network manifest apply is that potentially
    all platform interfaces may be brought down/up if it is determined
    that their configuration has changed. This will likely be the case
    for a system which configures SR-IOV interfaces before initial
    unlock.

    A few issues have been encountered because of this, with some
    services not behaving well when the interface they are communicating
    over suddenly goes down.

    This commit makes the SR-IOV VF configuration much more targeted
    so that only the operation of setting the desired number of VFs
    is performed.

    Closes-Bug: #1868584
    Depends-On: https://review.opendev.org/715669
    Change-Id: Ie162380d3732eb1b6e9c553362fe68cbc313ae2b
    Signed-off-by: Steven Webster <email address hidden>

commit 45c9fe2d3571574b9e0503af108fe7c1567007db
Author: Zhipeng Liu <email address hidden>
Date: Thu Mar 26 01:58:34 2020 +0800

    Add ipv6 support for novncproxy_base_url.

    For ipv6 address, we need url with below format
    [ip]:port

    Partial-Bug: 1859641

    Change-Id: I01a5cd92deb9e88c2d31bd1e16e5bce1e849fcc7
    Signed-off-by: Zhipeng Liu <email address hidden>

commit d119336b3a3b24d924e000277a37ab0b5f93aae1
Author: Andy Ning <email address hidden>
Date: Mon Mar 23 16:26:21 2020 -0400

    Fix timeout waiting for CA cert install during ansible replay

    During ansible bootstrap replay, the ssl_ca_complete_flag file is
    removed. It expects puppet platform::config::runtime manifest apply
    during system CA certificate install to re-generate it. So this commit
    updated conductor manager to run that puppet manifest even if the CA cert
    has already installed so that the ssl_ca_complete_flag file is created
    and makes ansible replay to continue.

    Change-Id: Ic9051fba9afe5d5a189e2be8c8c2960bdb0d20a4
    Closes-Bug: 1868585
    Signed-off-by: Andy Ning <email address hidden>

commit 24a533d800b2c57b84f1086593fe5f04f95fe906
Author: Zhipeng Liu <email address hidden>
Date: Fri Mar 20 23:10:31 2020 +0800

    Fix rabbitmq could not bind port to ipv6 address issue

    When we use Armada to deploy openstack service for ipv6, rabbitmq
    pod could not start listen on [::]:5672 and [::]:15672.
    For ipv6, we need an override for configuration file.

    Upstream patch link is:
    https://review.opendev.org/#/c/714027/

    Test pass for deploying rabbitmq service on both ipv...

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/712823
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=d6cff0496dcf52655eba340e1e57b1d973040edf
Submitter: Zuul
Branch: master

commit d6cff0496dcf52655eba340e1e57b1d973040edf
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:34:09 2020 +0800

    Refresh local registry auth info each time when access local registry

    Local registry uses admin account password as authentication info.
    And this password may be changed by openstack client at any time.
    When try to download images from local registry, auth info cannot
    be cached, otherwise it may lead to authentication failure in keystone,
    and account be locked at the end.
    For this specific case, there is host-swact first, then function
    "_upgrade_downgrade_kube_networking" in sysinv conductor is called.
    And upgrade-k8s-networking.yml is executed which will try to download
    kube network images from local registry. During this period, admin
    account password is changed. And lead to account be locked due to
    authentication failure in keystone.
    With this update, there is still possibility that password be changed
    just after get operation. And due to the images download are run in
    parallel with multi threads, so account lock may still hit. This
    change could minimize the issue rate, but cannot fix all.

    Closes-Bug: 1853017

    Change-Id: I686616937031a3f7ac6d65e5b118511dc549ab85
    Signed-off-by: Shuicheng Lin <email address hidden>
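
The remaining window mentioned in the commit message above (the password changes just after the get operation) can be illustrated with a small sequential sketch. All names here (`Store`, `access_registry`, the `before_auth` hook) are hypothetical and exist only to make the timing explicit; this is not the playbook or sysinv code.

```python
class Store:
    """Hypothetical holder for the current admin password."""

    def __init__(self, password):
        self.password = password


def access_registry(store, check, before_auth=None):
    """Read the password right before use, then authenticate.

    before_auth is a hook that runs in the gap between the read and the
    authentication, to model a concurrent password change.
    """
    password = store.password  # step 1: get (not cached across calls)
    if before_auth is not None:
        before_auth()          # the remaining race window
    return check(password)     # step 2: authenticate


store = Store("old")


def check(password):
    # Stand-in for the keystone password check.
    return password == store.password


# Password changed before the access: the fresh read picks it up.
store.password = "new"
changed_before = access_registry(store, check)


# Password changed inside the window: authentication still fails.
def change_password():
    store.password = "newer"


changed_during = access_registry(store, check, before_auth=change_password)

print(changed_before, changed_during)
```

Reading the password at each access narrows the window to the gap between the two steps, which matches the commit's statement that the change reduces the issue rate without eliminating it.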

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/712614
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=423a475aff4f9ea1b60af6a9a2989027d1506f10
Submitter: Zuul
Branch: master

commit 423a475aff4f9ea1b60af6a9a2989027d1506f10
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:06:08 2020 +0800

    Refresh local registry auth info each time when access local registry

    Local registry uses admin account password as authentication info.
    And this password may be changed by openstack client at any time.
    When sysinv tries to download images from local registry, it cannot
    cache the auth info, otherwise it may lead to authentication failure
    in keystone, and account be locked at the end.

    Partial-Bug: 1853017

    Change-Id: I07f273a05a1bc3c08b48d13c94eb6df6aecdf7c3
    Signed-off-by: Shuicheng Lin <email address hidden>

Ghada Khalil (gkhalil) wrote :

Shuicheng, there are recent commits in master related to this fix that haven't been cherry-picked to the stx.2.0 & stx.3.0 branches. Are these commits applicable to those releases?

Ghada Khalil (gkhalil)
tags: removed: in-r-stx30
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/723766

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/723767

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (r/stx.2.0)

Fix proposed to branch: r/stx.2.0
Review: https://review.opendev.org/723781

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (r/stx.3.0)

Reviewed: https://review.opendev.org/723766
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=9bcd1b066bff4b51a1ef82ccd476116bd4dd8ab5
Submitter: Zuul
Branch: r/stx.3.0

commit 9bcd1b066bff4b51a1ef82ccd476116bd4dd8ab5
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:06:08 2020 +0800

    Refresh local registry auth info each time when access local registry

    (cherry picked from commit 423a475aff4f9ea1b60af6a9a2989027d1506f10)

    Local registry uses admin account password as authentication info.
    And this password may be changed by openstack client at any time.
    When sysinv tries to download images from local registry, it cannot
    cache the auth info, otherwise it may lead to authentication failure
    in keystone, and account be locked at the end.

    Partial-Bug: 1853017

    Change-Id: I07f273a05a1bc3c08b48d13c94eb6df6aecdf7c3
    Signed-off-by: Shuicheng Lin <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (r/stx.3.0)

Reviewed: https://review.opendev.org/723767
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=75b5edfa6ce1ea32293889ec9da8d0e6ae2007f8
Submitter: Zuul
Branch: r/stx.3.0

commit 75b5edfa6ce1ea32293889ec9da8d0e6ae2007f8
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:34:09 2020 +0800

    Refresh local registry auth info each time when access local registry

    (cherry picked from commit d6cff0496dcf52655eba340e1e57b1d973040edf)
    (cherry picked from commit 1b50022d55a9da2bbab284b1fdda2ddc78c30c79)

    Local registry uses admin account password as authentication info.
    And this password may be changed by openstack client at any time.
    When try to download images from local registry, auth info cannot
    be cached, otherwise it may lead to authentication failure in keystone,
    and account be locked at the end.
    For this specific case, there is host-swact first, then function
    "_upgrade_downgrade_kube_networking" in sysinv conductor is called.
    And upgrade-k8s-networking.yml is executed which will try to download
    kube network images from local registry. During this period, admin
    account password is changed. And lead to account be locked due to
    authentication failure in keystone.
    With this update, there is still possibility that password be changed
    just after get operation. And due to the images download are run in
    parallel with multi threads, so account lock may still hit. This
    change could minimize the issue rate, but cannot fix all.

    Closes-Bug: 1853017

    Change-Id: I686616937031a3f7ac6d65e5b118511dc549ab85
    Signed-off-by: Shuicheng Lin <email address hidden>

Peng Peng (ppeng) wrote :

Issue was reproduced on
Lab: WCP_71_75
Load: 2020-04-28_20-00-00
Collected logs from all nodes are attached.

test log:
====================== Test Step 1: Changing admin password to !Li69nux*9

[2020-04-29 18:07:44,424] 314 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne user set --password '!Li69nux*9' admin'

====================== Test Step 2: Sleep for 180 seconds after admin password change

====================== Test Step 3: Check admin password is updated in keyring

[2020-04-29 18:10:46,870] 314 DEBUG MainThread ssh.send :: Send 'keyring get CGCS admin'
[2020-04-29 18:10:47,477] 436 DEBUG MainThread ssh.expect :: Output:
!Li69nux*9

====================== Test Step 4: Swact active controller

[2020-04-29 18:10:47,583] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password '!Li69nux*9' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list'
[2020-04-29 18:10:48,435] 436 DEBUG MainThread ssh.expect :: Output:
The account is locked for user: 7fb2fa710fca4ff0bb1cdce312d05fce. (HTTP 401) (Request-ID: req-33846962-49c1-4804-96a3-5ae633577987)
[sysadmin@controller-1 ~(keystone_admin)]$

Changed in starlingx:
status: Fix Released → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (r/stx.2.0)

Reviewed: https://review.opendev.org/723781
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=a70ecf4baa3809cbf60e6c1b835b07fe9ba2d2d4
Submitter: Zuul
Branch: r/stx.2.0

commit a70ecf4baa3809cbf60e6c1b835b07fe9ba2d2d4
Author: Shuicheng Lin <email address hidden>
Date: Thu Mar 12 14:06:08 2020 +0800

    Refresh local registry auth info each time when access local registry

    (cherry picked from commit 423a475aff4f9ea1b60af6a9a2989027d1506f10)

    Local registry uses admin account password as authentication info.
    And this password may be changed by openstack client at any time.
    When sysinv tries to download images from local registry, it cannot
    cache the auth info, otherwise it may lead to authentication failure
    in keystone, and account be locked at the end.

    Partial-Bug: 1853017

    Change-Id: I07f273a05a1bc3c08b48d13c94eb6df6aecdf7c3
    Signed-off-by: Shuicheng Lin <email address hidden>

Lin Shuicheng (shuicheng) wrote :

Hi Peng,
Could you share the collected logs with me?
Thanks.

Peng Peng (ppeng) wrote :
Lin Shuicheng (shuicheng) wrote :

Hi Peng,
The cause is different from the previous one; it is not caused by registry-token-server authentication.

In the log I found errors from the pod platform-deployment-manager, which appears to come from the image
"tis-lab-registry.cumulus.wrs.com:9001/wind-river/cloud-platform-deployment-manager WRCP_20.04"

controller-0_20200429.185943/var/log/containers/platform-deployment-manager-0_platform-deployment-manager_manager-dd039c6a54492b52244abd7e7ecb7dccf197d0fa595be92ddc3f9bd7f1f8d513.log
"
2020-04-29T18:10:37.070876069Z stderr F E0429 18:10:37.070686 1 common.go:242] controller/host "msg"="an unhandled error occurred" "error"="failed to get: a896665e-9d40-4342-9a07-92c56715e008: Unable to re-authenticate: Expected HTTP response code [] when accessing [GET http://[face::1]:6385/v1/ihosts/a896665e-9d40-4342-9a07-92c56715e008], but got 401 instead\n{\"error\": {\"message\": \"The request you have made requires authentication.\", \"code\": 401, \"title\": \"Unauthorized\"}}" "type"={}
2020-04-29T18:10:37.070916344Z stderr F E0429 18:10:37.070781 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to get: a896665e-9d40-4342-9a07-92c56715e008: Unable to re-authenticate: Expected HTTP response code [] when accessing [GET http://[face::1]:6385/v1/ihosts/a896665e-9d40-4342-9a07-92c56715e008], but got 401 instead\n{\"error\": {\"message\": \"The request you have made requires authentication.\", \"code\": 401, \"title\": \"Unauthorized\"}}" "controller"="host-controller" "request"={"Namespace":"deployment","Name":"controller-0"}
...
"

Please ask someone from WR to confirm whether the admin password is used in "cloud-platform-deployment-manager".
Thanks.

Lin Shuicheng (shuicheng) wrote :

@yong, please assign the issue to WR, since it is caused by a WR-specific image.

Ghada Khalil (gkhalil) wrote :

@Lin Shuicheng, I followed up on this and you are correct. The most recent issue reported by Peng Peng is tied to a WR lab-specific pod that continues to use the old password to access the config REST API, resulting in the admin account getting locked after a password change. Therefore, we should consider this Launchpad as Fixed. I'm putting it back to "Fix Released".

@Peng Peng, Please do not re-open this Launchpad again. Please also note that there are issues with admin password changes for Distributed Cloud. These are unrelated to this original issue and will be tracked separately. Please do not test admin password changes on Distributed Cloud.

Ghada Khalil (gkhalil)
Changed in starlingx:
status: Confirmed → Fix Released
Peng Peng (ppeng)
tags: removed: stx.retestneeded
Peng Peng (ppeng) wrote :

Verified on
Lab: WP_8_12
Load: 2020-05-19_20-00-00

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729809

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729812

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/729812
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=539d476456277c22d0dcbc3cbbc832e623242264
Submitter: Zuul
Branch: f/centos8

commit 320cc40de8518787c2be234d7fdf88ec0a462df2
Author: Don Penney <email address hidden>
Date: Wed May 13 13:06:11 2020 -0400

    Add auto-versioning to starlingx/config packages

    This update makes use of the PKG_GITREVCOUNT variable to auto-version
    the packages in this repo.

    Change-Id: I3a2c8caeb4b4647608978b1f2ccfcf0661508803
    Depends-On: https://review.opendev.org/727837
    Story: 2006166
    Task: 39766
    Signed-off-by: Don Penney <email address hidden>

commit d9f2aea0fb228ed69eb9c9262e29041eedabc15d
Author: Sharath Kumar K <email address hidden>
Date: Wed Apr 22 16:22:22 2020 +0200

    De-branding in starlingx/config: CGCS -> StarlingX

    1. Rename CGCS to StarlingX for .spec files

    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.

    Please note, doing de-brand changes in batches, this is batch9 changes.

    Story: 2006387
    Task: 39524

    Change-Id: Ia1fe0f2baafb78c974551100f16e6a7d99882f15
    Signed-off-by: Sharath Kumar K <email address hidden>

    De-branding in starlingx/config: CGCS -> StarlingX

    1. Rename CGCS to StarlingX for .spec file
    2. Rename TIS to StarlingX for .service files

    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.

    Please note, doing de-brand changes in batches, this is batch10 changes.

    Story: 2006387
    Task: 36202

    Change-Id: I404ce0da2621495175ad31489e9ad6f7b0211e26
    Signed-off-by: Sharath Kumar K <email address hidden>

commit d141e954fa6bbf688929ec90d1b6604a97792c43
Author: Teresa Ho <email address hidden>
Date: Tue Mar 31 10:08:57 2020 -0400

    Sysinv extensions for FPGA support

    This update adds cli and restapi to support FPGA device
    programming.

    CLI commands:
    system device-image-apply
    system device-image-create
    system device-image-delete
    system device-image-list
    system device-image-remove
    system device-image-show
    system device-image-state-list
    system device-label-list
    system host-device-image-update
    system host-device-image-update-abort
    system host-device-label-assign
    system host-device-label-list
    system host-device-label-remove

    Story: 2006740
    Task: 39498

    Change-Id: I556c2e7a51b3931b5a66ab27b67f51e3a8aebd9f
    Signed-off-by: Teresa Ho <email address hidden>

commit 491cca42ed854d2cb3ee3646b93c56a4f45f563c
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 11:25:26 2020 +0000

    Qcow2 conversion to raw can be done using 'image-conversion' filesystem

    1. Conversion filesystem can be added before/after
       stx-openstack is applied
    2. If conversion filesystem is added after stx-openstack
       is applied, changes to stx-openstack will only take effec...

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/729809
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=73027425d4501a6b7785e91024c9e8ddbc03115d
Submitter: Zuul
Branch: f/centos8

commit 55c9afd075194f7669fa2a87e546f61034679b04
Author: Dan Voiculeasa <email address hidden>
Date: Wed May 13 14:19:52 2020 +0300

    Restore: disconnect etcd from ceph

    At the moment etcd is restored only if ceph data is kept.
    Etcd should be restored regardless if ceph data is kept or wiped.

    Story: 2006770
    Task 39751
    Change-Id: I9dfb1be0a83c3fdc5f1b29cbb974c5e0e2236ad3
    Signed-off-by: Dan Voiculeasa <email address hidden>

commit 003ddff574c74adf11cf8e4758e93ba0eed45a6a
Author: Don Penney <email address hidden>
Date: Fri May 8 11:35:58 2020 -0400

    Add playbook for updating static images

    This commit introduces a new playbook, upgrade-static-images.yml, used
    for downloading updating images and pushing to the local registry.

    Change-Id: I8884440261a5a4e27b40398e5a75c9d03b09d4ba
    Story: 2006781
    Task: 39706
    Signed-off-by: Don Penney <email address hidden>

commit 26fd273cf5175ba4bdd31d6b6b777814f1a6c860
Author: Matt Peters <email address hidden>
Date: Thu May 7 14:29:02 2020 -0500

    Add kube-apiserver port to calico failsafe rules

    An invalid GlobalNetworkPolicy or NetworkPolicy may prevent
    calico-node from communicating with the kube-apiserver.
    Once the communication is broken, calico-node is no longer
    able to update the policies since it cannot communicate to
    read the updated policies. It can also prevent the pod
    from starting since the policies will prevent it from
    reading the configuration.

    To ensure that this scenario does not happen, the kube-apiserver
    port is being added to the failsafe rules to ensure communication
    is always possible, regardless of the network policy configuration.

    Change-Id: I1b065a74e7ad0ba9b1fdba4b63136b97efbe98ce
    Closes-Bug: 1877166
    Related-Bug: 1877383
    Signed-off-by: Matt Peters <email address hidden>

commit bd0f14a7dfb206ccaa3ce0f5e7d9034703b3403c
Author: Robert Church <email address hidden>
Date: Tue May 5 15:11:15 2020 -0400

    Provide an update strategy for Tiller deployment

    In the case of a simplex controller configuration the current patching
    strategy for the Tiller environment will fail as the tiller ports will
    be in use when the new deployment is attempted to be applied. The
    resulting tiller pod will be stuck in a Pending state.

    This will be observed if the node becomes ready after 'helm init'
    installs the initial deployment and before the deployment is patched for
    environment checks.

    The deployment strategy provided by 'helm init' is unspecified. This
    change will allow one additional pod (current + new) and one unavailable
    pod (current) during an update. The maxUnavailable setting allows the
    tiller pod to be deleted which will release its ports, thus allowing the
    patch deployment to spin up an new pod to a Running state.

    Change-Id: I83c43c52a77...

Ghada Khalil (gkhalil)
tags: added: in-r-stx30