sysinv-conductor process kill test failure with error user-defined endpoint and token, error was: The account is locked for user

Bug #1871141 reported by Anujeyan Manokeran
Affects   | Status       | Importance | Assigned to   | Milestone
StarlingX | Fix Released | High       | Lin Shuicheng |

Bug Description

Brief Description
-----------------
After running the test_process automation against the sysinv-conductor process, no system or fm commands can be run from the CLI. Every command fails with a "Sysadmin account locked" error, as shown below. The automated test kills the process and verifies its recovery multiple times. The test now fails; the system did not behave this way before.

'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://[face::1]:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap'
[2020-04-06 13:39:45,235] 436 DEBUG MainThread ssh.expect :: Output:
+----------+------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+----------+----------------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+----------+----------------------------+
| 400.001 | Service group controller-services failure; sysinv-conductor(disabled, ) | service_domain=controller.service_group=controller-services.host=controller-1 | critical | 2020-04-06T13:37:25.919859 |
| 400.002 | Service group controller-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=controller-services | major | 2020-04-06T13:37:17.155549 |
| 400.002 | Service group cloud-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=cloud-services | major | 2020-04-06T13:37:17.114432 |
| 400.002 | Service group oam-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=oam-services | major | 2020-04-06T13:37:17.073423 |
| 400.002 | Service group vim-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=vim-services | major | 2020-04-06T13:37:17.032429 |
+----------+------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------+----------+----------------------------+
controller-0:~$

fm alarm-list
Must provide Keystone credentials or user-defined endpoint and token, error was: The account is locked for user: 7113e094afb54cd2a890dfb919060635. (HTTP 401) (Request-ID: req-89ef9992-b7f8-44d8-bb76-1366499b59f9)

system show
The account is locked for user: 7113e094afb54cd2a890dfb919060635. (HTTP 401) (Request-ID: req-20e59eac-5de6-43f6-81ee-5030737feb46)

Steps to Reproduce
------------------
1. Verify system health: no alarms are present.
2. Kill the sysinv-conductor process multiple times using the script below.
n=1; last_pid=''; pid=''
while ((n < 5)); do
    pid=$(cat /var/run/sw-patch-agent.pid 2>/dev/null); date
    if [ -z "$pid" ] || [ "$pid" = "$last_pid" ]; then
        echo "stale or empty PID:$pid, last_pid=$last_pid"; sleep 0.5; continue
    fi
    if echo "Li69nux*" | sudo -S kill -9 "$pid" &>/dev/null; then
        echo "OK $n - $pid killed"; ((n++)); last_pid=$pid; pid=''; sleep 20
    else sleep 0.5; fi
done; echo $pid
3. Verify that the process recovered after each kill.
4. Run fm alarm-list after killing the process.
5. The account-locked message shown in the description appears.
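
The recovery check in step 3 can be sketched in Python as follows. This is only an illustration, not the test framework's code: the function names, the timeout values, and the idea of reading the PID from a PID file passed in by the caller are all assumptions. It treats the process as recovered once the PID file holds a new, live PID that differs from the one that was killed.

```python
import os
import time


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if missing or invalid."""
    try:
        with open(pid_file) as handle:
            text = handle.read().strip()
    except OSError:
        return None
    return int(text) if text.isdigit() else None


def pid_alive(pid):
    """Check for a live process by sending signal 0 (nothing is delivered)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_for_restart(pid_file, old_pid, timeout=30.0, interval=0.5):
    """Poll until pid_file holds a live PID other than old_pid.

    Returns the new PID, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        pid = read_pid(pid_file)
        if pid is not None and pid != old_pid and pid_alive(pid):
            return pid
        time.sleep(interval)
    return None
```

A caller would record the PID before the kill and pass it as old_pid, so a stale PID file is not mistaken for a successful restart.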

Expected Behavior
------------------
Commands can still be executed after the process is killed.

Actual Behavior
----------------
Commands cannot be run after the process is killed.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Regular system wildcat-63-66

Branch/Pull Time/Commit
-----------------------
2020-04-04_00-10-00

Last Pass
---------
 2020-03-14_04-10-00

Timestamp/Logs
--------------
2020-04-06T13:37:25.919859

Test Activity
-------------
Regression

Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :
summary: - sysinv-conductor process kill test failure user-defined endpoint and
- token, error was: The account is locked for user
+ sysinv-conductor process kill test failure with error user-defined
+ endpoint and token, error was: The account is locked for user
Revision history for this message
Ghada Khalil (gkhalil) wrote :

There was a code change which merged on 2020-04-01 related to the keystone account lock-out.
LP: https://bugs.launchpad.net/starlingx/+bug/1853017
Gerrit Review: https://review.opendev.org/712823

I'm not sure if it's related or not, but this TC was passing a week ago before this code was merged.
Assigning to Shuicheng to investigate.
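
For context, Keystone's [security_compliance] section has lockout_failure_attempts and lockout_duration options that lock an account after repeated failed logins. The Python sketch below is a simplified model of such a policy, not Keystone's implementation. The class name, return strings, and the thresholds in the usage note are illustrative assumptions. It shows why a client that keeps sending a wrong or missing password can lock out everyone else, including users with the correct password.

```python
class LockoutTracker:
    """Simplified model of a failed-login lockout policy (illustrative only)."""

    def __init__(self, max_failures, lockout_seconds):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.failures = 0
        self.locked_until = None

    def is_locked(self, now):
        """True while the lockout window is still open at time `now`."""
        return self.locked_until is not None and now < self.locked_until

    def authenticate(self, supplied, expected, now):
        """Return 'ok', 'denied', or 'locked' for one attempt at time `now`."""
        if self.is_locked(now):
            # Even a correct password is rejected while locked.
            return "locked"
        if self.locked_until is not None:
            # The lockout window has expired; start counting afresh.
            self.failures = 0
            self.locked_until = None
        if supplied == expected:
            self.failures = 0
            return "ok"
        self.failures += 1
        if self.failures >= self.max_failures:
            self.locked_until = now + self.lockout_seconds
        return "denied"
```

With max_failures=3 and lockout_seconds=60, three attempts carrying a None password at t=0, 1, 2 are denied, a correct password at t=3 is reported as locked, and the same password succeeds again at t=62.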

Changed in starlingx:
importance: Undecided → High
assignee: nobody → Lin Shuicheng (shuicheng)
status: New → Triaged
tags: added: stx.4.0 stx.metal
Ghada Khalil (gkhalil)
tags: added: stx.config
removed: stx.metal
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/718305

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/718305
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=1b50022d55a9da2bbab284b1fdda2ddc78c30c79
Submitter: Zuul
Branch: master

commit 1b50022d55a9da2bbab284b1fdda2ddc78c30c79
Author: Shuicheng Lin <email address hidden>
Date: Wed Apr 8 10:57:50 2020 +0800

    Fix account be locked due to access registry without password

    Correct code to let exception be raised when password cannot be
    got from keyring. Account is locked due to exception is not raised,
    and client try to access registry with None password, which is
    incorrect.

    Closes-Bug: #1871141
    Change-Id: Ia68b4a4f25756fdad7a198a31d5870245ff9dc1a
    Signed-off-by: Shuicheng Lin <email address hidden>
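
The pattern described in this commit message can be illustrated with a minimal Python sketch. It is not the ansible-playbooks code: PasswordUnavailable, lookup_password, registry_login, and the dict used in place of the keyring are hypothetical names chosen for the example. The point is only that a missing password raises before any login is attempted, so a None password is never sent and never counted as a failed authentication.

```python
class PasswordUnavailable(Exception):
    """Raised when no password can be retrieved (hypothetical name)."""


def lookup_password(store, service, user):
    """Return the stored password, or raise if it is missing.

    `store` is a plain dict standing in for the real keyring backend.
    """
    password = store.get((service, user))
    if not password:
        # Fail here so the caller never attempts a login with None.
        raise PasswordUnavailable(
            "no password stored for %s/%s" % (service, user))
    return password


def registry_login(store, service, user, attempts):
    """Record a login attempt only when a real password is available."""
    password = lookup_password(store, service, user)
    attempts.append((user, password))  # stand-in for the real request
    return True
```

If the lookup instead returned None silently, every call to registry_login would add one more failed attempt against the account, which matches the lockout symptom reported above.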

Changed in starlingx:
status: In Progress → Fix Released
Yang Liu (yliu12)
tags: added: stx.retestneeded
Revision history for this message
Anujeyan Manokeran (anujeyan) wrote :

Verified in load 2020-04-17_10-33-46.

tags: removed: stx.retestneeded
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729809

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/729809
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=73027425d4501a6b7785e91024c9e8ddbc03115d
Submitter: Zuul
Branch: f/centos8

commit 55c9afd075194f7669fa2a87e546f61034679b04
Author: Dan Voiculeasa <email address hidden>
Date: Wed May 13 14:19:52 2020 +0300

    Restore: disconnect etcd from ceph

    At the moment etcd is restored only if ceph data is kept.
    Etcd should be restored regardless if ceph data is kept or wiped.

    Story: 2006770
    Task 39751
    Change-Id: I9dfb1be0a83c3fdc5f1b29cbb974c5e0e2236ad3
    Signed-off-by: Dan Voiculeasa <email address hidden>

commit 003ddff574c74adf11cf8e4758e93ba0eed45a6a
Author: Don Penney <email address hidden>
Date: Fri May 8 11:35:58 2020 -0400

    Add playbook for updating static images

    This commit introduces a new playbook, upgrade-static-images.yml, used
    for downloading updating images and pushing to the local registry.

    Change-Id: I8884440261a5a4e27b40398e5a75c9d03b09d4ba
    Story: 2006781
    Task: 39706
    Signed-off-by: Don Penney <email address hidden>

commit 26fd273cf5175ba4bdd31d6b6b777814f1a6c860
Author: Matt Peters <email address hidden>
Date: Thu May 7 14:29:02 2020 -0500

    Add kube-apiserver port to calico failsafe rules

    An invalid GlobalNetworkPolicy or NetworkPolicy may prevent
    calico-node from communicating with the kube-apiserver.
    Once the communication is broken, calico-node is no longer
    able to update the policies since it cannot communicate to
    read the updated policies. It can also prevent the pod
    from starting since the policies will prevent it from
    reading the configuration.

    To ensure that this scenario does not happen, the kube-apiserver
    port is being added to the failsafe rules to ensure communication
    is always possible, regardless of the network policy configuration.

    Change-Id: I1b065a74e7ad0ba9b1fdba4b63136b97efbe98ce
    Closes-Bug: 1877166
    Related-Bug: 1877383
    Signed-off-by: Matt Peters <email address hidden>

commit bd0f14a7dfb206ccaa3ce0f5e7d9034703b3403c
Author: Robert Church <email address hidden>
Date: Tue May 5 15:11:15 2020 -0400

    Provide an update strategy for Tiller deployment

    In the case of a simplex controller configuration the current patching
    strategy for the Tiller environment will fail as the tiller ports will
    be in use when the new deployment is attempted to be applied. The
    resulting tiller pod will be stuck in a Pending state.

    This will be observed if the node becomes ready after 'helm init'
    installs the initial deployment and before the deployment is patched for
    environment checks.

    The deployment strategy provided by 'helm init' is unspecified. This
    change will allow one additional pod (current + new) and one unavailable
    pod (current) during an update. The maxUnavailable setting allows the
    tiller pod to be deleted which will release its ports, thus allowing the
    patch deployment to spin up an new pod to a Running state.

    Change-Id: I83c43c52a77...

tags: added: in-f-centos8