Comment 50 for bug 1853017

Lin Shuicheng (shuicheng) wrote :

Hi Peng,
Could you help update the test case so that it does not change the password immediately after host-swact? Please wait 3 minutes after host-swact before changing the password.
The issue is that, after a swact, sysinv on the active controller checks whether k8s networking needs to be upgraded, and it needs to pull images from registry.local:9001. If the password is changed at the same time, keystone authentication fails and the account is then locked. When sysinv pulls the images, it starts 5 threads in parallel to reduce the image pull time, so the keystone limit of 5 failed attempts is easily hit.
I have submitted a patch to avoid the authentication failure caused by the password cache, but it cannot fix the issue completely. The issue will still occur if the password is changed just after sysinv gets the password but before keystone authentication.
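
To illustrate the race described above, here is a minimal sketch in Python. It is not sysinv or keystone code: MockIdentityService, pull_images_in_parallel, and the password strings are hypothetical names used only for this simulation. The only values taken from the description are the 5 parallel pull threads and the 5-failure lockout count.

```python
import threading

LOCKOUT_THRESHOLD = 5  # assumption: matches the "5 times failure count" described above


class MockIdentityService:
    """Hypothetical stand-in for an identity service with failure-count lockout."""

    def __init__(self, password, threshold=LOCKOUT_THRESHOLD):
        self._password = password
        self._threshold = threshold
        self._failures = 0
        self._lock = threading.Lock()
        self.locked = False

    def change_password(self, new_password):
        with self._lock:
            self._password = new_password

    def authenticate(self, password):
        with self._lock:
            if self.locked:
                return False  # a locked account rejects everything
            if password == self._password:
                self._failures = 0
                return True
            self._failures += 1
            if self._failures >= self._threshold:
                self.locked = True
            return False


def pull_images_in_parallel(service, cached_password, workers=5):
    """Simulate parallel image pulls that each authenticate with the same password."""
    results = []
    results_lock = threading.Lock()

    def pull():
        ok = service.authenticate(cached_password)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=pull) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


service = MockIdentityService("old-password")
cached = "old-password"                   # the password is read first ...
service.change_password("new-password")   # ... then changed before authentication
results = pull_images_in_parallel(service, cached)
print(results, service.locked)
```

In this simulation, one batch of 5 parallel pulls with the stale password is enough to reach the lockout count, after which even the new password is rejected.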

Here is the sysinv/sm log from ALL_NODES_20200310.185444.tar:
host-swact starts at 17:09:53 and finishes at 17:10:23:
2020-03-10T17:09:53.000 controller-1 sm: debug time[6314.840] log<450> INFO: sm[95215]: sm_service_domain_scheduler.c(1520): Swact from (controller-0) to (controller-1) start
2020-03-10T17:10:23.000 controller-1 sm: debug time[6344.397] log<781> INFO: sm[95215]: sm_node_swact_monitor.cpp(57): Swact has completed successfully.
sysinv tries to do the k8s networking upgrade at 17:10:25:
2020-03-10 17:10:25.854 681994 INFO sysinv.conductor.manager [-] _upgrade_downgrade_kube_networking executing playbook: /usr/share/ansible/stx-ansible/playbooks/upgrade-k8s-networking.yml for version v1.16.2
The k8s secret is already updated with the new password at 17:11:15:
2020-03-10 17:11:15.367 681994 INFO sysinv.conductor.kube_app [-] Secret registry-local-secret under Namespace kube-system is updated
Keystone reports an authentication failure due to receiving the old password at 17:11:19:
2020-03-10 17:11:19.438 682342 WARNING keystone.server.flask.application [req-3a36cda2-79ee-4a42-bbec-087ada62030e - - - - -] Authorization failed. The account is locked for user: a7befe681ee64a82b71b63935b410cf7. from 192.168.204.3: AccountLocked: The account is locked for user: a7befe681ee64a82b71b63935b410cf7.
sysinv reports an ansible failure due to the failed image download at 17:11:19:
"stderr": "time=\"2020-03-10T17:11:19Z\" level=fatal msg=\"pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.local:9001/quay.io/calico/node:v3.6.2\
sysinv 2020-03-10 17:17:07.929 681994 ERROR sysinv.conductor.manager [-] Failed to upgrade/downgrade kubernetes networking images: ansible-playbook returned an error: 2: Exception: ansible-playbook returned an error: 2
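
For reference, the offsets of these events from swact completion can be worked out with a short Python sketch. The timestamps are copied from the log lines above. The event labels and the seconds_since helper are my own naming for illustration.

```python
from datetime import datetime

# Timestamps taken from the log excerpts above (all on 2020-03-10).
events = [
    ("swact start", "17:09:53"),
    ("swact complete", "17:10:23"),
    ("k8s networking upgrade starts", "17:10:25"),
    ("registry-local-secret updated", "17:11:15"),
    ("account locked / image pull fails", "17:11:19"),
]


def seconds_since(reference, stamp):
    """Return the number of seconds from reference to stamp (HH:MM:SS strings)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(stamp, fmt) - datetime.strptime(reference, fmt)
    return int(delta.total_seconds())


swact_done = dict(events)["swact complete"]
offsets = {name: seconds_since(swact_done, stamp) for name, stamp in events}

for name, offset in offsets.items():
    print(f"{offset:+4d}s  {name}")
```

The secret update lands 52 seconds after the swact completes, well inside the 3 minute wait suggested above.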