Hi Peng,
Could you help update the test case so that it does not change the password immediately after host-swact? Please wait about 3 minutes after host-swact before changing the password.
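As a rough illustration of the requested delay, the test could call a small helper like the one below between the swact and the password change. This is only a sketch: `wait_after_swact` and `SETTLE_SECONDS` are hypothetical names, not part of the existing test framework, and the caller is assumed to record a `time.monotonic()` value when the swact completes.

```python
import time

# Hypothetical constant: the 3-minute settle time requested above.
SETTLE_SECONDS = 180


def wait_after_swact(swact_done_at, settle_seconds=SETTLE_SECONDS,
                     now=time.monotonic, sleep=time.sleep):
    """Block until settle_seconds have passed since the swact completed.

    swact_done_at is a time.monotonic() value recorded by the caller when
    host-swact finished. Returns the number of seconds actually slept.
    """
    remaining = settle_seconds - (now() - swact_done_at)
    if remaining > 0:
        sleep(remaining)
        return remaining
    return 0.0
```

The `now` and `sleep` parameters exist only so the helper can be exercised without really sleeping; in the test case it would be called as `wait_after_swact(swact_done_at)` right before the password change.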
The issue is that, after the swact, sysinv on the active controller checks whether a k8s networking upgrade is needed, and has to pull images from registry.local:9001. If the password is changed at the same time, keystone authentication fails and the account gets locked. When sysinv pulls the images, it starts 5 threads in parallel to reduce the pull time, so keystone's limit of 5 failed attempts is easily hit.
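The effect of the 5 parallel pulls can be shown with a toy model. This is not keystone or sysinv code: `FakeKeystone` and `pull_images_in_parallel` are made-up names, and the threshold of 5 failures is taken from the description above. It only shows that a single stale password, used by 5 workers at once, is enough to lock the account.

```python
import threading


class FakeKeystone:
    """Toy stand-in for keystone's lockout behaviour (illustration only)."""

    def __init__(self, password, max_failures=5):
        self._password = password
        self._max_failures = max_failures
        self._failures = 0
        self.locked = False
        self._lock = threading.Lock()

    def change_password(self, new_password):
        with self._lock:
            self._password = new_password

    def authenticate(self, password):
        with self._lock:
            if self.locked:
                return False
            if password == self._password:
                self._failures = 0
                return True
            # Count the failure and lock the account at the threshold.
            self._failures += 1
            if self._failures >= self._max_failures:
                self.locked = True
            return False


def pull_images_in_parallel(keystone, cached_password, workers=5):
    """Each worker authenticates once with the same (possibly stale) password."""
    results = []

    def pull():
        results.append(keystone.authenticate(cached_password))

    threads = [threading.Thread(target=pull) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


keystone = FakeKeystone("old-password")
cached = "old-password"                   # sysinv reads the password
keystone.change_password("new-password")  # the test changes it right after
results = pull_images_in_parallel(keystone, cached)
```

In this model every one of the 5 attempts fails and `keystone.locked` ends up `True`, with no retries needed.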
I have submitted a patch to avoid authentication failures caused by the cached password, but it cannot fix the issue completely. The issue will still occur if the password is changed just after sysinv reads the password but before keystone authenticates it.
Here are the sysinv/sm logs from ALL_NODES_20200310.185444.tar:
host-swact starts at 17:09:53 and finishes at 17:10:23:
2020-03-10T17:09:53.000 controller-1 sm: debug time[6314.840] log<450> INFO: sm[95215]: sm_service_domain_scheduler.c(1520): Swact from (controller-0) to (controller-1) start
2020-03-10T17:10:23.000 controller-1 sm: debug time[6344.397] log<781> INFO: sm[95215]: sm_node_swact_monitor.cpp(57): Swact has completed successfully.
sysinv starts the k8s networking upgrade at 17:10:25:
2020-03-10 17:10:25.854 681994 INFO sysinv.conductor.manager [-] _upgrade_downgrade_kube_networking executing playbook: /usr/share/ansible/stx-ansible/playbooks/upgrade-k8s-networking.yml for version v1.16.2
The k8s secret is updated with the new password at 17:11:15:
2020-03-10 17:11:15.367 681994 INFO sysinv.conductor.kube_app [-] Secret registry-local-secret under Namespace kube-system is updated
Keystone reports an authentication failure at 17:11:19 because it received the old password:
2020-03-10 17:11:19.438 682342 WARNING keystone.server.flask.application [req-3a36cda2-79ee-4a42-bbec-087ada62030e - - - - -] Authorization failed. The account is locked for user: a7befe681ee64a82b71b63935b410cf7. from 192.168.204.3: AccountLocked: The account is locked for user: a7befe681ee64a82b71b63935b410cf7.
sysinv reports an ansible failure at 17:11:19 because the image download failed:
"stderr": "time=\"2020-03-10T17:11:19Z\" level=fatal msg=\"pulling image failed: rpc error: code = Unknown desc = failed to pull and unpack image \\\"registry.local:9001/quay.io/calico/node:v3.6.2\
sysinv 2020-03-10 17:17:07.929 681994 ERROR sysinv.conductor.manager [-] Failed to upgrade/downgrade kubernetes networking images: ansible-playbook returned an error: 2: Exception: ansible-playbook returned an error: 2
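To make the timing easier to see, the timestamps from the log excerpts above can be turned into offsets from the end of the swact. The snippet is just a sketch for reading the timeline; the event labels and the `offsets_from` helper are my own, and the times are copied from the log lines with fractional seconds dropped.

```python
from datetime import datetime

# Timestamps copied from the log excerpts above (fractions dropped).
events = [
    ("swact start", "2020-03-10T17:09:53"),
    ("swact complete", "2020-03-10T17:10:23"),
    ("k8s networking upgrade start", "2020-03-10T17:10:25"),
    ("registry secret updated", "2020-03-10T17:11:15"),
    ("keystone AccountLocked", "2020-03-10T17:11:19"),
]


def offsets_from(reference, events):
    """Return each event's offset in whole seconds from the reference event."""
    ref = datetime.fromisoformat(dict(events)[reference])
    return {
        name: int((datetime.fromisoformat(ts) - ref).total_seconds())
        for name, ts in events
    }


offsets = offsets_from("swact complete", events)
for name, seconds in offsets.items():
    print(f"{seconds:+4d}s  {name}")
```

The secret update and the lockout both fall within the first minute after the swact completes, which is why a wait of about 3 minutes before the password change should keep the test clear of this window.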