commit 17b5505d9ea9b149cf28236be3c1b4c263a89ffb
Author: Tao Liu <email address hidden>
Date: Mon May 3 12:32:53 2021 -0400
Fix subclouds going offline due to auth failure
This update contains the following changes that prevent subclouds
going offline due to authentication failure:
1. The os region client cache is cleared when a new keystone client
is created. The os region client will be re-created using the new
keystone session.
2. When the user's access info (such as role id) changes, create a
new keystone client and new os region clients. This can happen after
system controller keystone role ids are synced to subclouds.
3. Remove get_admin_backup_session that was only required when
upgrading to stx 4.0.
4. Increase AVAIL_FAIL_COUNT_TO_ALARM to 2 so that the first failure
does not raise an alarm; a transient failure in the subcloud is
sometimes expected (e.g. an haproxy process restart to update
certificates).
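
For illustration only, the behaviour described in items 1, 2 and 4 can be sketched roughly as follows. This is not the distcloud code: RegionClientCache, AvailabilityTracker, the factory callables and the shape of access_info are all hypothetical names chosen for this sketch. Only AVAIL_FAIL_COUNT_TO_ALARM and its value of 2 come from the change description.

```python
AVAIL_FAIL_COUNT_TO_ALARM = 2  # value stated in the commit message


class RegionClientCache:
    """Hypothetical cache holding one keystone client and per-region clients."""

    def __init__(self, make_keystone, make_region_client):
        # Both factories are assumptions standing in for real client setup.
        self._make_keystone = make_keystone
        self._make_region_client = make_region_client
        self._keystone = None
        self._access_info = None
        self._region_clients = {}

    def get_region_client(self, region, access_info):
        # Item 2: changed access info (e.g. role ids) forces a new keystone client.
        if self._keystone is None or access_info != self._access_info:
            self._keystone = self._make_keystone(access_info)
            self._access_info = access_info
            # Item 1: a new keystone client invalidates every cached region
            # client, so each one is rebuilt on the new session when next used.
            self._region_clients.clear()
        if region not in self._region_clients:
            self._region_clients[region] = self._make_region_client(
                self._keystone, region
            )
        return self._region_clients[region]


class AvailabilityTracker:
    """Hypothetical counter of consecutive failed availability audits."""

    def __init__(self):
        self.fail_count = 0

    def record(self, audit_ok):
        """Record one audit result; return True when an alarm is due."""
        if audit_ok:
            self.fail_count = 0
            return False
        self.fail_count += 1
        # Item 4: a single transient failure stays below the threshold.
        return self.fail_count >= AVAIL_FAIL_COUNT_TO_ALARM
```

In this sketch access_info is assumed to be an immutable value (for example a tuple of role ids) so that a plain equality check is enough to detect a change.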
Tested on DC-6:
1. Adding 50 subclouds twice
2. Soaking the fix over the weekend
Closes-Bug: 1927007
Signed-off-by: Tao Liu <email address hidden>
Change-Id: I86fdc9a2f062409e704bdfac2119dc488123f7de
Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/789572
Committed: https://opendev.org/starlingx/distcloud/commit/17b5505d9ea9b149cf28236be3c1b4c263a89ffb
Submitter: "Zuul (22348)"
Branch: master