StarlingX

Bug #1927007
Comment #3

OpenStack Infra (hudson-openstack) wrote on 2021-05-04: Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/789572
Committed: https://opendev.org/starlingx/distcloud/commit/17b5505d9ea9b149cf28236be3c1b4c263a89ffb
Submitter: "Zuul (22348)"
Branch: master

commit 17b5505d9ea9b149cf28236be3c1b4c263a89ffb
Author: Tao Liu <email address hidden>
Date: Mon May 3 12:32:53 2021 -0400

Fix subclouds going offline due to auth failure

    This update contains the following changes that prevent subclouds
    from going offline due to authentication failure:
    1. The os region client cache is cleared when a new keystone client
    is created, so the os region clients are re-created using the new
    keystone session.
    2. When the user's access info (such as a role ID) changes, create a
    new keystone client and new os region clients. This can happen after
    the system controller's keystone role IDs are synced to subclouds.
    3. Remove get_admin_backup_session, which was only required when
    upgrading to stx 4.0.
    4. Increase AVAIL_FAIL_COUNT_TO_ALARM to 2 so that the first failure
    does not raise an alarm, since in some cases a transient failure in
    the subcloud is expected (e.g. an haproxy process restart to update
    certificates).
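Changes 1 and 2 describe a cache-invalidation pattern: region clients are bound to a keystone session, so any new keystone client (including one created because the user's role IDs changed) must flush the cached region clients. The sketch below illustrates that pattern only. `AccessInfo`, `KeystoneClient`, `RegionClient`, `SubcloudClientManager` and `fetch_access_info` are hypothetical names, not the distcloud or keystoneauth1 API, and a session is modelled as a simple counter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional


@dataclass(frozen=True)
class AccessInfo:
    """Hypothetical stand-in for keystone access info (user and role IDs)."""
    user_id: str
    role_ids: FrozenSet[str]


@dataclass
class KeystoneClient:
    """Hypothetical keystone client; session_id stands in for a real session."""
    access_info: AccessInfo
    session_id: int


@dataclass
class RegionClient:
    """Hypothetical per-region client built from a keystone session."""
    region: str
    session_id: int


class SubcloudClientManager:
    """Caches region clients bound to the current keystone session (sketch)."""

    def __init__(self, fetch_access_info: Callable[[], AccessInfo]):
        self._fetch_access_info = fetch_access_info
        self._keystone: Optional[KeystoneClient] = None
        self._session_counter = 0
        self._region_clients: Dict[str, RegionClient] = {}

    def _new_keystone_client(self, access_info: AccessInfo) -> None:
        self._session_counter += 1
        self._keystone = KeystoneClient(access_info, self._session_counter)
        # Change 1: region clients built on the old session are discarded.
        self._region_clients.clear()

    def keystone(self) -> KeystoneClient:
        current = self._fetch_access_info()
        # Change 2: rebuild the keystone client if access info (e.g. role
        # IDs) differs from what the cached client was created with.
        if self._keystone is None or self._keystone.access_info != current:
            self._new_keystone_client(current)
        return self._keystone

    def region_client(self, region: str) -> RegionClient:
        ks = self.keystone()
        if region not in self._region_clients:
            # Created (or re-created) from the current keystone session.
            self._region_clients[region] = RegionClient(region, ks.session_id)
        return self._region_clients[region]
```

Calling `region_client("subcloud1")` twice returns the same cached object; after the role IDs returned by `fetch_access_info` change, the next call creates a new keystone session and a fresh region client.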
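Change 4 raises the alarm threshold so that a single failed availability check is tolerated. The sketch below shows the effect of a threshold of 2, assuming (not confirmed by this commit) that consecutive failures are counted and that a successful check resets the count. `AvailabilityTracker` and `report` are hypothetical names; only the constant name and its value of 2 come from the change.

```python
# Value taken from this change; the surrounding logic is an assumption.
AVAIL_FAIL_COUNT_TO_ALARM = 2


class AvailabilityTracker:
    """Hypothetical per-subcloud tracker of consecutive audit failures."""

    def __init__(self, threshold: int = AVAIL_FAIL_COUNT_TO_ALARM):
        self.threshold = threshold
        self.fail_count = 0
        self.alarmed = False

    def report(self, audit_ok: bool) -> bool:
        """Record one audit result; return True if an alarm is raised."""
        if audit_ok:
            # A success (e.g. haproxy back after a restart) clears the count.
            self.fail_count = 0
            self.alarmed = False
        else:
            self.fail_count += 1
            if self.fail_count >= self.threshold:
                self.alarmed = True
        return self.alarmed
```

With a threshold of 1 the first failure would alarm immediately; with 2, a transient failure followed by a success never raises an alarm.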

    Tested on DC-6:
    1. Adding 50 subclouds twice
    2. Soaking the fix over the weekend

Closes-Bug: 1927007

Signed-off-by: Tao Liu <email address hidden>
Change-Id: I86fdc9a2f062409e704bdfac2119dc488123f7de