StarlingX

Bug #2034610
Comment #3

OpenStack Infra (hudson-openstack) wrote on 2023-09-07: Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/893978
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/e52e7e9d3caec8bda48ba043fcba58679ae7cd82
Submitter: "Zuul (22348)"
Branch: master

commit e52e7e9d3caec8bda48ba043fcba58679ae7cd82
Author: Joshua Reed <email address hidden>
Date: Wed Sep 6 12:52:59 2023 -0700

Adjust FluxCD Helm Controller Pod Termination Timeouts.

    Previously, in v0.28.0 of helm-controller, the pod would
    terminate quickly. After the FluxCD upgrade, which moved
    helm-controller to v0.35.0, that no longer happens and the
    pod terminates slowly instead. As a result, during a
    "system host-lock" command, StarlingX times out waiting
    for pods to be evicted/terminated from the node that is
    being locked, and the lock fails. This behavior causes
    sanity testing to fail.

    The corrective action is to provide an argument to the
    helm controller deployment spec and lower its termination
    grace period.

    Test Plan:
    1. Full AIO-SX installation. Verify helm controller installs
       properly.
    2. Full AIO-DX installation. Verify helm controller installs
       properly. Lock the standby controller, and verify that
       it locks in a reasonable amount of time.
    3. On AIO-DX, perform a swact. After swact, lock the
       opposite standby controller. Verify that the host locks
       in a reasonable amount of time.

    References:
    Helm Controller Release notes detailing behavioral changes:
    1. https://github.com/fluxcd/helm-controller/blob/v0.28.0/CHANGELOG.md
    2. https://github.com/fluxcd/helm-controller/blob/v0.28.1/CHANGELOG.md

    Closes-Bug: 2034610
    Change-Id: I03d1085a995155e12aa7312a5886e7f6ec8d7709
    Signed-off-by: Joshua Reed <email address hidden>
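
The corrective action described in the commit message (adding an argument to the helm controller deployment spec and lowering its termination grace period) could look roughly like the sketch below. It is not the actual change from the review: the manifest is a stripped-down stand-in, the container name `manager`, the flag `--placeholder-shutdown-timeout=10s`, and the value of 10 seconds are all hypothetical placeholders, and the real fix lives in the ansible-playbooks repository. The only Kubernetes detail relied on is the standard pod spec field `terminationGracePeriodSeconds`.

```python
import copy


def shorten_termination(deployment, container_name, extra_arg, grace_seconds):
    """Return a patched copy of a Deployment manifest (as a dict).

    Adds `extra_arg` to the named container's args and sets the pod-level
    terminationGracePeriodSeconds. The input manifest is left untouched.
    """
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]

    # Pod-level field: how long the kubelet waits after SIGTERM before SIGKILL.
    pod_spec["terminationGracePeriodSeconds"] = grace_seconds

    for container in pod_spec["containers"]:
        if container["name"] == container_name:
            args = container.setdefault("args", [])
            if extra_arg not in args:  # keep the patch idempotent
                args.append(extra_arg)
            break
    else:
        raise ValueError(f"container {container_name!r} not found")

    return patched


if __name__ == "__main__":
    # Minimal stand-in manifest; names and values are assumptions.
    manifest = {
        "kind": "Deployment",
        "metadata": {"name": "helm-controller"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": "manager", "args": ["--existing-arg"]}
                    ]
                }
            }
        },
    }
    patched = shorten_termination(
        manifest,
        "manager",                            # hypothetical container name
        "--placeholder-shutdown-timeout=10s",  # hypothetical flag, not a real option
        10,                                    # hypothetical grace period in seconds
    )
    print(patched["spec"]["template"]["spec"])
```

Working on a copy keeps the original manifest available for comparison, and the membership check lets the patch be applied repeatedly without duplicating the argument.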