StarlingX

Bug #1917308
Comment #30



OpenStack Infra (hudson-openstack) wrote on 2021-05-14: Related fix merged to ansible-playbooks (master)


Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/791093
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/d5460198dc0310a80580537fd8df76ae00e17f02
Submitter: "Zuul (22348)"
Branch: master

commit d5460198dc0310a80580537fd8df76ae00e17f02
Author: Robert Church <email address hidden>
Date: Wed May 12 22:45:38 2021 -0400

Adjust armada's tiller container liveness probe

    Now that the liveness probe in the armada helm chart tests
    connectivity to the postgres backend, adjust its periodSeconds and
    failureThreshold to align with the minimum expected swact time for
    postgres moving from one controller to the other.

    Reviewing logs from various H/W labs, postgres swact times range from
    9s to 20s, with a mean of ~15s.

    Swact times can be observed in the logs, for example:
    2021-05-09T13:32:24.475 controller-1 OCF_pgsql(postgres)[396293]: info
                                         INFO: server shutting down
    2021-05-09T13:32:33.423 controller-0 OCF_pgsql(postgres)[147541]: info
                                         INFO: server starting
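As a rough illustration, the swact time in the example above is the gap between the "server shutting down" and "server starting" timestamps, which works out to about 8.9s. The sketch below computes that gap from log lines. It assumes each entry has been joined onto a single line in the layout shown. The `swact_seconds` helper and its regex are hypothetical and are not part of StarlingX tooling.

```python
import re
from datetime import datetime

# Hypothetical pattern for one OCF_pgsql entry, assuming the wrapped
# "INFO: ..." continuation has been joined onto the timestamp line.
LOG_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+OCF_pgsql\(postgres\)\[\d+\]:"
    r"\s+info\s+INFO:\s+(?P<msg>.+)$"
)


def swact_seconds(lines):
    """Seconds between 'server shutting down' and the next 'server starting'.

    Returns None if no complete shutdown/start pair is found.
    """
    stop = None
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not OCF_pgsql INFO entries
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        msg = m["msg"].strip()
        if msg == "server shutting down":
            stop = ts
        elif msg == "server starting" and stop is not None:
            return (ts - stop).total_seconds()
    return None


log = [
    "2021-05-09T13:32:24.475 controller-1 OCF_pgsql(postgres)[396293]: info INFO: server shutting down",
    "2021-05-09T13:32:33.423 controller-0 OCF_pgsql(postgres)[147541]: info INFO: server starting",
]
print(swact_seconds(log))  # 8.948
```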

    Set periodSeconds to 4 and failureThreshold to 2 so that, if the
    postgres server is not accessible, the tiller container is restarted
    within the 9s minimum swact time. This ensures that the connection to
    the postgres backend has been re-established the next time tiller is
    needed by Armada or used by the helmv2-cli.
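The following simplified model suggests why these values fit inside the 9s window. It makes three assumptions. The kubelet fires probes exactly every periodSeconds. Each failing probe returns immediately, so timeoutSeconds and container restart time are ignored. Every probe at or after the start of the outage fails. `restart_delay` is a hypothetical helper and is not a Kubernetes API.

```python
import math


def restart_delay(outage_start: float, period_seconds: float,
                  failure_threshold: int) -> float:
    """Seconds from postgres becoming unreachable until the restart.

    Simplified model (assumption): probes fire at t = 0, period, 2*period, ...
    and complete instantly; probe timeouts and restart time are not modeled.
    """
    # Index of the first probe that fires at or after the outage begins.
    first_failing_probe = math.ceil(outage_start / period_seconds)
    # The kubelet restarts the container on the failure_threshold-th
    # consecutive failed probe.
    restart_at = (first_failing_probe + failure_threshold - 1) * period_seconds
    return restart_at - outage_start


PERIOD_SECONDS = 4
FAILURE_THRESHOLD = 2
MIN_SWACT_SECONDS = 9  # shortest swact observed in the lab logs

# Sweep the outage start across one probe interval in 10 ms steps.
delays = [restart_delay(i * 0.01, PERIOD_SECONDS, FAILURE_THRESHOLD)
          for i in range(400)]
print(f"best {min(delays):.2f}s, worst {max(delays):.2f}s")
# best 4.00s, worst 7.99s
```

Under these assumptions the restart lands between (failureThreshold - 1) x periodSeconds and just under failureThreshold x periodSeconds after postgres becomes unreachable. That is 4s to just under 8s, which stays below the 9s minimum swact. A probe that hangs until timeoutSeconds would push the worst case later.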

    Change-Id: I7454a737771d9a608d2fe69c5136d37da022007e
    Depends-On: https://review.opendev.org/c/starlingx/integ/+/791092
    Related-Bug: #1917308
    Signed-off-by: Robert Church <email address hidden>