Comment 3 for bug 1903625

Mariyan Dimitrov (merkata) wrote :

The issue is actually one line lower, at https://git.launchpad.net/charm-k8s-postgresql/tree/files/pgcharm.py#n467.

This checks that the current unit is the first out of all units that are part of the application, but there are some caveats to that:

JUJU_UNIT_NAME is not taken from an environment variable set by Juju. Instead it is (awkwardly) constructed by joining the application name and the pod name, so it relies on two APIs, one from Juju and one from k8s; see https://git.launchpad.net/charm-k8s-postgresql/tree/files/pgcharm.py#n52

JUJU_EXPECTED_UNITS is constructed by querying the Juju API only and returning a sorted list of units.

Initially, JUJU_UNIT_NAME equals JUJU_EXPECTED_UNITS[0] (both are /0). With every redeployment and new revision the two drift apart, and the comparison on this line no longer matches.

Every time you compare JUJU_UNIT_NAME with JUJU_EXPECTED_UNITS, you are comparing a unit name that carries the number of a pod name. Because every application is deployed as a StatefulSet, pod numbers always start at 0 and increment from there. The expected units, in contrast, continue incrementing from the last number used by the previous revision.
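
To make the drift concrete, here is a minimal sketch in Python. It is not the charm's code: unit_name_from_pod is a hypothetical helper that assumes the unit name is built from the application name plus the pod's StatefulSet ordinal, and the unit names are made-up examples. The real logic at #n52 may differ in detail.

```python
def unit_name_from_pod(app_name, pod_name):
    """Hypothetical: derive a 'unit name' from the pod's ordinal suffix."""
    ordinal = pod_name.rsplit("-", 1)[1]
    return f"{app_name}/{ordinal}"


def unit_number(unit):
    """Return the numeric part of a unit name such as 'postgresql/3'."""
    return int(unit.split("/")[1])


app = "postgresql"
first_pod = "postgresql-0"  # StatefulSet pods always start at ordinal 0

# First deployment: Juju units and pod ordinals happen to line up.
expected_initial = sorted(["postgresql/1", "postgresql/0"], key=unit_number)
matches_initially = unit_name_from_pod(app, first_pod) == expected_initial[0]

# After a redeployment (assumed numbering): Juju keeps counting upwards,
# while the pods start again at 0.
expected_after = sorted(["postgresql/3", "postgresql/2"], key=unit_number)
matches_after = unit_name_from_pod(app, first_pod) == expected_after[0]

print(matches_initially, matches_after)
```

Under these assumptions the first comparison succeeds and the second one fails, so no pod considers itself the first unit after the redeployment.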

There are two things to consider when fixing this:

- construct JUJU_UNIT_NAME properly, so that an actual unit is returned (done currently via calling hookenv.local_unit())

- ensure no race conditions occur, and handle master election by spinning up pods serially via "service": {"scalePolicy": "serial"} in the pod spec
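
A rough sketch of both points in Python, under stated assumptions: with_serial_scaling and is_first_unit are hypothetical helpers, base_spec is a placeholder rather than the charm's real pod spec, and local_unit stands in for whatever hookenv.local_unit() returns inside the charm (it is passed in as a plain string here so the sketch runs without Juju).

```python
import copy


def with_serial_scaling(pod_spec):
    """Return a copy of pod_spec that asks for pods to be started serially."""
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("service", {})["scalePolicy"] = "serial"
    return spec


def is_first_unit(local_unit, expected_units):
    """Compare a real Juju unit name against the first expected unit."""
    return bool(expected_units) and local_unit == expected_units[0]


# Placeholder spec; the real one carries many more keys.
base_spec = {"containers": [{"name": "postgresql"}]}
serial_spec = with_serial_scaling(base_spec)

# With a genuine unit name the check keeps working after a redeployment.
first = is_first_unit("postgresql/2", ["postgresql/2", "postgresql/3"])
```

Copying the spec instead of mutating it is only a choice made for this sketch; the charm could just as well set the key where the spec is built.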