charm-k8s-postgresql

Redeploys to same model fail

Bug #1903625 reported by Stuart Bishop on 2020-11-10

This bug affects 1 person

Affects               Status        Importance  Assigned to  Milestone
charm-k8s-postgresql  Fix Released  Medium      Unassigned

Bug Description

Per https://bugs.launchpad.net/juju/+bug/1903623 , the mechanism used by the charm to map unit names to pod names does not always work. In particular, if you remove a PostgreSQL deployment and redeploy it to the same model with the same name, it will fail.

Related branches

~merkata/charm-k8s-postgresql:master

Merged into charm-k8s-postgresql:master at revision 24bf1610284cd0e67025723dafd7aefe29689fe1

Tom Haddon: Approve on 2022-09-20

Arturo Enrique Seijas Fernández (community): Approve on 2022-09-19

Weii Wang (community): Approve on 2022-09-19

Franco Luciano Forneron Buschiazzo (community): Approve on 2022-09-19

Johann David Krister Andersson: Pending requested 2022-09-19

Canonical IS Reviewers: Pending requested 2022-09-17

Stuart Bishop (stub) wrote on 2020-11-19:

Likely best fixed by implementing lp:1904821

Andre Ruiz (andre-ruiz) wrote on 2021-06-14:

I just had this problem. In the logs you can see "I'm not the master, cloning the master" (eventually it times out).

Seems like the code around https://git.launchpad.net/charm-k8s-postgresql/tree/files/pgcharm.py#n466 is returning a leader when it should not.

Mariyan Dimitrov (merkata) wrote on 2022-09-16:

The issue is actually one line lower, at https://git.launchpad.net/charm-k8s-postgresql/tree/files/pgcharm.py#n467.

That line checks that the current unit is the first of all units that are part of the application, but there are some caveats:

- JUJU_UNIT_NAME is not read from an environment variable set by Juju. Instead it is (awkwardly) constructed by joining the application name and the pod name, which relies on two APIs, one from Juju and one from k8s. See https://git.launchpad.net/charm-k8s-postgresql/tree/files/pgcharm.py#n52

- JUJU_EXPECTED_UNITS is constructed by querying the Juju API only and returning a sorted list of units.

Initially, JUJU_UNIT_NAME equals JUJU_EXPECTED_UNITS[0] (both end in /0). With every new redeployment and revision the two drift apart, and the check on this line no longer matches.

The reason is that the comparison pits a unit name carrying the number from the pod name against the unit names Juju expects. Every application is deployed as a StatefulSet, so its pods always start at 0 and increment from there. The expected units, however, continue incrementing from the last number used by the previous revision.
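To make the drift concrete, here is a minimal sketch of the comparison described above. It is not the charm's code: unit_name_from_pod and is_first_unit are hypothetical helpers, the application and pod names are invented, and the numeric sort is an assumption about how the expected units are ordered.

```python
from typing import List


def unit_name_from_pod(app_name: str, pod_name: str) -> str:
    # Hypothetical stand-in for the fragile mapping: reuse the StatefulSet
    # ordinal at the end of the pod name as the Juju unit number.
    ordinal = pod_name.rsplit("-", 1)[1]
    return f"{app_name}/{ordinal}"


def is_first_unit(unit_name: str, expected_units: List[str]) -> bool:
    # Order the expected units by unit number (assumed ordering), then
    # compare the local unit name with the first entry.
    ordered = sorted(expected_units, key=lambda u: int(u.split("/")[1]))
    return unit_name == ordered[0]


# First deployment: Juju unit numbers and pod ordinals both start at 0.
first_deploy = is_first_unit(
    unit_name_from_pod("postgresql", "postgresql-0"),
    ["postgresql/0", "postgresql/1"],
)

# Redeployment under the same name: pods restart at 0, while the Juju unit
# numbers continue from the previous deployment, as described in this comment.
redeploy = is_first_unit(
    unit_name_from_pod("postgresql", "postgresql-0"),
    ["postgresql/2", "postgresql/3"],
)

print(first_deploy, redeploy)
```

This prints "True False": after the redeploy no pod-derived name matches the first expected unit, so no unit considers itself the master and every pod tries to clone one.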

There are two things to consider when fixing this:

- construct JUJU_UNIT_NAME properly, so that an actual unit name is returned (currently done by calling hookenv.local_unit())

- avoid race conditions during master election by spinning up pods serially, via "service": {"scalePolicy": "serial"} in the pod spec
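As a rough illustration of the second point, the "service" fragment quoted above could be merged into an existing pod spec along these lines. This is only a sketch: with_serial_scale_policy is a hypothetical helper, and the example spec contents ("version": 3 and the annotation) are placeholders, not taken from the charm.

```python
import copy


def with_serial_scale_policy(pod_spec: dict) -> dict:
    # Return a copy of the pod spec with the "service" section asking for
    # pods to be started serially; other "service" keys are preserved.
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("service", {})["scalePolicy"] = "serial"
    return spec


# Placeholder spec, for illustration only.
base_spec = {"version": 3, "service": {"annotations": {"example": "yes"}}}
serial_spec = with_serial_scale_policy(base_spec)
print(serial_spec["service"])
```

The copy keeps the caller's spec untouched, so the helper can be applied just before the spec is handed to Juju without side effects elsewhere.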

Tom Haddon (mthaddon) on 2022-09-19

Changed in charm-k8s-postgresql:
status:	New → Confirmed
importance:	Undecided → Medium

Tom Haddon (mthaddon) wrote on 2022-09-20:

This has been fixed in revno 20, released to the stable channel.

Changed in charm-k8s-postgresql:
status:	Confirmed → Fix Released
