postgresql charm race condition in allowed-units
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
postgresql (Juju Charms Collection) | Fix Released | Critical | Unassigned | |
Bug Description
I'm working on the test-catalog charm to support postgresql, in preparation for the move of the OIL infrastructure to prodstack, and the test-catalog relation with postgresql is triggering a race condition.
The charms used are: lp:charms/postgresql (revno 82) and lp:~matsubara/charms/precise/test-catalog/swap-to-postgresql and the merge proposal is here: https:/
Steps to reproduce:
juju deploy --repository . local:test-catalog
juju deploy --repository . local:postgresql
juju add-relation test-catalog:db postgresql:db
At this point test-catalog's db-relation-changed fails with the following error: http://
Using juju from my laptop to canonistack, I can only reproduce this error intermittently; most of the time the db-relation hooks work as intended.
Using juju from bastion (a machine inside the DC) and deploying against serverstack, I can reliably reproduce the error. If I add the following patch: http://
Any ideas on how to debug this further? Are the test-catalog changes in the MP just plain wrong? In theory (and based on the docs for the postgresql charm), allowed-units is the right way to verify that the database is ready for access from related units.
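For reference, the client-side check described above could be sketched roughly as follows. This is a minimal sketch, not the code from the merge proposal: it assumes allowed-units is published as a space-separated list of unit names, and the helper names `is_unit_allowed` and `database_ready` are hypothetical. `database_ready` only works inside a Juju hook context, where the `relation-get` hook tool and the `JUJU_UNIT_NAME` environment variable are available.

```python
import os
import subprocess


def is_unit_allowed(allowed_units, unit_name):
    """Return True if unit_name is listed in the allowed-units value.

    Assumes allowed_units is a space-separated string of unit names
    (or None/empty if the server has not published it yet).
    """
    if not allowed_units:
        return False
    # Split before comparing so "test-catalog/1" does not match "test-catalog/10".
    return unit_name in allowed_units.split()


def database_ready():
    """Hypothetical helper for a db-relation-changed hook.

    Reads allowed-units from the remote unit with the relation-get hook tool
    and checks whether this unit has been granted access.
    """
    unit_name = os.environ["JUJU_UNIT_NAME"]
    raw = subprocess.check_output(["relation-get", "allowed-units"])
    return is_unit_allowed(raw.decode().strip(), unit_name)
```

If `database_ready()` returns False, the hook can simply exit successfully and wait for the next db-relation-changed invocation rather than trying to connect.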
Related branches
- Marco Ceppi (community): Approve
- Diff: 432 lines (+185/-172), 4 files modified:
  config.yaml (+2/-2)
  hooks/hooks.py (+23/-18)
  scripts/pgbackup.py (+160/-0)
  templates/dump-pg-db.tmpl (+0/-152)
description: | updated |
Changed in charms: | |
status: | New → Triaged |
importance: | Undecided → Critical |
status: | Triaged → Incomplete |
Changed in charms: | |
status: | Incomplete → Triaged |
Changed in charms: | |
status: | Triaged → In Progress |
affects: | charms → postgresql (Juju Charms Collection) |
Changed in postgresql (Juju Charms Collection): | |
status: | In Progress → Fix Released |
The client code looks fine. It will only work with a single unit PostgreSQL service, but that is what is being tested here. 'hostname' is unnecessarily being set in relation-joined, but that should be harmless.
To debug this, we either need to step through or grab the logs from the server side. In particular, the charm unit-postgresql-0.log and /var/log/postgresql/postgresql-9.1-main.log.
What should be happening in the server's db-relation-changed hook:
- pg_hba.conf rewritten. Authentication credentials and allowed-units set on the relation
- PostgreSQL reloaded or restarted
The client should not see the relation data change until after the server's db-relation-changed hook has completed, so there should be no race condition between relation-set publishing allowed-units and the PostgreSQL reload. I think there must be a logic error that leaves PostgreSQL without a reload, or the reload is failing (e.g. bad syntax in a config file), or the reload is not taking effect immediately.
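One way to tell these three cases apart from the client side is to retry the connection for a short while and record how many attempts it takes. The following is only a diagnostic sketch: `wait_until` is a hypothetical helper, and `probe` stands in for whatever connection attempt the client charm makes (for example, a function that tries to open a database connection and returns True on success).

```python
import time


def wait_until(probe, timeout=30.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or timeout seconds have passed.

    Returns the number of attempts made on success, or None on timeout.
    sleep and clock are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if clock() >= deadline:
            return None
        sleep(interval)
```

A result of 1 suggests the reload was already in effect; a larger number suggests the reload was delayed; None suggests the reload failed or never happened, in which case the server-side logs are the next place to look.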