I have a test case that does not require setting up a Pacemaker cluster, but it should be enough to verify that the resource agent now properly detects the PostgreSQL walreceiver process. Here it is. Could you please run it yourself and check whether you get the same results as I did?
[Test plan]
Create an LXD container for the primary PostgreSQL server:
$ lxc launch ubuntu:jammy j1
Connect to it and install the packages:
$ lxc shell j1
# apt update && apt install postgresql resource-agents-extra pacemaker-cli-utils -y
Configure postgresql.conf and pg_hba.conf:
# pg_conftool 14 main set listen_addresses '*'
# pg_conftool 14 main set wal_level replica
# echo "host replication replicator all scram-sha-256" >> /etc/postgresql/14/main/pg_hba.conf
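If the plan is re-run on the same container, the echo above appends another copy of the pg_hba.conf line each time. A minimal sketch that appends it only once, assuming an exact whole-line match is good enough (append_once is a hypothetical helper, not part of any package):

```shell
# append_once LINE FILE (hypothetical helper)
# Appends LINE to FILE only if FILE does not already contain exactly that line.
append_once() {
  line=$1
  file=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex);
  # a missing FILE counts as "line not present"
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage on the primary, as root:
# append_once "host replication replicator all scram-sha-256" \
#     /etc/postgresql/14/main/pg_hba.conf
```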
Create the replication user (choose a password and remember it; it will be needed again later):
# sudo -u postgres createuser --replication -P -e replicator
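As an optional alternative to typing this password at the pg_basebackup prompt later, libpq can read it from a password file. A sketch, to be run as the postgres user on the secondary before pg_basebackup; add_pgpass_entry and pgpass_escape are hypothetical names, and port 5432 is an assumption (the default for the first cluster). Per the PostgreSQL documentation, libpq ignores a password file that is group- or world-accessible, and a replication connection is matched by "replication" in the database field.

```shell
# pgpass_escape STRING: backslash-escape ':' and '\', as the .pgpass
# format requires inside a field.
pgpass_escape() {
  printf '%s' "$1" | sed 's/[\\:]/\\&/g'
}

# add_pgpass_entry HOST USER PASSWORD (hypothetical helper)
# Writes HOST:5432:replication:USER:PASSWORD to $PGPASSFILE (default
# ~/.pgpass) unless that exact line is already there, and keeps the file
# private (mode 0600).
add_pgpass_entry() {
  pgpass=${PGPASSFILE:-$HOME/.pgpass}
  entry="$(pgpass_escape "$1"):5432:replication:$(pgpass_escape "$2"):$(pgpass_escape "$3")"
  # create the file with no group/world access, then enforce 0600
  ( umask 077; touch "$pgpass" ) && chmod 600 "$pgpass" || return 1
  grep -qxF -- "$entry" "$pgpass" || printf '%s\n' "$entry" >> "$pgpass"
}

# Usage on the secondary, as the postgres user (port 5432 assumed):
# add_pgpass_entry <IP-of-primary> replicator 'the-password'
```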
Restart PostgreSQL on the primary:
# systemctl restart postgresql
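listen_addresses and wal_level only take effect after a restart. To double-check what pg_conftool wrote, a rough sketch that prints the last value assigned to a parameter in a postgresql.conf-style file (conf_get is hypothetical; it only handles the "key = value" form and does not consult include files or postgresql.auto.conf, so it does not replace asking the running server):

```shell
# conf_get PARAM FILE (hypothetical helper)
# Prints the value from the last uncommented "PARAM = value" line in FILE
# (later lines override earlier ones), with any trailing comment and
# surrounding single quotes removed. A '#' inside a quoted value is not handled.
conf_get() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" |
    sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' -e "s/^'\(.*\)'\$/\1/" |
    tail -n 1
}

# Usage on the primary:
# conf_get listen_addresses /etc/postgresql/14/main/postgresql.conf
# conf_get wal_level /etc/postgresql/14/main/postgresql.conf
```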
Back on the host, create an LXD container for the secondary PostgreSQL server:
$ lxc launch ubuntu:jammy j2
Connect to it and install the packages:
$ lxc shell j2
# apt update && apt install postgresql resource-agents-extra pacemaker-cli-utils -y
Stop PostgreSQL:
# systemctl stop postgresql
Configure postgresql.conf:
# pg_conftool 14 main set listen_addresses '*'
# pg_conftool 14 main set hot_standby on
Clean up the data directory:
# rm -rf /var/lib/postgresql/*/main/*
Perform the initial replication as the "postgres" user. The pg_basebackup command will prompt for the "replicator" password created earlier on the primary:
# sudo -u postgres -i
$ pg_basebackup -h <IP-of-primary> -D /var/lib/postgresql/14/main -U replicator -P -v -R
$ exit
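With -R, pg_basebackup also writes the standby configuration into the data directory; on PostgreSQL 12 and later (14 here) that is an empty standby.signal file plus a primary_conninfo line in postgresql.auto.conf. A hypothetical check to run before starting the secondary:

```shell
# looks_like_standby DIR (hypothetical helper)
# Succeeds if DIR has what "pg_basebackup -R" writes on PostgreSQL 12+:
# a standby.signal file and a primary_conninfo setting in postgresql.auto.conf.
looks_like_standby() {
  if [ ! -f "$1/standby.signal" ]; then
    echo "missing $1/standby.signal" >&2
    return 1
  fi
  if ! grep -q '^primary_conninfo' "$1/postgresql.auto.conf" 2>/dev/null; then
    echo "no primary_conninfo in $1/postgresql.auto.conf" >&2
    return 1
  fi
}

# Usage on the secondary, before "systemctl start postgresql":
# looks_like_standby /var/lib/postgresql/14/main && echo "standby config present"
```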
Start the secondary:
# systemctl start postgresql
Verify replication. First, check that the list of databases on the secondary does not contain a "test" database yet:
# sudo -u postgres psql -l 2>/dev/null | grep test
#
On the primary, create a test database:
$ lxc shell j1
# sudo -u postgres createdb test
could not change directory to "/root": Permission denied
# sudo -u postgres psql -l 2>/dev/null | grep test
test | postgres | UTF8 | C.UTF-8 | C.UTF-8 |
On the secondary, verify that the test database now exists:
$ lxc shell j2
# sudo -u postgres psql -l 2>/dev/null | grep test
test | postgres | UTF8 | C.UTF-8 | C.UTF-8 |
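Streaming replication is asynchronous by default, so on a slower setup the new database may take a moment to appear on the secondary. A hypothetical retry helper for this check (the retry count and delay in the usage line are arbitrary):

```shell
# wait_until TRIES DELAY COMMAND [ARG...] (hypothetical helper)
# Runs COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$tries" ] && return 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage on the secondary (10 tries, 1 second apart):
# wait_until 10 1 sh -c 'sudo -u postgres psql -l 2>/dev/null | grep -q test' \
#     && echo "test database replicated"
```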
Check that the secondary does have a "walreceiver" process running:
$ lxc shell j2
# ps axw|grep -E "postgres:.*wal" | grep -v grep
6001 ? Ss 0:06 postgres: 14/main: walreceiver streaming 0/7000780
Now run this long command on the secondary (it is a single line). With the bug present, the command will complain that the walreceiver process is NOT running:
# OCF_RESKEY_check_wal_receiver=true OCF_RESKEY_socketdir=/run/postgresql OCF_RESKEY_config=/etc/postgresql/14/main/postgresql.conf OCF_RESKEY_pgctl=/usr/lib/postgresql/14/bin/pg_ctl OCF_RESKEY_pgdata=/var/lib/postgresql/14/main OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/pgsql monitor
INFO: Don't check /var/lib/postgresql/14/main during probe
WARNING: wal receiver process is not running
With the bug fixed, the warning will not be present in the output.
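To repeat the check without retyping the long line, a hypothetical wrapper that sets the same OCF_RESKEY_* variables, runs the monitor action and reports whether the warning appeared. It only looks at the output text; the agent's exit code is not interpreted.

```shell
# check_pgsql_monitor [AGENT] (hypothetical wrapper)
# Runs AGENT (default: the pgsql agent path used above) with the same
# OCF_RESKEY_* settings as the one-line command, prints its output, and
# returns 1 if the walreceiver warning appears, 0 otherwise.
check_pgsql_monitor() {
  agent=${1:-/usr/lib/ocf/resource.d/heartbeat/pgsql}
  # 2>&1 captures the INFO/WARNING lines whichever stream they are written to
  out=$(OCF_RESKEY_check_wal_receiver=true \
    OCF_RESKEY_socketdir=/run/postgresql \
    OCF_RESKEY_config=/etc/postgresql/14/main/postgresql.conf \
    OCF_RESKEY_pgctl=/usr/lib/postgresql/14/bin/pg_ctl \
    OCF_RESKEY_pgdata=/var/lib/postgresql/14/main \
    OCF_ROOT=/usr/lib/ocf \
    "$agent" monitor 2>&1)
  printf '%s\n' "$out"
  if printf '%s\n' "$out" | grep -q 'wal receiver process is not running'; then
    echo "RESULT: warning present, bug reproduced"
    return 1
  fi
  echo "RESULT: no walreceiver warning"
}

# Usage on the secondary, as root:
# check_pgsql_monitor
```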