Ubuntu resource-agents package: comment 16 for bug 2013084

Andreas Hasenack (ahasenack) wrote on 2023-07-04 (last edit on 2023-07-05):

I have a test case that does not involve setting up a pacemaker cluster, but should be enough to verify that the resource agent is now properly detecting the postgresql walreceiver process. Here it is. Could you please run it yourself and see if you get the same results as I did?

[Test plan]
Create an LXD container for the primary postgresql:
$ lxc launch ubuntu:jammy j1

Connect and install packages:
$ lxc shell j1
# apt update && apt install postgresql resource-agents-extra pacemaker-cli-utils -y

Configure postgresql.conf and pg_hba.conf:
# pg_conftool 14 main set listen_addresses '*'
# pg_conftool 14 main set wal_level replica
# echo "host replication replicator all scram-sha-256" >> /etc/postgresql/14/main/pg_hba.conf
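The `>>` append above adds a duplicate replication rule if this step is repeated on the same container. If that matters, here is a minimal sketch of an idempotent variant. `append_once` is a hypothetical helper, not something shipped by postgresql or resource-agents.

```shell
# Hypothetical helper: append LINE to FILE only if that exact line is not
# already present. A missing FILE is created by the append.
append_once() {
    ao_line=$1
    ao_file=$2
    # -q: no output, -x: whole-line match, -F: fixed string (no regex)
    if ! grep -qxF -- "$ao_line" "$ao_file" 2>/dev/null; then
        printf '%s\n' "$ao_line" >> "$ao_file"
    fi
}

# Usage on the primary, as root:
# append_once "host replication replicator all scram-sha-256" /etc/postgresql/14/main/pg_hba.conf
```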

Create the replication user (choose a password and remember it, because it will be needed again later):
# sudo -u postgres createuser --replication -P -e replicator

Restart postgresql on the primary:
# systemctl restart postgresql

Back on the host, create an LXD container for the secondary postgresql:
$ lxc launch ubuntu:jammy j2

Connect and install packages:
$ lxc shell j2
# apt update && apt install postgresql resource-agents-extra pacemaker-cli-utils -y

Stop postgresql:
# systemctl stop postgresql

Configure postgresql.conf:
# pg_conftool 14 main set listen_addresses '*'
# pg_conftool 14 main set hot_standby on

Clean up the data directory (pg_basebackup refuses to write into a non-empty directory):
# rm -rf /var/lib/postgresql/*/main/*
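The shell expands the wildcards in the command above, so it matches the `main` cluster of every installed PostgreSQL version and, with default shell settings, does not remove dot-files. Here is a sketch of a more targeted cleanup, assuming GNU `find` (as on Ubuntu). `empty_dir` is a hypothetical helper name.

```shell
# Hypothetical helper: remove everything inside DIR (dot-files included)
# but keep DIR itself, so its owner and permissions are preserved.
empty_dir() {
    ed_dir=$1
    # Refuse an empty argument or a path that is not an existing directory.
    if [ -z "$ed_dir" ] || [ ! -d "$ed_dir" ]; then
        echo "empty_dir: '$ed_dir' is not a directory" >&2
        return 1
    fi
    # -mindepth 1 skips DIR itself; -delete implies depth-first traversal.
    find "$ed_dir" -mindepth 1 -delete
}

# Usage on the secondary, with postgresql already stopped:
# empty_dir /var/lib/postgresql/14/main
```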

Perform the initial replication as the "postgres" user. The pg_basebackup command will prompt for the "replicator" password created earlier on the primary:
# sudo -u postgres -i
$ pg_basebackup -h <IP-of-primary> -D /var/lib/postgresql/14/main -U replicator -P -v -R
$ exit
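To avoid the interactive prompt, libpq (which pg_basebackup uses) can also read the password from a password file. By default this is `~/.pgpass` of the user running the command, which is `postgres` here. Each line has the form `hostname:port:database:username:password`, `*` matches any value, and `:` or `\` inside a field must be escaped with a backslash. libpq ignores the file if group or others have any access to it. Here is a minimal sketch with a hypothetical `write_pgpass` helper. It stores the password in plain text on the secondary, so it is only an alternative for throwaway test containers like these.

```shell
# Hypothetical helper: append a password-file entry for HOST and USER,
# matching any port and database ("*"). Escapes ':' and '\' in the password.
write_pgpass() {
    wp_host=$1
    wp_user=$2
    wp_password=$3
    wp_file=${4:-$HOME/.pgpass}
    wp_escaped=$(printf '%s' "$wp_password" | sed 's/[\\:]/\\&/g')
    printf '%s:*:*:%s:%s\n' "$wp_host" "$wp_user" "$wp_escaped" >> "$wp_file"
    # libpq ignores the file unless only its owner can access it.
    chmod 0600 "$wp_file"
}

# Usage as the postgres user (after "sudo -u postgres -i"), before pg_basebackup:
# write_pgpass <IP-of-primary> replicator 'the-replicator-password'
```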

Start the secondary:
# systemctl start postgresql

Verify replication: the list of databases on the secondary does not contain a "test" database:
# sudo -u postgres psql -l 2>/dev/null | grep test
#

Check that the secondary does have a "walreceiver" process running:
$ lxc shell j2
# ps axw | grep -E "postgres:.*wal" | grep -v grep
6001 ? Ss 0:06 postgres: 14/main: walreceiver streaming 0/7000780
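Starting with PostgreSQL 10, the standby's process title reads `walreceiver`, as in the output above. Older releases showed `wal receiver process`. A check based on the process title therefore has to accept both spellings. Here is a minimal sketch, assuming the detection comes down to matching `ps` output. `has_walreceiver` is a hypothetical name, and this is an illustration, not the pgsql resource agent's code.

```shell
# Hypothetical helper: succeed if any line of ps-style input on stdin shows a
# PostgreSQL WAL receiver, whether titled "walreceiver" (PostgreSQL 10+)
# or "wal receiver process" (older releases).
has_walreceiver() {
    grep -Eq 'postgres: .*wal[[:space:]]*receiver'
}

# Usage on the secondary:
# ps axww | has_walreceiver && echo "walreceiver running"
```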

Now run this long command on the secondary, as a single line. With the bug present, the command will complain that the walreceiver process is NOT running:
# OCF_RESKEY_check_wal_receiver=true OCF_RESKEY_socketdir=/run/postgresql OCF_RESKEY_config=/etc/postgresql/14/main/postgresql.conf OCF_RESKEY_pgctl=/usr/lib/postgresql/14/bin/pg_ctl OCF_RESKEY_pgdata=/var/lib/postgresql/14/main OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/pgsql monitor
INFO: Don't check /var/lib/postgresql/14/main during probe
WARNING: wal receiver process is not running

With the bug fixed, the warning will not be present in the output.
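When re-running the check, for example before and after upgrading resource-agents-extra, it can help to generate the same environment from the PostgreSQL version and cluster name and to test the output for the warning automatically. Here is a sketch with two hypothetical helpers, assuming the Debian/Ubuntu path layout used in the command above.

```shell
# Hypothetical helper: print the OCF settings from the one-line command above
# for a given PostgreSQL major version and cluster name, one per line.
pgsql_ocf_env() {
    pe_ver=$1
    pe_cluster=$2
    printf '%s\n' \
        "OCF_ROOT=/usr/lib/ocf" \
        "OCF_RESKEY_check_wal_receiver=true" \
        "OCF_RESKEY_socketdir=/run/postgresql" \
        "OCF_RESKEY_config=/etc/postgresql/$pe_ver/$pe_cluster/postgresql.conf" \
        "OCF_RESKEY_pgctl=/usr/lib/postgresql/$pe_ver/bin/pg_ctl" \
        "OCF_RESKEY_pgdata=/var/lib/postgresql/$pe_ver/$pe_cluster"
}

# Hypothetical helper: succeed (exit status 0) if the monitor output on
# stdin contains the warning that indicates the bug.
walreceiver_bug_present() {
    grep -q 'wal receiver process is not running'
}

# Usage on the secondary (the values contain no spaces, so word splitting is safe):
# env $(pgsql_ocf_env 14 main) /usr/lib/ocf/resource.d/heartbeat/pgsql monitor 2>&1 \
#     | walreceiver_bug_present && echo "bug present" || echo "no warning"
```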