Hi Jason,
Thanks for providing your configuration; it was helpful. I spent yesterday trying to trigger this bug, but for some reason I could not reproduce it. I set up a 2-node Focal cluster with PostgreSQL streaming replication and swapped the master and slave several times, and it worked every time. FWIW, this is my cluster configuration:
node 1: focal01 \
        attributes pgsql-data-status="STREAMING|SYNC"
node 2: focal02 \
        attributes pgsql-data-status=LATEST
primitive postgresql pgsql \
        params pgctl="/usr/lib/postgresql/12/bin/pg_ctl" pgdata="/var/lib/postgresql/12/main" psql="/usr/bin/psql" config="/etc/postgresql/12/main/postgresql.conf" rep_mode=sync master_ip=192.168.3.3 repuser=replicator restart_on_promote=true check_wal_receiver=true node_list="focal01 focal02" \
        op monitor timeout=30 interval=2
primitive vip_public IPaddr2 \
        params ip=192.168.3.4 cidr_netmask=24 \
        op monitor interval=10s \
        meta target-role=Started
primitive vip_replica IPaddr2 \
        params ip=192.168.3.3 cidr_netmask=24 \
        op monitor interval=10s \
        meta target-role=Started
ms master_postgresql postgresql \
        meta notify=true master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 target-role=Started
location cli-prefer-vip_public vip_public role=Started inf: focal01
order order-vip_replica-psql_master-vip_public inf: vip_replica:start master_postgresql:promote vip_public:start
colocation psql_master_and_vips inf: master_postgresql vip_public vip_replica
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.3-4b1f869f0f \
        cluster-infrastructure=corosync \
        cluster-name=cluster01 \
        stonith-enabled=false \
        last-lrm-refresh=1603195934
rsc_defaults rsc-options: \
        resource-stickiness=100
And here you can see some of the commands I ran and their respective output:
https://pastebin.ubuntu.com/p/ZBBByznQfT/
You can see the 'postgresql-receiver-status' error value in the XML output, but it should not affect the reproducibility of this bug; the 'pgsql-data-status' attribute is actually there. During this process I found an unrelated issue with the PostgreSQL resource and filed this bug:
https://bugs.launchpad.net/ubuntu/+source/resource-agents/+bug/1900613
However, after checking the code I can see your point, and it does indeed look buggy; I do not know why my attempt did not trigger it. I would rather not spend much more time on reproducing it, so I'd like to use your configuration as the test case for the SRU and, if possible, ask you to do the validation work when the SRU team requests it. Is that OK with you? Maybe you could try what I did (see the pastebin link); I think that would be good enough for our test case.
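
In case it helps with the validation, checking the 'pgsql-data-status' attribute of each node after a switchover could be scripted. The following is only a minimal sketch, not part of the resource agent or of any existing tool: the function name data_status_by_node is made up for illustration, and it assumes its input is text in the `crm configure show` format used for the configuration in this comment.

```python
import re

# Hypothetical helper (not part of crmsh or resource-agents).
# Assumes text in the `crm configure show` format, where a node stanza looks like:
#   node 1: focal01 \
#           attributes pgsql-data-status="STREAMING|SYNC"
NODE_RE = re.compile(r'^node\s+(?:\d+:\s+)?(\S+)')
ATTR_RE = re.compile(r'pgsql-data-status="?([^"\s]+)"?')


def data_status_by_node(config_text):
    """Return a dict mapping node name to its pgsql-data-status value."""
    # Join backslash-continued lines so each stanza becomes one logical line.
    logical = re.sub(r'\\\s*\n', ' ', config_text)
    statuses = {}
    for line in logical.splitlines():
        node = NODE_RE.match(line.strip())
        attr = ATTR_RE.search(line)
        # Only node stanzas that carry the attribute are recorded.
        if node and attr:
            statuses[node.group(1)] = attr.group(1)
    return statuses


sample = (
    'node 1: focal01 \\\n'
    '        attributes pgsql-data-status="STREAMING|SYNC"\n'
    'node 2: focal02 \\\n'
    '        attributes pgsql-data-status=LATEST\n'
)
print(data_status_by_node(sample))
```

Running it against the configuration before and after swapping the master and slave would show whether the attribute is updated on both nodes, which is the behaviour the test case needs to capture.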