Hi,
after a Stein update, the OVN DB cluster is "broken", meaning that only one node is alive (the master).
Online: [ controller-0 controller-1 controller-2 ]
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 ovn-dbs-bundle-0@controller-0 ovn-dbs-bundle-1@controller-1 ovn-dbs-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]
Full list of resources:
podman container set: galera-bundle [192.168.24.1:8787/rhosp15/openstack-mariadb:pcmklatest]
galera-bundle-0 (ocf::heartbeat:galera): Master controller-0
galera-bundle-1 (ocf::heartbeat:galera): Master controller-1
galera-bundle-2 (ocf::heartbeat:galera): Master controller-2
podman container set: rabbitmq-bundle [192.168.24.1:8787/rhosp15/openstack-rabbitmq:pcmklatest]
rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
podman container set: redis-bundle [192.168.24.1:8787/rhosp15/openstack-redis:pcmklatest]
redis-bundle-0 (ocf::heartbeat:redis): Master controller-0
redis-bundle-1 (ocf::heartbeat:redis): Slave controller-1
redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2
ip-192.168.24.15 (ocf::heartbeat:IPaddr2): Started controller-0
ip-10.0.0.110 (ocf::heartbeat:IPaddr2): Started controller-1
ip-172.17.1.72 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.17.1.108 (ocf::heartbeat:IPaddr2): Started controller-2
ip-172.17.3.110 (ocf::heartbeat:IPaddr2): Started controller-0
ip-172.17.4.102 (ocf::heartbeat:IPaddr2): Started controller-1
podman container set: haproxy-bundle [192.168.24.1:8787/rhosp15/openstack-haproxy:pcmklatest]
haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started controller-0
haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-1
haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started controller-2
podman container set: ovn-dbs-bundle [192.168.24.1:8787/rhosp15/openstack-ovn-northd:pcmklatest]
ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Stopped controller-0
ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Stopped controller-1
ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Master controller-2
podman container: openstack-cinder-volume [192.168.24.1:8787/rhosp15/openstack-cinder-volume:pcmklatest]
openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started controller-1
Failed Resource Actions:
* ovndb_servers_start_0 on ovn-dbs-bundle-0 'unknown error' (1): call=8, status=Timed Out, exitreason='',
last-rc-change='Wed Oct 9 09:12:56 2019', queued=0ms, exec=200002ms
* ovndb_servers_start_0 on ovn-dbs-bundle-1 'unknown error' (1): call=8, status=Timed Out, exitreason='',
last-rc-change='Wed Oct 9 09:42:35 2019', queued=0ms, exec=200002ms
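If you want to confirm the DB state on the stopped replicas before cleaning up, something like the following should show whether the ovsdb-server backup is syncing from the master (container name and socket path are taken from this deployment and may differ on other releases):

```shell
# check the southbound DB sync status inside the ovn-dbs container
# (socket path assumed to be /var/run/openvswitch/ on Stein)
podman exec ovn-dbs-bundle-podman-0 \
    ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
```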
The state persists even after a reboot. So we have a small cut in the control plane, but it's still working; we lose HA though.
A simple pcs resource cleanup solves it.
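For reference, the recovery amounts to something like this (resource name taken from the status output above):

```shell
# clear the failed start records so pacemaker retries the resource
pcs resource cleanup ovn-dbs-bundle

# then verify all three replicas come back (one Master, two Slaves)
pcs status | grep ovn-dbs
```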
Originally reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1760405
Full explanation of the issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1759974#c4
Fix proposed to branch master: https://review.opendev.org/688212