ovsdb post cluster failure/network partition recovery

Bug #1979201 reported by Dmitrii Shcherbakov
Affects: charm-ovn-central
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

1) Deployed 3 ovn-central units (2 of the 3 on the same node) and vault, leaving vault sealed
2) Rebooted the node hosting 2 ovn-central units
3) After that node came back, rebooted the node hosting the remaining ovn-central unit
4) Unsealed vault with auto-generation of certs
5) The model settled in a split-brain state (see juju status below)

https://paste.ubuntu.com/p/jmHZvvv72C/ (bundle)

This doc describes the acceptable failure conditions: https://docs.openvswitch.org/en/latest/ref/ovsdb.7/#clustered-database-service-model

However, the charm lacks actions and documentation that provide supported cluster recovery steps after a cluster failure or network partition.
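
For context, the linked document states that a clustered database remains available only while a majority of its servers are up. A minimal sketch of that arithmetic follows; the function names quorum and tolerated_failures are hypothetical helpers for illustration, not part of OVN or the charm.

```shell
#!/bin/sh
# Smallest number of servers that forms a majority in a cluster of N servers.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that may fail while a majority is still available.
tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

quorum 3              # prints 2
tolerated_failures 3  # prints 1
```

With 3 ovn-central units, rebooting the node that hosts 2 of them (step 2) takes out more servers than the cluster can tolerate, so this reproduction falls outside the acceptable failure conditions.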

$ juju status
Model    Controller       Cloud/Region     Version  SLA          Timestamp
default  maaslab-default  maaslab/default  2.9.31   unsupported  16:20:44+03:00

App                   Version  Status  Scale  Charm                 Channel       Rev  Exposed  Message
mysql-innodb-cluster  8.0.29   active  3      mysql-innodb-cluster  8.0/stable    26   no       Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
ovn-central           22.03.0  active  3      ovn-central           22.03/stable  31   no       Unit is ready (northd: active)
vault                 1.7.9    active  1      vault                 1.7/stable    68   no       Unit is ready (active: true, mlock: enabled)
vault-mysql-router    8.0.29   active  1      mysql-router          8.0/stable    30   no       Unit is ready

Unit                     Workload  Agent  Machine  Public address  Ports              Message
mysql-innodb-cluster/0   active    idle   0/lxd/0  10.10.20.12                        Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/1   active    idle   2/lxd/0  10.10.20.16                        Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/2*  active    idle   1/lxd/2  10.10.20.15                        Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
ovn-central/1            active    idle   1/lxd/0  10.10.20.13     6641/tcp,6642/tcp  Unit is ready (northd: active)
ovn-central/2*           active    idle   1/lxd/1  10.10.20.14     6641/tcp,6642/tcp  Unit is ready (leader: ovnnb_db, ovnsb_db)
ovn-central/3            active    idle   0/lxd/2  10.10.20.17     6641/tcp,6642/tcp  Unit is ready (leader: ovnnb_db, ovnsb_db)
vault/0*                 active    idle   2        10.10.20.9      8200/tcp           Unit is ready (active: true, mlock: enabled)
  vault-mysql-router/0*  active    idle            10.10.20.9                         Unit is ready
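
The split-brain shows in the unit messages: both ovn-central/2 and ovn-central/3 claim leadership of ovnnb_db and ovnsb_db, whereas a healthy cluster has one leader per database. A rough sketch for spotting this from the status text follows; count_leaders is a hypothetical helper, and it assumes the charm keeps reporting leadership in the "(leader: ...)" form seen here.

```shell
#!/bin/sh
# Read `juju status` output on stdin and print how many unit lines
# claim leadership of the database named in $1 (e.g. ovnnb_db).
# Assumption: leadership is reported as "(leader: <db>[, <db>])".
count_leaders() {
    awk -v db="$1" '/\(leader: / && index($0, db) { n++ } END { print n + 0 }'
}

# Example usage (any result other than 1 deserves a closer look):
#   juju status ovn-central | count_leaders ovnnb_db
#   juju status ovn-central | count_leaders ovnsb_db
```

Run against the output above, both calls would report 2. The status message is only a summary; ovsdb-server's cluster/status command gives the per-server Raft view, for example `ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound` on each unit, where the control socket path is an assumption about the charm's layout.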
