Comment 1 for bug 1948680

Giuseppe Petralia (peppepetra) wrote : Re: Removing one ovn-central unit doesn't remove the server from the Southbound cluster

Subscribing field-medium.

This issue can cause major outages: the SB and NB clusters cannot elect a leader when stale ovn-central units remain registered as cluster members.

Starting from 3 ovn-central units, we had to remove 2 of them (because of hardware maintenance) and then added two back. We did not manually run cluster/leave. The two Raft clusters were then unable to elect a leader: SB and NB each had 4 members, of which 2 were down, and the 5th unit could not join the cluster.
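The election failure follows from Raft's majority rule: a leader needs votes from more than half of the configured members, so a 4-member cluster needs 3 and cannot proceed with only 2 reachable. A minimal sketch of that arithmetic (the function names are hypothetical and used only for illustration):

```shell
#!/bin/sh
# Votes needed for a Raft majority among N configured servers: floor(N/2) + 1.
raft_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) when the reachable servers can still elect a leader.
# $1: configured members, $2: members currently reachable
can_elect_leader() {
    [ "$2" -ge "$(raft_quorum "$1")" ]
}

# The situation described above: 4 configured members, 2 of them down.
if can_elect_leader 4 2; then
    echo "quorum available"
else
    echo "no quorum: need $(raft_quorum 4) of 4, only 2 reachable"
fi
```

With the stale members removed first, the same two live units out of 3 configured members would have kept a majority.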

The 3 remaining ovn-central units were /2, /3 and /4.

To recreate the clusters we followed these recovery steps:

1. Stop all units:
juju run-action ovn-central/2 pause --wait
juju run-action ovn-central/3 pause --wait
juju run-action ovn-central/4 pause --wait

2. Create standalone databases on ovn-central/2
# ovsdb-tool cluster-to-standalone /tmp/standalone_ovnsb_db.db /var/lib/ovn/ovnsb_db.db
# ovsdb-tool cluster-to-standalone /tmp/standalone_ovnnb_db.db /var/lib/ovn/ovnnb_db.db

3. Create the new clusters on ovn-central/2
ovsdb-tool create-cluster /var/lib/ovn/ovnsb_db.db /tmp/standalone_ovnsb_db.db ssl:<ovn-central-2-ip>:6644
ovsdb-tool create-cluster /var/lib/ovn/ovnnb_db.db /tmp/standalone_ovnnb_db.db ssl:<ovn-central-2-ip>:6643

4. Resume ovn-central/2
juju run-action ovn-central/2 resume --wait

5. Join the new clusters from ovn-central/3

ovsdb-tool --cid=<new-sb-cid-took-from-ovn-central-2> join-cluster /var/lib/ovn/ovnsb_db.db OVN_Southbound ssl:<ovn-central-3-ip>:6644 ssl:<ovn-central-2-ip>:6644
ovsdb-tool --cid=<new-nb-cid-took-from-ovn-central-2> join-cluster /var/lib/ovn/ovnnb_db.db OVN_Northbound ssl:<ovn-central-3-ip>:6643 ssl:<ovn-central-2-ip>:6643

6. Resume ovn-central/3
juju run-action ovn-central/3 resume --wait

7. Fix the cluster IDs stored with leader-set
juju run -u ovn-central/leader leader-set nb_cid="<new-nb-cid-took-from-ovn-central-2>"
juju run -u ovn-central/leader leader-set sb_cid="<new-sb-cid-took-from-ovn-central-2>"

8. Resume ovn-central/4
juju run-action ovn-central/4 resume --wait
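
The new cluster IDs used in steps 5 and 7 can be read from the recreated database files on ovn-central/2, and the membership can be checked once the units are resumed. A minimal sketch, assuming the control sockets live under /var/run/ovn (not stated above, so verify on your units); the wrapper functions and the OVSDB_TOOL / OVN_APPCTL override variables are hypothetical helpers for illustration only:

```shell
#!/bin/sh
# Print the cluster ID stored in a clustered OVSDB database file.
# $1: database path, e.g. /var/lib/ovn/ovnsb_db.db
print_cluster_id() {
    "${OVSDB_TOOL:-ovsdb-tool}" db-cid "$1"
}

# Show Raft status (role, leader, member list) for a running database server.
# $1: control socket path (assumed location), $2: database name
show_cluster_status() {
    "${OVN_APPCTL:-ovn-appctl}" -t "$1" cluster/status "$2"
}

# Example calls on ovn-central/2 (commented out: they need a live deployment):
# print_cluster_id /var/lib/ovn/ovnsb_db.db
# print_cluster_id /var/lib/ovn/ovnnb_db.db
# show_cluster_status /var/run/ovn/ovnsb_db.ctl OVN_Southbound
# show_cluster_status /var/run/ovn/ovnnb_db.ctl OVN_Northbound
```

After step 8, the cluster/status output on each unit should list only /2, /3 and /4 as servers, with one of them reported as leader.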