ovn recovery has a race
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo | Fix Released | High | Michele Baldessari | | |
Bug Description
Currently there is a race in the high-availability setup when a controller is reset. The VIP that OVN uses (the internal_api VIP by default) only has a colocation constraint with the master role of the ovn-dbs resource. This leaves the following race open:
1) We reboot ctrl-0, which hosts the master role of ovn-dbs.
2) OVN becomes master on ctrl-1 from pacemaker's POV (but the promotion operation running in the background has not completed).
3) The OVN VIP moves to ctrl-1 even though OVN there is still in slave mode (there is only a colocation constraint between the VIP and the master role for OVN).
4) OVN controllers on the overcloud connect to the VIP, but it is in read-only mode because it was a slave.
5) OVN controllers that connected at 4) stay in read-only mode forever, until they get restarted manually.
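
The sequence above can be illustrated with a toy model. This is only a sketch of the ordering problem, not pacemaker's actual scheduling logic: the `OvnDbNode` class, the `fail_over` function, and the `vip_waits_for_promote` flag are all hypothetical names introduced for illustration. The flag stands in for some mechanism that delays the VIP until promotion has finished, which is an assumption about how the race could be closed and is not stated in this report.

```python
from dataclasses import dataclass


@dataclass
class OvnDbNode:
    """Hypothetical model of one controller running ovn-dbs."""
    name: str
    pacemaker_role: str = "slave"  # role as pacemaker records it
    db_writable: bool = False      # True only once the promote operation has finished


def fail_over(target: OvnDbNode, vip_waits_for_promote: bool) -> str:
    """Replay steps 2-5 and return the mode an OVN controller ends up in."""
    # Step 2: pacemaker considers the node master before promotion completes.
    target.pacemaker_role = "master"

    if vip_waits_for_promote:
        # Assumed mitigation: the VIP is held back until the promote
        # operation has completed, so the database is already writable.
        target.db_writable = True

    # Step 3: the colocation constraint only looks at pacemaker's view of
    # the role, so the VIP is allowed to move here at this point.
    assert target.pacemaker_role == "master"

    # Step 4: an OVN controller connects through the VIP and latches onto
    # whatever mode the database is in at that moment.
    client_mode = "read-write" if target.db_writable else "read-only"

    # The background promotion eventually finishes...
    target.db_writable = True

    # Step 5: ...but the already-connected controller keeps its mode
    # until it is restarted.
    return client_mode


print(fail_over(OvnDbNode("ctrl-1"), vip_waits_for_promote=False))
print(fail_over(OvnDbNode("ctrl-1"), vip_waits_for_promote=True))
```

With colocation alone (`vip_waits_for_promote=False`) the modeled controller ends up read-only even though the database later becomes writable. When the VIP move is held until promotion completes, it ends up read-write.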
Changed in tripleo:
status: Triaged → In Progress
Fix proposed to branch: stable/stein
Review: https://review.opendev.org/669803