Race condition when generating ovn.env file
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
microovn | Fix Released | High | Unassigned |
Bug Description
When a member joins (or leaves) a microovn cluster, the rest of the existing members re-generate their $SNAP_COMMON/
```
<earlier lines omitted>
addresses := make([]string, len(servers))
remotes := s.Remotes(
for i, server := range servers {
remote, ok := remotes[
if !ok {
continue
}
addresses[i] = fmt.Sprintf(
netip.
}
return strings.
```
The final string is created from the slice `addresses`, which is initialized with a length of `len(servers)` and filled by looping over `servers` and looking up the remote address of each server. However, if a lookup fails (because the remote server is not yet properly registered in the database), the loop `continue`s and leaves an empty element at that index of `addresses`, which breaks the final string.
Expected string for a cluster of 3:
"tcp:10.
Broken string for a cluster of 3 where one remote address lookup failed:
"tcp:10.
This broken string then produces errors in the ovn-central service like this:
May 12 11:49:02 juju-89d2ab-
My proposal is to add a retry mechanism in case the remote server lookup fails. In addition, I think the slice `addresses` should be initialized with a length of 0 and a capacity of `len(servers)`, and filled with `append`, so that it can never contain empty elements.
[0]https:/
[1]https:/
Changed in microovn: | |
status: | New → Fix Committed |
importance: | Undecided → High |
Changed in microovn: | |
status: | Fix Committed → Fix Released |
The address is missing because microcluster's `Remotes` are only updated on the next heartbeat. To get the nodes at the cluster level, the `cluster.InternalClusterMember` type can instead be fetched from the dqlite database via `cluster.GetInternalClusterMembers()`.