ovn-nbctl times out after 10s with 4+ machines

Bug #2033672 reported by Max Asnaashari
This bug affects 1 person
Affects: microovn
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When MicroOVN is set up with a LXD cluster, both installed from the `stable` snap channel, LXD runs the following command when it attempts to create the `ovn` network:

```
ovn-nbctl --timeout=10 --db ssl:10.70.115.216:6641,ssl:10.70.115.137:6641,ssl:10.70.115.153:6641 -c /etc/ovn/cert_host -p /etc/ovn/key_host -C /etc/ovn/ovn-central.crt --wait=sb ha-chassis-group-add lxd-net2
```

with `ovn.env` from MicroOVN:
```
OVN_INITIAL_NB="10.70.115.216"
OVN_INITIAL_SB="10.70.115.216"
OVN_NB_CONNECT="ssl:10.70.115.216:6641,ssl:10.70.115.137:6641,ssl:10.70.115.153:6641"
OVN_SB_CONNECT="ssl:10.70.115.216:6642,ssl:10.70.115.137:6642,ssl:10.70.115.153:6642"
OVN_LOCAL_IP="10.70.115.137"
```

When there are 4 or more cluster members, this command occasionally times out after 10s, which implies the network is unreachable:
```
Error: Failed to run: ovn-nbctl --timeout=10 --db ssl:10.70.115.216:6641,ssl:10.70.115.137:6641,ssl:10.70.115.153:6641 -c /etc/ovn/cert_host -p /etc/ovn/key_host -C /etc/ovn/ovn-central.crt --wait=sb ha-chassis-group-add lxd-net2: signal: alarm clock (2023-08-31T16:34:33Z|00003|fatal_signal|WARN|terminating with signal 14 (Alarm clock))
```

This might be related to the `CONNECT` strings in `ovn.env` listing only 3 systems, with the command occasionally being run on the excluded system, but I'm not sure.
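
One way to test this hypothesis on a given machine is to check whether its `OVN_LOCAL_IP` appears among the addresses in `OVN_NB_CONNECT`. The following is only a minimal Python sketch and is not part of MicroOVN or LXD: the helper names `parse_env` and `connect_hosts` are hypothetical, and it assumes the simple `KEY="value"` lines and IPv4 `ssl:host:port` entries seen in the excerpt above.

```python
def parse_env(text):
    """Parse simple KEY="value" lines (assumed format of ovn.env)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def connect_hosts(connect):
    """Return the host part of each 'proto:host:port' entry (IPv4 only)."""
    return [entry.split(":")[1] for entry in connect.split(",") if entry]


# Two of the values from the ovn.env excerpt in this report.
OVN_ENV = '''
OVN_NB_CONNECT="ssl:10.70.115.216:6641,ssl:10.70.115.137:6641,ssl:10.70.115.153:6641"
OVN_LOCAL_IP="10.70.115.137"
'''

env = parse_env(OVN_ENV)
hosts = connect_hosts(env["OVN_NB_CONNECT"])
# True when this machine is one of the addresses in the connect string.
local_is_central = env["OVN_LOCAL_IP"] in hosts
print(hosts, local_is_central)
```

For the excerpt above the local IP is one of the three listed addresses, so that particular file comes from a machine that is included in the connect string. Running the same check against the `ovn.env` of the machine where the timeout occurs would show whether the failures correlate with the excluded system.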

Tags: lxd microcloud
Max Asnaashari (masnax)
tags: added: lxd
tags: added: microcloud
Frode Nordahl (fnordahl) wrote :

The OVN central services are intentionally only run on 3 of the nodes because the clustered DBs use the Raft algorithm for consensus. Consequently, the NB/SB connect string will always contain only 3 IP addresses.
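
As a rough illustration of the arithmetic behind this (general Raft majority math, not MicroOVN code; the function names are hypothetical): a Raft cluster needs a majority of its servers to make progress, so a fourth database server would raise the required majority without raising the number of failures the cluster can survive.

```python
def quorum(servers):
    """Smallest majority of a cluster with the given number of servers."""
    return servers // 2 + 1


def tolerated_failures(servers):
    """Servers that can be lost while a majority is still available."""
    return servers - quorum(servers)


for n in (3, 4, 5):
    print(f"servers={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

With 3 servers the cluster survives 1 failure; with 4 it still survives only 1, and a second tolerated failure requires 5.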

Any participating node without OVN central services will connect to all of the addresses in the connect string and, depending on client settings, either settle on the first one it reaches or hunt for the leader.

Is there something in the deployment/environment preventing the client from connecting to the nodes with OVN DB servers?
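
A quick way to gather data for this question is to probe each address in the connect string from the machine where the timeout occurs. The sketch below is an assumption-laden illustration rather than an OVN tool: `can_connect` and `probe` are hypothetical helper names, the parsing assumes IPv4 `proto:host:port` entries, and the check only confirms that a TCP connection opens. It does not perform the TLS handshake or speak the OVSDB protocol.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False


def probe(connect, timeout=3.0):
    """Map each 'proto:host:port' entry (IPv4 only) to its TCP reachability."""
    results = {}
    for entry in connect.split(","):
        _proto, host, port = entry.split(":")
        results[entry] = can_connect(host, int(port), timeout)
    return results
```

Calling `probe(...)` with the value of `OVN_NB_CONNECT` from `ovn.env` returns a dictionary keyed by endpoint. Any entry mapped to `False` points at a firewall, routing, or listener problem between that machine and the corresponding DB server.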

Changed in microovn:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for microovn because there has been no activity for 60 days.]

Changed in microovn:
status: Incomplete → Expired