neutron

Bug #1969354
Comment #3

Jakub Libosvar (libosvar) wrote on 2022-04-26:

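1) This could also be caused by the inactivity probe running on the DB side. You can raise the probe interval by changing inactivity_probe in the connection tables of the northbound and southbound databases. The value is in milliseconds, so 60000 means 60 seconds. These commands do that: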

ovn-nbctl set connection . inactivity_probe=60000
ovn-sbctl set connection . inactivity_probe=60000
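
If you want to see what would be run before touching the databases, here is a minimal sketch of a wrapper around the two commands above. The names set_probe, PROBE_MS and DRY_RUN are made up for this example, and it assumes each connection table has exactly one row, which is what the "." record reference requires. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper, not part of OVN: prints (default) or runs the
# commands that raise inactivity_probe on the NB and SB databases.
PROBE_MS="${PROBE_MS:-60000}"   # inactivity_probe is in milliseconds
DRY_RUN="${DRY_RUN:-1}"         # set DRY_RUN=0 to actually apply the change

set_probe() {
    # $1 is the ctl tool: ovn-nbctl (northbound) or ovn-sbctl (southbound)
    if [ "$DRY_RUN" = "1" ]; then
        echo "$1 set connection . inactivity_probe=$PROBE_MS"
    else
        # Show the current value first, then set the new one.
        "$1" get connection . inactivity_probe
        "$1" set connection . inactivity_probe="$PROBE_MS"
    fi
}

set_probe ovn-nbctl
set_probe ovn-sbctl
```

Running it with DRY_RUN=0 on a node that can reach the databases applies the change; PROBE_MS can be overridden if 60 seconds is not enough in your environment.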

2) I don't know how your tooling manages the OVN cluster, whether you use RAFT active/active or a pacemaker-managed active/backup solution. But it sounds like the DBs are loaded enough that they don't answer monitoring probes in time. For the pacemaker solution, you can try changing the monitor interval and timeouts on the master and slave roles. Again, I don't know how it's managed in your env, but these commands can be handy:

pcs resource update <your_ovn_db_resource> op monitor interval=60s role=Master timeout=180s enabled=true
pcs resource update <your_ovn_db_resource> op monitor interval=120s role=Slave timeout=180s enabled=true

All that said, this looks like a misconfiguration of the cloud leading to a scaling issue rather than a bug in Neutron.