At completion of a deployment with multiple controllers, by observing the gre tunnels created in OVS by the neutron ovs-agent, one will find that some neutron nodes may miss the tunnels in between them.
This is due to ovs-agents getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from other nodes or publish updates.
The disconnection may happen following a reconfig of a rabbit node, the VIP moving over a different node, or even _during_ deployment due to rabbit cluster configuration.
Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
At completion of a deployment with multiple controllers, by observing the gre tunnels created in OVS by the neutron ovs-agent, one will find that some neutron nodes may miss the tunnels in between them.
This is due to ovs-agents getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from other nodes or publish updates.
The disconnection may happen following a reconfig of a rabbit node, the VIP moving over a different node, or even _during_ deployment due to rabbit cluster configuration.
Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging