Comment 2 for bug 1813703

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

I am not sure that each of the sub-bugs listed here can be fixed individually.
We have seen these problems at scale with our customers as well.
For the purpose of fixing things, as I mentioned in one of the bugs, there are a few items that we can separate from this discussion.
1. Make the ovs-agent to openvswitchd communication robust at scale, so that it does not get locked or disconnected.

2. Introduce some sort of throttle mechanism for fetching the port details when there is a sync.
   (Maybe also suggest some config options for the rabbitmq configuration to get rid of the timeouts and to handle the rpc calls.)

3. On the server side, make sure that it can handle 2000+ ports on a single subnet. The full sync might not happen from all nodes at the same time, but the issue here is a single subnet hosting 2000+ ports. There may be some tuning that we can do in the DB lookup that is done for each and every port based on the subnet/network.
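
To make item 2 a bit more concrete, here is a rough sketch of what a throttle could look like: ask for the port details in fixed-size batches with a short pause between them, instead of one huge request. This is only an illustration and not existing Neutron code. The names throttled_sync, chunked, fetch_details, batch_size and pause are all made up for this example, and the default values are placeholders, not tuned numbers.

```python
import time
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def throttled_sync(
    port_ids: Sequence[str],
    fetch_details: Callable[[List[str]], List[Dict]],
    batch_size: int = 100,   # hypothetical default, not a tuned value
    pause: float = 0.5,      # seconds to wait between batches (assumption)
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Fetch port details batch by batch, pausing between batches.

    `fetch_details` stands in for whatever RPC call returns the details
    for a list of port ids; it is a placeholder, not a real Neutron API.
    """
    details: List[Dict] = []
    for index, batch in enumerate(chunked(port_ids, batch_size)):
        if index:
            # Pause before every batch except the first one, so the
            # server is not hit with all of the requests back to back.
            sleep(pause)
        details.extend(fetch_details(batch))
    return details
```

With 2000 ports and a batch size of 100 this would turn one very large request into 20 smaller ones. Whether fixed-size batches are enough, or the pause needs to back off when calls start timing out, would have to be tested at scale.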