Comment 0 for bug 1991817

Krzysztof Tomaszewski (labedz) wrote :

On larger deployments (150+ compute hosts), the neutron-ovn-metadata-agent liveness mechanism generates a CPU usage peak on the OVN Southbound DB at a regular interval (agent_down_time / 2). This CPU saturation can last for dozens of seconds and introduces significant latency in OVN service responses.

The problem is that every neutron-ovn-metadata-agent responds immediately to an event on the SB_Global
table and updates the external_ids property of its corresponding Chassis/Chassis_Private row.
That generates a flood of SBDB updates.

A similar issue can be observed with other neutron agents that use oslo.messaging to deliver their heartbeats (such as the neutron ovs agent), but in those cases the load generated by the liveness mechanism is spread over time simply because the agents were started at different times.

The neutron-ovn-metadata-agent heartbeat does not depend on the agent's start time; it is triggered by a global OVN event, so all agents respond at the same moment.

A solution could be to spread the neutron-ovn-metadata-agent heartbeat updates over time by delaying each agent's response by a random interval (where the delay does not exceed agent_down_time / 2).
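
A minimal sketch of the proposed randomized delay, in Python. This is not the actual neutron code: heartbeat_delay, schedule_heartbeat and the update_chassis callback are hypothetical names used only for illustration, and the use of threading.Timer is an assumption about how the delay could be applied.

```python
import random
import threading


def heartbeat_delay(agent_down_time, rng=random):
    """Pick a random delay, in whole seconds, in [0, agent_down_time / 2]."""
    # Hypothetical helper: the upper bound follows the limit proposed above.
    max_delay = max(0, agent_down_time // 2)
    return rng.randint(0, max_delay)


def schedule_heartbeat(update_chassis, agent_down_time, rng=random):
    """Call update_chassis after a random delay instead of immediately.

    update_chassis stands in for whatever writes the agent's heartbeat
    to its Chassis/Chassis_Private external_ids (an assumption here).
    """
    delay = heartbeat_delay(agent_down_time, rng)
    timer = threading.Timer(delay, update_chassis)
    timer.daemon = True  # do not block process exit on a pending heartbeat
    timer.start()
    return timer
```

With agent_down_time set to 75, for example, each agent would answer somewhere between 0 and 37 seconds after the SB_Global event, so the updates from 150+ agents would no longer reach the SBDB in the same instant.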