This is probably a regression of bug #1194438.
From several checks with parallel jobs, the agent loop appears to become slower across iterations.
Slowness is not due to plugin RPC calls, which might be optimized anyway, but is mainly due to applying iptables/ovs configurations.
This consistently occurs in tempest tests with parallelism enabled.
Example logs here: http://logs.openstack.org/20/57420/5/experimental/check-tempest-devstack-vm-neutron-isolated-parallel/54c6db3/logs
1 - OVS Agent - Iteration #33 starts at 1:36:03
2 - Nova API - Server POST request at 1:39:42
3 - Neutron Server - Neutron POST /ports at 1:39:44
4 - OVS Agent - OVS DB monitor detects instance's tap at 1:39:45
5 - OVS Agent - Iteration #33 completes processing device filters (security groups) at 1:39:51
6 - Nova API - Server ACTIVE at 1:39:55
7 - Neutron Server - Floating IP POST at 1:39:55
8 - Neutron L3/VPN Agent - Floating IP ready at 1:40:37 (42 seconds; this should be a separate investigation)
9 - OVS Agent - Iteration #33 completes processing devices at 1:40:16
NOTE: The added device was not processed because the iteration started before the device was detected.
10 - TEMPEST - TIMEOUT ON TEST at 1:40:56 - connection failed because internal port not wired
11 - OVS Agent - Iteration #33 completes processing ancillary ports at 1:42:07
12 - OVS Agent - Iteration #34 starts at 1:42:08
13 - OVS Agent - Iteration #34 completes processing device filters at 1:43:35
14 - The wiring of the interface for the server is not captured by the logs, because the tempest test had completed in the meantime.
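To make the delays in the timeline above concrete, here is a small sketch that computes the intervals from the logged timestamps. The helper name `elapsed_seconds` is only illustrative and is not part of any Neutron code.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Return whole seconds between two H:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Iteration #33: start (step 1) to end of ancillary port processing (step 11).
iteration_33 = elapsed_seconds("1:36:03", "1:42:07")
# Tap detected (step 4) to start of iteration #34 (step 12), the first
# iteration that can pick the new device up.
tap_to_next_iteration = elapsed_seconds("1:39:45", "1:42:08")
# Tap detected (step 4) to the tempest timeout (step 10).
tap_to_timeout = elapsed_seconds("1:39:45", "1:40:56")

print(iteration_33, tap_to_next_iteration, tap_to_timeout)
```

In other words, the test timed out well before the iteration that could have wired the port had even started.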
The loop takes this long because of the number of calls sent from the neutron server, each of which the agent has to handle.
In some cases about 1,000 incoming requests, which resulted in about 1,500 calls to neutron-server from the agent, were observed in a single tempest run (isolated and parallel).
In particular, calls for security group updates and port updates trigger refresh_firewall, which is a rather expensive call.
In some cases as many as 20 threads were running refresh_firewall concurrently; all these threads synchronize on a semaphore for iptables.
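The contention pattern described here can be sketched roughly as follows. This is a toy model, not the agent's code: `iptables_lock` and this `refresh_firewall` are hypothetical stand-ins, and the real call rewrites iptables rules rather than incrementing a counter.

```python
import threading

# Hypothetical stand-in for the semaphore that guards iptables changes.
iptables_lock = threading.Lock()
apply_count = 0


def refresh_firewall():
    """Toy stand-in: each caller re-applies the same rules under the lock."""
    global apply_count
    with iptables_lock:
        # Only one thread at a time gets here, so the work is serialized.
        apply_count += 1


# Assumption: each incoming notification is handled in its own thread.
threads = [threading.Thread(target=refresh_firewall) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(apply_count)
```

Because every thread waits on the same lock, 20 notifications cost 20 sequential rule applications even when they would all produce the same result.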
This number is currently being brought down in https://review.openstack.org/#/c/57420, by:
- ensuring messages are sent from the server to the client only when really necessary
- reworking message handling in the agent so that it reacts to notifications in the main rpc loop rather than immediately when each message is received, thus avoiding concurrent execution of methods which would end up making exactly the same changes to iptables
- grouping calls from the agent to the server where possible (e.g.: send a single request for device details instead of a request for each device)
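The second and third points above could look roughly like the sketch below. It is not the code under review: `AgentSketch`, `FakePluginRpc` and all method names are hypothetical, chosen only to show notifications being recorded and then handled once per loop iteration with a single batched request.

```python
class FakePluginRpc:
    """Hypothetical stand-in for the agent-to-server RPC client."""

    def __init__(self):
        self.calls = 0

    def get_devices_details_list(self, devices):
        # One request for all devices instead of one request per device.
        self.calls += 1
        return [{"device": d} for d in devices]


class AgentSketch:
    def __init__(self, plugin_rpc):
        self.plugin_rpc = plugin_rpc
        self.updated_devices = set()
        self.firewall_refresh_needed = False
        self.refresh_count = 0

    # Notification handlers only record what changed; they do no real work.
    def security_groups_updated(self):
        self.firewall_refresh_needed = True

    def port_update(self, device):
        self.updated_devices.add(device)

    def refresh_firewall(self):
        self.refresh_count += 1  # stand-in for the expensive iptables work

    def rpc_loop_iteration(self):
        """Process everything recorded since the previous iteration."""
        devices, self.updated_devices = self.updated_devices, set()
        refresh, self.firewall_refresh_needed = self.firewall_refresh_needed, False
        details = []
        if devices:
            details = self.plugin_rpc.get_devices_details_list(sorted(devices))
        if refresh or devices:
            self.refresh_firewall()  # at most once per iteration
        return details


rpc = FakePluginRpc()
agent = AgentSketch(rpc)
for _ in range(20):
    agent.security_groups_updated()
for dev in ("tap-a", "tap-b", "tap-c"):
    agent.port_update(dev)
details = agent.rpc_loop_iteration()
print(agent.refresh_count, rpc.calls, len(details))
```

In this sketch, 20 security group notifications and 3 port updates collapse into one firewall refresh and one server request. The trade-off is that a change is only applied at the next iteration, so the approach relies on iterations being short.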
Leveraging threads or external processes for tasks which do not have to be synchronous with port processing is also currently being evaluated.