This indeed seems to be related to agent_down_time being left at its default value.
What I tested was creating new ports, which calls ping_all_chassis() with the following stack trace:
2021-08-24 19:12:09.015 30 INFO neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver [req-f7fed993-0b19-4a89-ae9a-eda1c6d1f4e4 b8b76d39176a400d8652a013a5da1b05 78368d2238ea45b28e818918b1da7efd - - -]
File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/green/thread.py", line 42, in __thread_body
File "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/green/thread.py", line 63, in wrap_bootstrap_inner
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
File "/var/lib/kolla/venv/lib/python3.6/site-packages/futurist/_thread.py", line 122, in run
File "/var/lib/kolla/venv/lib/python3.6/site-packages/futurist/_utils.py", line 52, in run
File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 359, in inner
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/ovo_rpc.py", line 120, in dispatch_events
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/handlers/resources_rpc.py", line 245, in push
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/handlers/resources_rpc.py", line 251, in _push
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 250, in get_resource_versions
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 226, in get_resource_versions
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 222, in _check_expiration
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/api/rpc/callbacks/version_manager.py", line 211, in _update_consumer_versions
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/db/agents_db.py", line 466, in get_agents_resource_versions
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/db/agents_db.py", line 453, in _get_agents_considered_for_versions
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 1007, in fn
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 1072, in get_agents
File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 1040, in ping_all_chassis
So that's not every API call, but because the Chassis_Private updates were increasing the neutron load, it quickly turned into an update on almost every port creation.
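For context, here is a minimal sketch of the throttling I believe ping_all_chassis() applies; the nb_idl handle and the 'neutron:liveness_check_at' external_ids key are my assumptions, not verbatim from the neutron source. The ping bumps NB_Global.nb_cfg at most once per agent_down_time // 2 seconds, and each bump fans out as a Chassis_Private update to every watching worker:

import datetime

from oslo_config import cfg
from oslo_utils import timeutils

def ping_all_chassis(nb_idl):
    # Throttle: skip if the last liveness ping was less than
    # agent_down_time // 2 seconds ago (key name below is assumed)
    last_ping = nb_idl.nb_global.external_ids.get('neutron:liveness_check_at')
    if last_ping:
        interval = max(cfg.CONF.agent_down_time // 2, 1)
        next_ping = (timeutils.parse_isotime(last_ping) +
                     datetime.timedelta(seconds=interval))
        if timeutils.utcnow(with_timezone=True) < next_ping:
            return
    # Bump nb_cfg; OVN propagates the bump into every Chassis_Private
    # row, which is what all the other workers then see as an update
    with nb_idl.create_transaction(check_error=True, bump_nb_cfg=True) as txn:
        txn.add(nb_idl.check_liveness())

So with the default agent_down_time, a busy cloud ends up paying this cost on nearly every port-creating request.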
Now for the second part, I've added the following patch to see how long calls to self.notify_handler.notify() take:
diff --git a/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py b/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py
index 293eba7605..335e6ea696 100644
--- a/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py
+++ b/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py
@@ -14,6 +14,7 @@
 import abc
 import datetime
+import timeit
 
 from neutron_lib import context as neutron_context
 from neutron_lib.plugins import constants
@@ -571,7 +572,10 @@ class OvnIdlDistributedLock(BaseOvnIdl):
                   '(table: %(table)s)',
                   {'node': self._node_uuid, 'hostname': CONF.host,
                    'event': event, 'row': row.uuid, 'table': row._table.name})
+        start = timeit.default_timer()
         self.notify_handler.notify(event, row, updates)
+        LOG.info("KK: processing time for table %(table)s: %(seconds)s",
+                 {"table": row._table.name, "seconds": timeit.default_timer() - start})
 
     @abc.abstractmethod
     def post_connect(self):
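As a side note, timeit.default_timer is time.perf_counter on Python 3, i.e. a monotonic high-resolution clock, so the deltas below are safe against wall-clock adjustments. If more call sites needed the same measurement, the pattern could be factored into a small context manager; a sketch only (log_duration is a name I made up, not part of the patch):

import contextlib
import timeit

@contextlib.contextmanager
def log_duration(log, table_name):
    # Same measurement as the patch above, reusable across call sites
    start = timeit.default_timer()
    try:
        yield
    finally:
        log.info("KK: processing time for table %(table)s: %(seconds)s",
                 {"table": table_name, "seconds": timeit.default_timer() - start})

# Usage at the call site in the patch:
# with log_duration(LOG, row._table.name):
#     self.notify_handler.notify(event, row, updates)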
This is a snapshot from one of the servers:
KK: processing time for table Chassis_Private: 4.142499528825283e-05
KK: processing time for table Chassis_Private: 4.569196607917547e-05
KK: processing time for table Chassis_Private: 5.8398989494889975e-05
KK: processing time for table Chassis_Private: 4.1155959479510784e-05
KK: processing time for table Chassis_Private: 6.496300920844078e-05
KK: processing time for table Chassis_Private: 4.8479996621608734e-05
KK: processing time for table Chassis_Private: 5.863799015060067e-05
KK: processing time for table Chassis_Private: 3.99000127799809e-05
KK: processing time for table Chassis_Private: 6.0677994042634964e-05
KK: processing time for table Chassis_Private: 3.629102138802409e-05
KK: processing time for table Chassis_Private: 4.8089015763252974e-05
KK: processing time for table Chassis_Private: 3.909901715815067e-05
KK: processing time for table Chassis_Private: 8.133298251777887e-05
KK: processing time for table Chassis_Private: 2.78400257229805e-05
KK: processing time for table Chassis_Private: 5.877105286344886e-05
KK: processing time for table Chassis_Private: 4.8115034587681293e-05
KK: processing time for table Chassis_Private: 5.627400241792202e-05
KK: processing time for table Chassis_Private: 2.924696309491992e-05
KK: processing time for table Chassis_Private: 8.873397018760443e-05
KK: processing time for table Port_Binding: 5.023000994697213e-05
KK: processing time for table Chassis_Private: 3.7371995858848095e-05
KK: processing time for table Chassis_Private: 0.00013784202747046947
KK: processing time for table Chassis_Private: 5.2426010370254517e-05
KK: processing time for table Port_Group: 2.144702011719346e-05
KK: processing time for table Logical_Switch: 2.6123016141355038e-05
KK: processing time for table Logical_Switch_Port: 1.9197002984583378e-05
KK: processing time for table Chassis_Private: 0.00011140096466988325
KK: processing time for table Chassis_Private: 6.18079793639481e-05
KK: processing time for table Chassis_Private: 5.5163982324302197e-05
KK: processing time for table Chassis_Private: 2.8703012503683567e-05
KK: processing time for table Chassis_Private: 3.352697240188718e-05
KK: processing time for table Chassis_Private: 4.7340989112854004e-05
So somehow the hash-ring-related code running in the API worker takes a fairly long time sending notifications on table updates. Is this an asynchronous call, or does it wait for the receiver to process the notification? Even if that were the case, AgentCache processing should not take that much time in general, unless the entire neutron server is heavily overloaded.
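For what it's worth, if the handler follows the usual producer/consumer pattern, notify() would only enqueue the matched event and return immediately, with the actual processing done by a separate consumer thread, which would explain the microsecond-level timings above. A simplified sketch of that pattern (an illustration, not the real ovsdbapp/neutron code):

import queue
import threading

class AsyncNotifyHandler:
    def __init__(self):
        self.notifications = queue.Queue()
        worker = threading.Thread(target=self._notify_loop, daemon=True)
        worker.start()

    def notify(self, event, row, updates=None):
        # Producer side: a cheap, non-blocking enqueue
        self.notifications.put((event, row, updates))

    def _notify_loop(self):
        # Consumer side: the expensive work (e.g. AgentCache updates)
        # happens here, off the caller's critical path
        while True:
            event, row, updates = self.notifications.get()
            self._run_event(event, row, updates)

    def _run_event(self, event, row, updates):
        pass  # placeholder for the actual event processing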