During a Rally test run, we saw this exception in the OVS agent logs:
2017-11-07 13:35:51.428 597682 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-62f85bb3-db4c-4485-b35c-b7c1cafb3970 3d527bdd3ede4c6a97f91b701393b8e3 5f753e92a5d740fc97252bd39f868561 - - -] port_delete message processed for port 3e8348d0-40e1-4146-b803-1e6c6eddba53 port_delete /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:430
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp [req-141ecd16-22d7-4b1c-aa91-25d5077414f5 - - - - -] Agent main thread died of an exception: TypeError: int() can't convert non-string with explicit base
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp Traceback (most recent call last):
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py", line 40, in agent_main_wrapper
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp ovs_agent.main(bridge_classes)
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2205, in main
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp agent.daemon_loop()
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp return f(*args, **kwargs)
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2120, in daemon_loop
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp self.rpc_loop(polling_manager=pm)
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp return f(*args, **kwargs)
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1985, in rpc_loop
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp ovs_status = self.check_ovs_status()
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 153, in wrapper
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp return f(*args, **kwargs)
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1787, in check_ovs_status
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp status = self.int_br.check_canary_table()
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_int.py", line 52, in check_canary_table
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp flows = self.dump_flows(constants.CANARY_TABLE)
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py", line 141, in dump_flows
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp (dp, ofp, ofpp) = self._get_dp()
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_bridge.py", line 68, in _get_dp
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp self._cached_dpid = int(new_dpid_str, 16)
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp TypeError: int() can't convert non-string with explicit base
2017-11-07 13:35:51.439 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
2017-11-07 13:35:54.861 597682 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: receive error: Connection reset by peer: RuntimeError: OVS transaction timed out
This crashes the agent. When it is restarted, it performs a full sync, which slows things down considerably.
The error above appears to be related to an earlier OVS timeout:
2017-11-07 13:35:51.377 597682 ERROR ovsdbapp.backend.ovs_idl.command TimeoutException: Commands [<ovsdbapp.schema.open_vswitch.commands.DbGetCommand object at 0x11fc6890>] exceeded timeout 10 seconds
2017-11-07 13:35:51.377 597682 ERROR ovsdbapp.backend.ovs_idl.command
2017-11-07 13:35:51.378 597682 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [req-141ecd16-22d7-4b1c-aa91-25d5077414f5 - - - - -] Bridge br-int changed its datapath-ID from dae36ebcec4d to None
2017-11-07 13:35:38.520 597682 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-141ecd16-22d7-4b1c-aa91-25d5077414f5 - - - - -] Switch connection timeout: TimeoutException: Commands [<ovsdbapp.schema.open_vswitch.commands.ListPortsCommand object at 0xa935750>] exceeded timeout 10 seconds
2017-11-07 13:35:51.330 597682 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): DbGetCommand(column=datapath_id, table=Bridge, record=br-int) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2017-11-07 13:35:51.331 597682 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:110
2017-11-07 13:35:51.419 597682 ERROR neutron.agent.linux.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: None
2017-11-07 13:35:51.419 597682 ERROR neutron.agent.linux.async_process [-] Process [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json] dies due to the error: None
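Taken together, the logs suggest that after the OVSDB timeout the datapath-ID lookup for br-int came back as None ("changed its datapath-ID from dae36ebcec4d to None"), and that this None then reached int(new_dpid_str, 16) in _get_dp. The snippet below is only a minimal sketch of that failure mode and of one possible guard. parse_dpid is a hypothetical helper invented for illustration; it is not the actual neutron code and not a proposed patch.

```python
def parse_dpid(dpid_str):
    """Hypothetical helper: convert a hex datapath-ID string to an int.

    Returns None when no datapath-ID is available (for example after an
    OVSDB timeout), so the caller can retry instead of crashing.
    """
    if dpid_str is None:
        return None
    return int(dpid_str, 16)


# Reproduce the failure from the traceback: int() with an explicit base
# only accepts strings (or bytes), so passing None raises TypeError.
try:
    int(None, 16)
except TypeError as exc:
    print("unguarded call failed: %s" % exc)

# With the guard, a missing datapath-ID is reported as None instead.
print(parse_dpid("dae36ebcec4d"))
print(parse_dpid(None))
```

Whether the agent should retry the lookup, keep the previously cached datapath-ID, or treat the bridge as restarted when it gets None is a design decision this sketch does not address.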