RPCMessage timeouts when ovs agent is reporting status about many ports

Bug #1885758 reported by Slawek Kaplonski
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Confirmed
Medium
Rodolfo Alonso

Bug Description

In case when neutron-ovs-agent is handling many ports, like e.g. 1000 and configures all of them in one rpc loop iteration, it will send update_devices_status rpc message to the neutron-server and it may timeout as neutron server will process ports one by one.
Such timeout will force to do full sync on agent's side so the problem will repeat.

Error on agent side is like:

2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Error while processing VIF ports: MessagingTimeout: Timed out waiting for a reply to message ID 894728e7bda24382931b5af26a367f16
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most recent call last):
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2339, in rpc_loop
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent port_info, provisioning_needed)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = f(*args, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1851, in process_network_ports
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.treat_devices_skipped(skipped_devices)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = f(*args, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1783, in treat_devices_skipped
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.conf.host)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/neutron/agent/rpc.py", line 171, in update_device_list
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent agent_restarted=agent_restarted)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 185, in call
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent time.sleep(wait)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent self.force_reraise()
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent six.reraise(self.type_, self.value, self.tb)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/neutron/common/rpc.py", line 162, in call
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return self._original_context.call(ctxt, method, **kwargs)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in call
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=self.retry)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in _send
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent timeout=timeout, retry=retry)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 625, in send
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent retry=retry)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 614, in _send
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent result = self._waiter.wait(msg_id, timeout)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 506, in wait
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent message = self.waiters.get(msg_id, timeout=timeout)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 394, in get
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 'to message ID %s' % msg_id)
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent MessagingTimeout: Timed out waiting for a reply to message ID 894728e7bda24382931b5af26a367f16
2020-06-30 16:13:49.164 148737 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
2020-06-30 16:13:49.176 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:0 completed. Processed ports statistics: {'regular': {'updated': 0, 'added': 1250, 'removed': 0}}. Elapsed:2169.184
2020-06-30 16:13:49.177 148737 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Loop iteration exceeded interval (2 vs. 2169.18406701)! loop_count_and_wait /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutr
on_agent.py:2063
2020-06-30 16:13:49.178 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:1 started
2020-06-30 16:13:49.182 148737 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] ofctl request version=0x4,msg_type=0x12,msg_len=0x38,xid=0x27b6f57f,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_grou
p=4294967295,out_port=4294967295,table_id=23,type=1) result [OFPFlowStatsReply(body=[OFPFlowStats(byte_count=0,cookie=12410424124845347866L,duration_nsec=444000000,duration_sec=2186,flags=0,hard_timeout=0,idle_timeout=0,instructions=[],length=56,match=OFPMatch(oxm_fields={}),packet_count=0,priority=0,table_id=23)],fl
ags=0,type=1)] _send_msg /usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py:98
2020-06-30 16:13:49.191 148737 DEBUG neutron.agent.linux.utils [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Found cmdline ['ovsdb-client', 'monitor', 'tcp:127.0.0.1:6640', 'Interface', 'name,ofport,external_ids', '--format=json'] for rocess with PID 149081. get_cmdline_from_pid /usr/lib/python2.7/site-package
s/neutron/agent/linux/utils.py:329
2020-06-30 16:13:49.224 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent rpc_loop - iteration:1 - starting polling. Elapsed:0.046
2020-06-30 16:13:49.226 148737 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-6593a4c5-3196-4d81-939c-74d8e4c49adc - - - - -] Agent out of sync with plugin!

We should probably do here something similar what is done in e.g. DHCP agent and we should report statuses about ports in batches.

Tags: ovs
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.