2017-02-22 05:30:00 |
Bjoern |
description |
After debugging the
MessagingTimeout: Timed out waiting for a reply to message ID
issue in Kilo, I realized that we do not configure RPC settings such as rpc_response_timeout for the neutron agents, even though the agents do use a few RPC settings such as rpc_workers, rpc_response_timeout, and possibly others.
After I set the same rpc_response_timeout as the neutron server, the L3 agent became operational again.
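For illustration, a minimal sketch of the agent-side setting, assuming the L3 agent reads neutron.conf and that the neutron server uses a timeout of 120 seconds. The value is hypothetical and is not taken from this deployment; rpc_response_timeout is an oslo.messaging option in the [DEFAULT] section and defaults to 60 seconds when unset.

```ini
[DEFAULT]
# Seconds to wait for a reply to an RPC call before raising MessagingTimeout.
# Hypothetical value: use whatever the neutron server is configured with.
rpc_response_timeout = 120
```

The agent has to be restarted to pick up the change.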
Error:
2017-02-21 06:26:49.503 13484 ERROR neutron.agent.l3.agent [req-d37cf492-e1cc-49ef-b729-d0f7055e238c ] Failed synchronizing routers due to RPC error
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 523, in fetch_and_sync_all_routers
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent routers = self.plugin_rpc.get_routers(context)
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 92, in get_routers
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent router_ids=router_ids)
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 156, in call
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent retry=self.retry)
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent timeout=timeout, retry=retry)
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 350, in send
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent retry=retry)
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 339, in _send
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent result = self._waiter.wait(msg_id, timeout)
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 243, in wait
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent message = self.waiters.get(msg_id, timeout=timeout)
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 149, in get
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent 'to message ID %s' % msg_id)
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent MessagingTimeout: Timed out waiting for a reply to message ID 86808c8fcfc9443c84a2b0fd6e6f1710
2017-02-21 06:26:49.503 13484 TRACE neutron.agent.l3.agent
It is not clear why we fixed this in master but did not backport it, even partially, to the active branches, considering the amount of time it took to troubleshoot this issue.
I will go ahead and submit a fix for Mitaka, since Newton and newer are already corrected. |
|