It appears that most of these faults are caused by Neutron, because Neutron does not set RPC call timeouts anywhere when waiting for a reply from a remote Neutron service.
Take a look at the Neutron L3 agent RPC API (https://github.com/openstack/neutron/blob/stable/juno/neutron/agent/l3_agent.py): Neutron does not use the timeout kwarg anywhere.
After investigating the oslo.messaging workflow, it appears that this problem can be addressed, but only partially, because of Neutron's RPC API timeouts.
Recommendation:
This should be pasted into neutron.conf on each controller: https://gist.github.com/denismakogon/839105ca2487df9b837d
In short, add to each neutron.conf:
rabbit_retry_interval=12
rabbit_max_retries=5
kombu_reconnect_delay=20
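To check that a controller actually carries these values, something like the following sketch could be used. It is only an illustration: the helper name find_mismatches is hypothetical, and it assumes the options live in the [DEFAULT] section of neutron.conf, which may differ between releases.

```python
import configparser

# Values recommended in this comment.
RECOMMENDED = {
    "rabbit_retry_interval": 12,
    "rabbit_max_retries": 5,
    "kombu_reconnect_delay": 20,
}


def find_mismatches(conf_text, section="DEFAULT"):
    """Return {option: (found, expected)} for options that are missing or differ.

    Assumption: the rabbit/kombu options are read from [DEFAULT].
    """
    # interpolation=None so that '%' characters in the file are left alone;
    # strict=False tolerates options that are listed more than once.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)

    mismatches = {}
    for option, expected in RECOMMENDED.items():
        found = parser.get(section, option, fallback=None)
        try:
            matches = found is not None and float(found) == float(expected)
        except ValueError:
            # The value is present but not numeric.
            matches = False
        if not matches:
            mismatches[option] = (found, expected)
    return mismatches


if __name__ == "__main__":
    sample = """
[DEFAULT]
rabbit_retry_interval = 12
rabbit_max_retries = 5
kombu_reconnect_delay = 20
"""
    print(find_mismatches(sample))
```

An empty result means all three options match. To check a real file, pass its contents, e.g. find_mismatches(open("/etc/neutron/neutron.conf").read()), where the path is an assumption about the deployment.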
To the Neutron team: please consider fixing the RPC API so that deployers can configure timeouts for RPC API calls.
The next comment will contain a diagnostic snapshot and a RabbitMQ cluster status report taken during smoke test execution.