Comment 9 for bug 1394576

Denis M. (dmakogon) wrote :

It appears that most of these faults are caused by Neutron, because Neutron doesn't set RPC call timeouts anywhere when waiting for a reply from another remote Neutron service.

Take a look at the Neutron L3 agent RPC API: https://github.com/openstack/neutron/blob/stable/juno/neutron/agent/l3_agent.py
You will see that Neutron doesn't use the timeout kwarg anywhere.

Investigating the oslo.messaging workflow shows that this problem can be addressed, but only partially, because of Neutron's RPC API timeouts.

Recommendation:

This should be pasted into neutron.conf on each controller: https://gist.github.com/denismakogon/839105ca2487df9b837d

In short, add to each neutron.conf:
        rabbit_retry_interval=12
        rabbit_max_retries=5
        kombu_reconnect_delay=20
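
To confirm that a controller's neutron.conf actually carries these three values, something like the following could be used. This is only a minimal sketch: the helper name missing_or_wrong and the SAMPLE text are made up for illustration, and it assumes the options sit in the [DEFAULT] section, which should be checked against the oslo.messaging release in use.

```python
import configparser

# Values recommended in this comment.
EXPECTED = {
    "rabbit_retry_interval": "12",
    "rabbit_max_retries": "5",
    "kombu_reconnect_delay": "20",
}

# Illustrative config text; the [DEFAULT] placement is an assumption.
SAMPLE = """\
[DEFAULT]
rabbit_retry_interval=12
rabbit_max_retries=5
kombu_reconnect_delay=20
"""


def missing_or_wrong(conf_text, section="DEFAULT"):
    """Return {option: found_value} for options that differ from EXPECTED.

    found_value is None when the option is absent.
    """
    # strict=False tolerates repeated keys, interpolation=None tolerates '%'.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if section == "DEFAULT":
        found = dict(parser.defaults())
    elif parser.has_section(section):
        found = dict(parser[section])
    else:
        found = {}
    return {
        name: found.get(name)
        for name, value in EXPECTED.items()
        if found.get(name) != value
    }


if __name__ == "__main__":
    print(missing_or_wrong(SAMPLE))
```

An empty result means all three options match; to check a real file, pass its contents, for example open("/etc/neutron/neutron.conf").read() (the path is an assumption).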

To the Neutron team: please consider fixing the RPC API so that deployers can configure timeouts for RPC API calls.
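
To illustrate the idea of a caller-supplied timeout on a blocking call, here is a stdlib-only sketch. It is not Neutron or oslo.messaging code; call_with_timeout, ReplyTimeout and the two handlers are hypothetical names, and a worker thread stands in for the remote service.

```python
import queue
import threading
import time


class ReplyTimeout(Exception):
    """Raised when no reply arrives within the configured timeout."""


def call_with_timeout(handler, payload, timeout):
    """Run handler(payload) in a worker thread and wait at most `timeout` seconds."""
    replies = queue.Queue(maxsize=1)

    def worker():
        replies.put(handler(payload))

    threading.Thread(target=worker, daemon=True).start()
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise ReplyTimeout("no reply within %.2f s" % timeout)


def fast_handler(payload):
    # Stands in for a remote service that answers immediately.
    return {"echo": payload}


def slow_handler(payload):
    # Stands in for a remote service that answers too late.
    time.sleep(0.5)
    return {"echo": payload}


if __name__ == "__main__":
    print(call_with_timeout(fast_handler, "ping", timeout=1.0))
    try:
        call_with_timeout(slow_handler, "ping", timeout=0.05)
    except ReplyTimeout as exc:
        print("timed out:", exc)
```

The point is that the timeout is an argument the caller controls, so it could be fed from a configuration option rather than being fixed in the code.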

The next comment will contain a diagnostic snapshot and a RabbitMQ cluster status report taken during smoke test execution.