Mirantis OpenStack

Bug #1394576
Comment #9

Denis M. (dmakogon) wrote on 2015-02-10:

It appears that most of these faults are caused by Neutron, because Neutron does not use RPC call timeouts anywhere when waiting for a reply from another remote Neutron service.

Take a look at the Neutron L3 agent RPC API: https://github.com/openstack/neutron/blob/stable/juno/neutron/agent/l3_agent.py
You will see that Neutron doesn't use the timeout kwarg anywhere.
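
For illustration, here is a minimal sketch of what passing a per-call timeout could look like. It assumes `client` behaves like an oslo.messaging RPCClient, whose prepare() accepts a timeout override and whose call() blocks for a reply. The helper name call_with_timeout and the 60-second default are hypothetical and are not taken from Neutron.

```python
def call_with_timeout(client, ctxt, method, timeout=60, **kwargs):
    """Make a blocking RPC call that gives up after `timeout` seconds.

    Assumption: `client` follows the oslo.messaging RPCClient interface,
    i.e. prepare(timeout=...) returns an object with call(ctxt, method, **kw).
    """
    # prepare() returns a call context carrying the overridden timeout;
    # the client object itself is not modified.
    cctxt = client.prepare(timeout=timeout)
    # call() waits for the remote reply; with a timeout set, the caller
    # gets an error after `timeout` seconds instead of waiting indefinitely.
    return cctxt.call(ctxt, method, **kwargs)
```

With something like this, the timeout value could come from a config option, so deployers could tune it per deployment.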

Having investigated the oslo.messaging workflow, it appears that this problem can be addressed, but only partially, because of Neutron's RPC API timeouts.

Recommendation:

This should be pasted into neutron.conf on each controller: https://gist.github.com/denismakogon/839105ca2487df9b837d

In short, add to each neutron.conf:
        rabbit_retry_interval=12
        rabbit_max_retries=5
        kombu_reconnect_delay=20
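
To check that a controller actually carries these values, something like the following could be used. It is only a sketch: it assumes the options live in the [DEFAULT] section of neutron.conf and that the file parses as plain INI. The function name find_mismatches is hypothetical.

```python
import configparser

# Values recommended in this comment.
RECOMMENDED = {
    "rabbit_retry_interval": "12",
    "rabbit_max_retries": "5",
    "kombu_reconnect_delay": "20",
}


def find_mismatches(conf_text, section="DEFAULT"):
    """Return {option: (actual, expected)} for every option that differs.

    Assumption: the options are set in `section` ([DEFAULT] here).
    A missing option is reported with an actual value of None.
    """
    # interpolation=None keeps '%' characters in other options from
    # raising errors; strict=False tolerates duplicate keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    mismatches = {}
    for key, expected in RECOMMENDED.items():
        actual = parser.get(section, key, fallback=None)
        if actual != expected:
            mismatches[key] = (actual, expected)
    return mismatches
```

Usage: read the file contents (for example from /etc/neutron/neutron.conf, if that is where the file lives on the node) and pass them to find_mismatches(); an empty dict means all three options match.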

Neutron team, please consider fixing the RPC API to allow deployers to configure timeouts for RPC API calls.

The next comment will contain a diagnostic snapshot and a RabbitMQ cluster status report taken during smoke test execution.