neutron

Bug #1606827
Comment #5

Kevin Benton (kevinbenton) wrote on 2016-07-29:

I don't think this actually changed anything and we should revert it.

The part that exponentially increases is the amount of time it will wait for a response from the server, not the amount of time it backs off.
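To make the distinction concrete, here is a minimal sketch of the retry pattern as this comment describes it. It is not the code in neutron's rpc.py: `report_state_schedule` and its parameters are hypothetical, and `base_timeout` stands in for rpc.TRANSPORT.conf.rpc_response_timeout. Only the reply-wait window doubles; the sleep after each timeout stays bounded by the base value.

```python
import random


def report_state_schedule(base_timeout, attempts, rng=None):
    """Hypothetical model of successive failed report_state calls.

    Returns a list of (wait_timeout, sleep_after_timeout) pairs, one per attempt.
    """
    rng = rng or random.Random()
    schedule = []
    timeout = base_timeout
    for _ in range(attempts):
        # Backoff sleep is drawn from [0, base_timeout]; it does NOT grow.
        sleep = rng.uniform(0, base_timeout)
        schedule.append((timeout, sleep))
        # Only the window spent waiting for the server's reply doubles.
        timeout *= 2
    return schedule


if __name__ == "__main__":
    for wait, sleep in report_state_schedule(60, 3, random.Random(0)):
        print(f"wait up to {wait}s for a reply, then sleep {sleep:.1f}s")
```

Running it prints wait windows of 60, 120 and 240 seconds while every sleep stays at or below 60 seconds.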

Consider this (verified behavior on devstack):

Server goes offline
Agent report state (start timer for 60 seconds)
Agent report state timeout exception
Agent sleeps random(0, rpc.TRANSPORT.conf.rpc_response_timeout)[1]
Agent report state (start timer for 120 seconds)
Agent report state timeout exception
Agent sleeps random(0, rpc.TRANSPORT.conf.rpc_response_timeout)
Agent report state (start timer for 240 seconds)
Server resumes after 150 seconds
Server processes messages in 'reports' queue.
Agent gets report state response.
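The devstack timeline above can be replayed with a small simulation. It rests on several assumptions: the agent sends its first report at t=0, when the outage starts; each report stays in the queue for its full wait window; and the server handles a queued report instantly once it is back. `simulate_outage` and its parameters are hypothetical, and the sleeps are passed in explicitly so a run is reproducible.

```python
def simulate_outage(outage_end, base_timeout, sleeps):
    """Return (attempt_number, response_time) for the first answered report,
    or None if every attempt allowed by `sleeps` times out."""
    send_time = 0.0
    timeout = base_timeout
    for attempt in range(1, len(sleeps) + 2):
        # A queued report is processed as soon as the server is back (assumed instant).
        processed_at = max(send_time, outage_end)
        if processed_at <= send_time + timeout:
            # Reply arrives inside this attempt's wait window.
            return attempt, processed_at
        if attempt > len(sleeps):
            break
        # Timeout fired: sleep, then retry with a doubled wait window.
        send_time += timeout + sleeps[attempt - 1]
        timeout *= 2
    return None


if __name__ == "__main__":
    # Server down for 300 s, 60 s base timeout, sleeps of 30 s and 10 s.
    print(simulate_outage(300, 60, [30, 10]))  # -> (3, 300)
```

With those inputs the third report (sent at t=220 with a 240-second window) is still queued when the server returns, so it is answered, matching the shape of the sequence above.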

There is no point in changing the exponential timeout increase, because the server will process the report state as long as it's within that timeout window.

In the worst case, the longest an agent can go without a report_state in the queue is rpc.TRANSPORT.conf.rpc_response_timeout, the upper bound of the random sleep between attempts.
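The worst-case bound can be written out as a short derivation. Let $T$ be rpc.TRANSPORT.conf.rpc_response_timeout, $s_k$ the time the agent sends its $k$-th report_state, and $u_k$ the random sleep after the $k$-th timeout. Assume, as in the devstack sequence, that the first wait window equals $T$ and doubles on each retry, and that each report stays queued for its whole window:

```latex
\begin{aligned}
s_{k+1} &= s_k + 2^{k-1}T + u_k, \qquad u_k \sim \mathcal{U}(0, T) \\
\text{gap}_k &= s_{k+1} - \left(s_k + 2^{k-1}T\right) = u_k \le T
\end{aligned}
```

The doubling term cancels, so the gap between one report's window closing and the next report being sent never exceeds $T$, however many retries have occurred.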

1. https://github.com/openstack/neutron/blob/9f4f6c8db27f4838a11b4a271e96c372f01118dd/neutron/common/rpc.py#L141