[Backport 1516260] L3 agent sync_routers timeouts may cause cluster to fall down
Bug #1536954 reported by Oleg Bondarev
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Mirantis OpenStack | Fix Released | High | Oleg Bondarev | |
Bug Description
Upstream bug: https:/
The L3 agent sends the 'sync_routers' RPC call when the agent starts or when an exception occurs. The call uses the default timeout of 60 seconds (an Oslo messaging config option). At scale the server can take longer than that to answer, so the call times out and the agent sends the message again. The resent messages add further load to the server, which causes a cascading failure that does not resolve itself. The server side of the sync_routers RPC was optimized to mitigate this; simply increasing the timeout could also help.
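The failure mode described above can be illustrated with a toy queueing model. This is only a sketch under stated assumptions, not Neutron or oslo.messaging code: the function `simulate_sync`, the single FIFO server worker, the fixed `service_time`, and the agent counts are all hypothetical and chosen for illustration. The one behaviour taken from the report is that an agent resends its request once the timeout expires, while the server still spends time answering the stale request.

```python
import heapq


def simulate_sync(num_agents, service_time, timeout, horizon=3600.0):
    """Toy model: return (synced_agents, wasted_replies) after `horizon` seconds.

    Assumptions (hypothetical, for illustration only):
    - one server worker handles requests in FIFO order,
    - every request costs `service_time` seconds,
    - the server cannot tell that a request is stale, so it still processes it,
    - an agent resends exactly when its previous request times out.
    """
    # Each heap entry is (send_time, agent_id); all agents start at t = 0.
    pending = [(0.0, agent) for agent in range(num_agents)]
    heapq.heapify(pending)
    clock = 0.0
    synced = 0
    wasted = 0
    while pending:
        send_time, agent = heapq.heappop(pending)
        start = max(clock, send_time)  # server may idle until the request arrives
        if start >= horizon:
            break
        clock = start + service_time
        if clock <= send_time + timeout:
            synced += 1  # reply arrived before the agent gave up
        else:
            wasted += 1  # agent already timed out and resent at send_time + timeout
            heapq.heappush(pending, (send_time + timeout, agent))
    return synced, wasted


if __name__ == "__main__":
    # 200 agents, 1 second per request: 60 s timeout versus 300 s timeout.
    print(simulate_sync(200, 1.0, 60.0))
    print(simulate_sync(200, 1.0, 300.0))
```

In this model, 200 agents with a 60 second timeout leave only the first 60 agents synced. The server spends the rest of the simulated hour answering requests whose senders have already given up. With a 300 second timeout all 200 agents sync on the first attempt and no reply is wasted. This matches the two mitigations named in the report: make the server answer faster, or raise the timeout. In oslo.messaging the timeout in question is the `rpc_response_timeout` option.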
Fix proposed to branch: openstack-ci/fuel-8.0/liberty (review.fuel-infra.org/16367)
Change author: Oleg Bondarev <email address hidden>
Review: https:/