L3 agent sync_routers timeouts may cause cluster to fall down
Bug #1516260 reported by Assaf Muller
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | High | Oleg Bondarev |
Bug Description
The L3 agent sends the 'sync_routers' RPC call when it starts or when an exception occurs. The call uses the default RPC timeout of 60 seconds (the oslo.messaging config option rpc_response_timeout). At scale, the server can take longer than that to answer. The call then times out and the agent sends the message again. Each resent request adds to the server's load, so responses slow further and more calls time out. The result is a cascading failure that does not resolve itself. The server's sync_routers RPC response was optimized to mitigate this. Simply increasing the timeout could also help.
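The cascade can be illustrated with a toy Python model. This is a hypothetical sketch, not neutron's code. `FakeServer`, `RpcTimeout` and `resync` are invented names, no real RPC is made, and time is simulated.

The model assumes that every request the agent abandons after a timeout keeps running on the server. Each reply therefore takes longer than the previous one. This linear load model is chosen purely for illustration.

Under that assumption, retrying with a fixed 60-second timeout never succeeds. A larger timeout, either fixed or grown on each retry, lets the call complete.

```python
"""Toy model of the sync_routers retry cascade (hypothetical, not neutron code).

Assumption: every request the agent abandons after a timeout keeps running
on the server, so each retry's reply takes longer than the last. No real
RPC or sleeping happens; time is simulated.
"""


class RpcTimeout(Exception):
    """Stand-in for an RPC timeout error."""


class FakeServer:
    """Simulated server answering sync_routers (hypothetical)."""

    def __init__(self, base_seconds):
        self.base_seconds = base_seconds
        self.requests = 0  # requests received so far, including abandoned ones

    def sync_routers(self, timeout):
        self.requests += 1
        # Assumed load model: reply time grows linearly with requests received,
        # because earlier (timed-out) requests are still being processed.
        reply_seconds = self.base_seconds * self.requests
        if reply_seconds > timeout:
            raise RpcTimeout(f"no reply within {timeout}s")
        return ["router-a", "router-b"]


def resync(server, timeout, factor=1, max_timeout=600, max_attempts=5):
    """Retry sync_routers on timeout, multiplying the timeout by `factor`.

    factor=1 keeps the timeout fixed; factor>1 grows it up to max_timeout.
    Returns (routers or None, number of attempts used).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return server.sync_routers(timeout=timeout), attempt
        except RpcTimeout:
            timeout = min(timeout * factor, max_timeout)
    return None, max_attempts


if __name__ == "__main__":
    # Fixed 60 s timeout: replies take 70, 140, 210, ... s, so every retry fails.
    print(resync(FakeServer(70), timeout=60))  # → (None, 5)
    # Timeout doubled per retry (60, 120, 240): the third attempt (210 s) fits.
    print(resync(FakeServer(70), timeout=60, factor=2))  # → (['router-a', 'router-b'], 3)
    # A larger fixed timeout succeeds on the first attempt.
    print(resync(FakeServer(70), timeout=300))  # → (['router-a', 'router-b'], 1)
```

Growing the timeout on each retry is only one possible variant of "simply increasing the timeout". In a real deployment, the starting value would come from rpc_response_timeout.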
Changed in neutron:
status: New → In Progress
Changed in neutron:
assignee: Assaf Muller (amuller) → Oleg Bondarev (obondarev)
tags: added: liberty-backport-potential
tags: removed: liberty-backport-potential
Duplicate of https://bugs.launchpad.net/neutron/+bug/1505575 ?