Fatal memory consumption by neutron-server with DVR at scale
Bug #1505575 reported by Oleg Bondarev
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | High | Oleg Bondarev |
Bug Description
Steps to reproduce:
0. The issue is noticeable at scale (100+ nodes); DVR must be turned on.
1. Run the rally scenario NeutronNetworks.
Initially the neutron-server processes consume 100-150M, but at some point their size rapidly grows several times over. (At 200 nodes the rise was from 150M to 2G, and up to 14G in the end.)
The issue may lead to an OOM situation in which the kernel kills the process with the highest memory consumption. The usual candidates are rabbit or mysql. This makes the cluster completely inoperable.
summary:
- Serious memory consumption by neutron-server with DVR at scale
+ Fatal memory consumption by neutron-server with DVR at scale
Changed in neutron:
importance: Undecided → High
tags: added: kilo-backport-potential l3-dvr-backlog liberty-rc-potential loadimpact; removed: scale
tags: added: liberty-backport-potential; removed: liberty-rc-potential
Changed in neutron:
status: Fix Committed → Fix Released
tags: removed: kilo-backport-potential liberty-backport-potential
The issue happens when one (or all) of the L3 agents on the controllers starts to resync. This may happen because of an agent restart or an exception (such as a messaging exception). On resync, the agent requests full info for all routers scheduled to it, which can be a very large amount of data if there are many routers (as in the create_and_list_routers rally scenario). This leads to the 'Serious memory consumption by neutron-server'.
Usually the server fails to complete the request within 60 seconds, which causes a timeout on the agent side, and the agent then sends yet another sync_routers() request. This loop continues until the server consumes all available memory and the cluster fails.
The idea of the fix is to request router info in chunks of a configured size.
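
The chunking idea can be illustrated with a minimal Python sketch. This is not the actual neutron patch: the names `chunked`, `sync_routers_in_chunks`, `fetch_router_ids`, and `fetch_routers` are hypothetical stand-ins for the agent-to-server RPC calls, and the sketch assumes the agent can first obtain a cheap list of router IDs and then ask for full info on a subset of them.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("chunk size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def sync_routers_in_chunks(
    fetch_router_ids: Callable[[], List[str]],          # hypothetical: cheap ID-only call
    fetch_routers: Callable[[List[str]], List[dict]],   # hypothetical: full info for given IDs
    chunk_size: int,
) -> Dict[str, dict]:
    """Collect full router info using several bounded requests.

    Each call to `fetch_routers` asks for at most `chunk_size` routers,
    so the server never has to build the full data set for one reply.
    """
    routers: Dict[str, dict] = {}
    for chunk in chunked(fetch_router_ids(), chunk_size):
        for router in fetch_routers(chunk):
            routers[router["id"]] = router
    return routers
```

With 10 routers and a chunk size of 4, this sketch issues three requests of sizes 4, 4, and 2 instead of one request for everything. A smaller chunk size lowers the per-request cost on the server at the price of more round trips.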