neutron-dhcp-agent memory leak on network sync failure
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron |
Fix Released
|
Medium
|
Unassigned |
Bug Description
neutron version: 15.0.2 (still presents in the latest release)
I've found a very interesting memory leak issue in neutron-dhcp-agent:
When dhcp-agent tries to sync network state, it makes an rpc call to neutron-server, if there's something wrong on neutron-server's side(database access failure, for example), an error will be returned to dhcp-agent and deserialized to an RemoteError object.
The RemoteError will be added to neutron.
if self.needs_
# be careful to avoid a race with additions to list
# from other threads
for net, r in reasons.items():
There's a trap here: since "reasons" is a defaultdict object, "reasons.keys()" will hold a reference to "reasons", thus the self.sync_state method frame will hold an indirect reference to the previous RemoteError object.
When this self.sync_state is invoked, another RemoteError will be raised since neutron-server is still malfunctioning. The RemoteError object's tracebacks has a reference to sync_state frame which still holds a reference to the previous RemoteError. So the history RemoteError will never be garbage collected.
I've generated a reference graph using objgraph, which helps to understand the reference chain. Please see the attachment.
One proposed fix is to modify self.sync_
Another way is adding str(reason) to self.needs_
Both of them breaks the reference chain.
description: | updated |
description: | updated |
Changed in neutron: | |
importance: | Undecided → Medium |
Hello Xiaojun:
That was an impressive analysis and, IMO, correct.
I think we can implement both measures: resync_ reasons" dict. There is no need to waist space storing the whole exception instance. resync_ helper" we are creating a new "needs_ resync_ reasons" dictionary and using the older one to pass the network IDs. Back in py27 that wasn't a problem but in py36 ".keys()" is a generator that implies passing a copy of the dictionary. In any case, it is safer just to pass a new generated list.
- Store a string in the "needs_
- In "_periodic_
Thanks a lot for this detailed bug description.