2022-04-16 09:57:25 |
Xiaojun Lin |
bug |
|
|
added bug |
2022-04-16 09:58:39 |
Xiaojun Lin |
attachment added |
|
reference graph https://bugs.launchpad.net/neutron/+bug/1969270/+attachment/5581063/+files/backref.svg |
|
2022-04-16 10:04:34 |
Xiaojun Lin |
description |
neutron version: 15.0.2 (still present in the latest release)
I've found a very interesting memory leak issue in neutron-dhcp-agent:
When dhcp-agent tries to sync network state, it makes an RPC call to neutron-server. If something goes wrong on neutron-server's side (a database access failure, for example), an error is returned to dhcp-agent and deserialized into a RemoteError object.
The RemoteError is added to neutron.agent.dhcp.agent.DhcpAgent.needs_resync_reasons for periodic resync. The following code in the method neutron.agent.dhcp.agent.DhcpAgent._periodic_resync_helper handles network resync:
if self.needs_resync_reasons:
    # be careful to avoid a race with additions to list
    # from other threads
    reasons = self.needs_resync_reasons
    self.needs_resync_reasons = collections.defaultdict(list)
    for net, r in reasons.items():
        if not net:
            net = "*"
        LOG.debug("resync (%(network)s): %(reason)s",
                  {"reason": r, "network": net})
    self.sync_state(reasons.keys())
There's a trap here: since "reasons" is a defaultdict object, "reasons.keys()" still holds a reference to "reasons", so the self.sync_state method frame holds an indirect reference to the previous RemoteError object.
When self.sync_state is invoked, another RemoteError is raised, since neutron-server is still malfunctioning. The RemoteError object has a reference to the sync_state frame, which still holds a reference to the previous RemoteError, so the old RemoteError will never be garbage collected.
I've generated a reference graph using objgraph, which helps to understand the reference chain. Please see the attachment.
One proposed fix is to change self.sync_state(reasons.keys()) to self.sync_state(list(reasons.keys())) in DhcpAgent._periodic_resync_helper.
Another way is to add str(reason) to self.needs_resync_reasons instead of the reason object itself, in DhcpAgent.schedule_resync.
Both of them break the reference chain. |
neutron version: 15.0.2 (still present in the latest release)
I've found a very interesting memory leak issue in neutron-dhcp-agent:
When dhcp-agent tries to sync network state, it makes an RPC call to neutron-server. If something goes wrong on neutron-server's side (a database access failure, for example), an error is returned to dhcp-agent and deserialized into a RemoteError object.
The RemoteError is added to neutron.agent.dhcp.agent.DhcpAgent.needs_resync_reasons for periodic resync. The following code in the method neutron.agent.dhcp.agent.DhcpAgent._periodic_resync_helper handles network resync:
if self.needs_resync_reasons:
    # be careful to avoid a race with additions to list
    # from other threads
    reasons = self.needs_resync_reasons
    self.needs_resync_reasons = collections.defaultdict(list)
    for net, r in reasons.items():
        if not net:
            net = "*"
        LOG.debug("resync (%(network)s): %(reason)s",
                  {"reason": r, "network": net})
    self.sync_state(reasons.keys())
There's a trap here: since "reasons" is a defaultdict object, "reasons.keys()" still holds a reference to "reasons", so the self.sync_state method frame holds an indirect reference to the previous RemoteError object.
When self.sync_state is invoked, another RemoteError is raised, since neutron-server is still malfunctioning. The RemoteError object's traceback has a reference to the sync_state frame, which still holds a reference to the previous RemoteError, so the old RemoteError will never be garbage collected.
I've generated a reference graph using objgraph, which helps to understand the reference chain. Please see the attachment.
One proposed fix is to change self.sync_state(reasons.keys()) to self.sync_state(list(reasons.keys())) in DhcpAgent._periodic_resync_helper.
Another way is to add str(reason) to self.needs_resync_reasons instead of the reason object itself, in DhcpAgent.schedule_resync.
Both of them break the reference chain. |
|
2022-04-16 10:07:15 |
Xiaojun Lin |
description |
neutron version: 15.0.2 (still present in the latest release)
I've found a very interesting memory leak issue in neutron-dhcp-agent:
When dhcp-agent tries to sync network state, it makes an RPC call to neutron-server. If something goes wrong on neutron-server's side (a database access failure, for example), an error is returned to dhcp-agent and deserialized into a RemoteError object.
The RemoteError is added to neutron.agent.dhcp.agent.DhcpAgent.needs_resync_reasons for periodic resync. The following code in the method neutron.agent.dhcp.agent.DhcpAgent._periodic_resync_helper handles network resync:
if self.needs_resync_reasons:
    # be careful to avoid a race with additions to list
    # from other threads
    reasons = self.needs_resync_reasons
    self.needs_resync_reasons = collections.defaultdict(list)
    for net, r in reasons.items():
        if not net:
            net = "*"
        LOG.debug("resync (%(network)s): %(reason)s",
                  {"reason": r, "network": net})
    self.sync_state(reasons.keys())
There's a trap here: since "reasons" is a defaultdict object, "reasons.keys()" still holds a reference to "reasons", so the self.sync_state method frame holds an indirect reference to the previous RemoteError object.
When self.sync_state is invoked, another RemoteError is raised, since neutron-server is still malfunctioning. The RemoteError object's traceback has a reference to the sync_state frame, which still holds a reference to the previous RemoteError, so the old RemoteError will never be garbage collected.
I've generated a reference graph using objgraph, which helps to understand the reference chain. Please see the attachment.
One proposed fix is to change self.sync_state(reasons.keys()) to self.sync_state(list(reasons.keys())) in DhcpAgent._periodic_resync_helper.
Another way is to add str(reason) to self.needs_resync_reasons instead of the reason object itself, in DhcpAgent.schedule_resync.
Both of them break the reference chain. |
neutron version: 15.0.2 (still present in the latest release)
I've found a very interesting memory leak issue in neutron-dhcp-agent:
When dhcp-agent tries to sync network state, it makes an RPC call to neutron-server. If something goes wrong on neutron-server's side (a database access failure, for example), an error is returned to dhcp-agent and deserialized into a RemoteError object.
The RemoteError is added to neutron.agent.dhcp.agent.DhcpAgent.needs_resync_reasons for periodic resync. The following code in the method neutron.agent.dhcp.agent.DhcpAgent._periodic_resync_helper() handles network resync:
if self.needs_resync_reasons:
    # be careful to avoid a race with additions to list
    # from other threads
    reasons = self.needs_resync_reasons
    self.needs_resync_reasons = collections.defaultdict(list)
    for net, r in reasons.items():
        if not net:
            net = "*"
        LOG.debug("resync (%(network)s): %(reason)s",
                  {"reason": r, "network": net})
    self.sync_state(reasons.keys())
There's a trap here: since "reasons" is a defaultdict object, "reasons.keys()" will hold a reference to "reasons", so the self.sync_state method frame will hold an indirect reference to the previous RemoteError object.
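The view-object behavior can be reproduced with a small standalone snippet; the Reason class below is a hypothetical stand-in for the stored RemoteError objects, not neutron code:

```python
import gc
import weakref

class Reason:
    """Hypothetical stand-in for a stored RemoteError object."""
    pass

reasons = {"net-1": [Reason()]}
wr = weakref.ref(reasons["net-1"][0])

keys = reasons.keys()    # dict_keys view: keeps the whole dict alive
del reasons
gc.collect()
assert wr() is not None  # Reason still reachable via keys -> dict -> list

keys = list(keys)        # copying the keys drops the view (and the dict)
gc.collect()
assert wr() is None      # reference chain broken; Reason is freed
```

This is also why wrapping the call in list(...) is enough to break the chain.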
When self.sync_state is invoked, another RemoteError is raised, since neutron-server is still malfunctioning. The RemoteError object's traceback has a reference to the sync_state frame, which still holds a reference to the previous RemoteError, so the old RemoteError will never be garbage collected.
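The traceback part of the chain can be demonstrated in isolation as well; RemoteErrorStub and sync_state below are simplified stand-ins, relying on CPython's behavior of exceptions keeping their traceback (and hence frame locals) alive:

```python
import gc
import weakref

class RemoteErrorStub(Exception):
    """Hypothetical stand-in for the deserialized RemoteError."""

def sync_state(reasons):
    # 'reasons' is a local in this frame, referencing the old error
    raise RemoteErrorStub("neutron-server is still failing")

old_error = RemoteErrorStub("first failure")
wr = weakref.ref(old_error)

try:
    sync_state([old_error])
except RemoteErrorStub as exc:
    new_error = exc          # keeping the exception keeps its traceback

del old_error
gc.collect()
# new_error.__traceback__ -> sync_state frame -> 'reasons' -> old error
assert wr() is not None
frame = new_error.__traceback__.tb_next.tb_frame
assert frame.f_locals["reasons"][0] is wr()
```

Each new failure repeats this pattern, so the errors chain together and accumulate.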
I've generated a reference graph using objgraph, which helps to understand the reference chain. Please see the attachment.
One proposed fix is to change self.sync_state(reasons.keys()) to self.sync_state(list(reasons.keys())) in DhcpAgent._periodic_resync_helper().
Another way is to add str(reason) to self.needs_resync_reasons instead of the reason object itself, in DhcpAgent.schedule_resync().
Both of them break the reference chain. |
|
2022-04-18 14:46:12 |
Jakub Libosvar |
neutron: importance |
Undecided |
Medium |
|
2022-04-19 11:30:13 |
OpenStack Infra |
neutron: status |
New |
In Progress |
|
2022-04-20 17:48:58 |
OpenStack Infra |
neutron: status |
In Progress |
Fix Released |
|
2022-04-26 20:37:05 |
OpenStack Infra |
tags |
|
in-stable-xena |
|
2022-04-26 20:37:11 |
OpenStack Infra |
tags |
in-stable-xena |
in-stable-wallaby in-stable-xena |
|
2022-04-26 20:37:17 |
OpenStack Infra |
tags |
in-stable-wallaby in-stable-xena |
in-stable-victoria in-stable-wallaby in-stable-xena |
|
2022-04-26 20:37:23 |
OpenStack Infra |
tags |
in-stable-victoria in-stable-wallaby in-stable-xena |
in-stable-ussuri in-stable-victoria in-stable-wallaby in-stable-xena |
|
2022-04-26 20:37:28 |
OpenStack Infra |
tags |
in-stable-ussuri in-stable-victoria in-stable-wallaby in-stable-xena |
in-stable-train in-stable-ussuri in-stable-victoria in-stable-wallaby in-stable-xena |
|
2022-04-27 09:49:40 |
OpenStack Infra |
tags |
in-stable-train in-stable-ussuri in-stable-victoria in-stable-wallaby in-stable-xena |
in-stable-train in-stable-ussuri in-stable-victoria in-stable-wallaby in-stable-xena in-stable-yoga |
|