neutron-metadata-agent memory usage keeps increasing

Bug #1987377 reported by liujinxin
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
neutron   Expired   Undecided    Unassigned

Bug Description

env:
branch: stable/victoria

The memory footprint drops after restarting the metadata-agent, but it grows steadily the longer the agent runs, until the process is killed by the OOM killer.

kubectl top pod neutron-metadata-agent-default-6nz79 -nopenstack
NAME CPU(cores) MEMORY(bytes)
neutron-metadata-agent-default-6nz79 4m 7121Mi

kubectl top pod -nopenstack neutron-metadata-agent-default-7znzp
NAME CPU(cores) MEMORY(bytes)
neutron-metadata-agent-default-7znzp 3m 24321Mi

Tasks: 12 total, 1 running, 11 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.5 us, 1.3 sy, 0.0 ni, 94.2 id, 0.0 wa, 0.0 hi, 0.3 si, 0.7 st
KiB Mem : 32885820 total, 3087452 free, 28965316 used, 833052 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 3446688 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      1 neutron 20 0 1020 4 0 S 0.0 0.0 0:00.07 pause
 314636 neutron 20 0 193348 83160 4328 S 0.0 0.3 14:47.42 neutron-metadat
 314649 neutron 20 0 3246420 2.988g 972 S 0.0 9.5 8:37.78 neutron-metadat
 314650 neutron 20 0 3225648 2.970g 3184 S 0.0 9.5 8:36.11 neutron-metadat
 314651 neutron 20 0 3228576 2.970g 0 S 0.0 9.5 8:37.24 neutron-metadat
 314652 neutron 20 0 3223508 2.966g 1316 S 0.0 9.5 8:35.71 neutron-metadat
 314653 neutron 20 0 3216512 2.959g 844 S 0.0 9.4 8:37.38 neutron-metadat
 314654 neutron 20 0 3265104 3.006g 976 S 0.0 9.6 8:40.20 neutron-metadat
 314655 neutron 20 0 3180172 2.924g 280 S 0.0 9.3 8:33.43 neutron-metadat
 377345 neutron 20 0 193348 83388 4556 S 0.0 0.3 0:00.01 neutron-metadat
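
To put the top output above in perspective, the RES column can be summed. The snippet below is only a rough sketch: the values are copied by hand from the listing, it assumes top's "g" suffix means GiB and that unsuffixed values are KiB, and it treats the seven processes near 3 GiB as the agent's worker processes, which is an interpretation rather than something stated in the report.

```python
# RES values copied from the top listing above, in PID order.
# Assumption: plain numbers are KiB and a trailing "g" means GiB.
RES_VALUES = [
    "4", "83160", "2.988g", "2.970g", "2.970g",
    "2.966g", "2.959g", "3.006g", "2.924g", "83388",
]


def res_to_kib(value: str) -> float:
    """Convert a top RES field to KiB."""
    if value.endswith("g"):
        return float(value[:-1]) * 1024 * 1024
    return float(value)


total_kib = sum(res_to_kib(v) for v in RES_VALUES)
# Assumption: the GiB-sized entries are the worker processes.
worker_kib = sum(res_to_kib(v) for v in RES_VALUES if v.endswith("g"))

total_gib = total_kib / (1024 * 1024)
worker_share = worker_kib / total_kib

print(f"total RES: {total_gib:.2f} GiB, large processes' share: {worker_share:.1%}")
```

Under those assumptions, nearly all of the resident memory sits in the seven large processes, while the two processes around 83 MB stay small.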

liujinxin (scilla)
description: updated
Lajos Katona (lajos-katona) wrote :

Isn't this related to https://bugs.launchpad.net/neutron/+bug/1987060? That one also mentions high load, but I am not sure whether it is related to the metadata agent or to other processes that consume resources on the host.

Oleg Bondarev (obondarev) wrote :

MetadataProxyHandler has a cache for better performance; it uses oslo_cache for this purpose. This cache may grow over time, but the default expiration time is 600 seconds [1]. Other than that, the agent is quite simple and I can hardly think of what else might cause a memory leak.

[1] https://github.com/openstack/oslo.cache/blob/master/oslo_cache/_opts.py#L27
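
To illustrate why an expiration time is expected to keep such a cache bounded, here is a minimal sketch of a time-based cache. It is not oslo.cache or neutron code: the TTLCache class, its methods, and the injectable clock are hypothetical names used only for this example, with 600 seconds taken from the default mentioned above.

```python
import time


class TTLCache:
    """Illustrative dict-backed cache with per-entry expiry (not oslo.cache)."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._data = {}              # key -> (expires_at, value)

    def set(self, key, value):
        now = self._clock()
        # Purge expired entries on every write so the dict stays bounded
        # by the number of keys written within one TTL window.
        expired = [k for k, (exp, _) in self._data.items() if exp <= now]
        for k in expired:
            del self._data[k]
        self._data[key] = (now + self._ttl, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            return default
        return entry[1]

    def __len__(self):
        return len(self._data)
```

With this design, memory held by the cache depends on how many distinct keys arrive within one expiry window, not on total uptime, which is why a cache of this kind is an unlikely explanation for unbounded growth.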

Slawek Kaplonski (slaweq) wrote :

Is the memory used by the haproxy services included in that footprint? If so, it may grow over time as you add more networks and/or routers on the host, since a new haproxy service is started for each of them.

Changed in neutron:
status: New → Incomplete
liujinxin (scilla) wrote :

I found that the memory growth may be related to the operation of the health probe.

Readiness: exec [python /tmp/health-probe.py --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/metadata_agent.ini]

Steps to Reproduce:

kubectl top pods -nopenstack neutron-metadata-agent-default-b97p5
NAME CPU(cores) MEMORY(bytes)
neutron-metadata-agent-default-b97p5 40m 301Mi

kubectl exec -it -nopenstack neutron-metadata-agent-default-b97p5 bash -c 'for ((i=0;i<5000;i++)) do python /tmp/health-probe.py --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/metadata_agent.ini; done'

1 hour later

kubectl top pods -nopenstack neutron-metadata-agent-default-b97p5
NAME CPU(cores) MEMORY(bytes)
neutron-metadata-agent-default-b97p5 37m 5808Mi
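
As a rough back-of-the-envelope figure from the two kubectl top readings above, the growth per probe run can be estimated. This is only a sketch: it assumes all growth came from the 5000 manual runs, whereas the pod's regular readiness probe was presumably also firing during that hour, so the real per-run figure is likely somewhat lower.

```python
before_mib = 301     # kubectl top reading before the loop
after_mib = 5808     # kubectl top reading one hour later
probe_runs = 5000    # iterations of the manual loop only (assumption: no other probes counted)

growth_mib = after_mib - before_mib
per_probe_mib = growth_mib / probe_runs

print(f"growth: {growth_mib} MiB, about {per_probe_mib:.2f} MiB per probe run")
```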

Oleg Bondarev (obondarev) wrote :

Sorry, can you please point to where this health probe is defined?

Brian Haley (brian-haley) wrote :

This is from openstack-helm; the template defining the probe code is at https://opendev.org/openstack/openstack-helm/src/branch/master/neutron/templates/bin/_health-probe.py.tpl

I would think you could disable the probe(s) and see if it helps?

liujinxin (scilla) wrote :

Yes, it's true that with the health probe turned off the memory no longer grows significantly, but shouldn't the memory growth be fixed at its root cause?

Brian Haley (brian-haley) wrote :

Hi Liu, the first step was just to identify what was causing the problem. So this is just a periodic metadata request being done, correct? We'll have to trace what the agent is doing when the request arrives, etc.

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired