Regiond memory leak when Prometheus metrics are enabled

Bug #1927941 reported by Victor Tapia
This bug affects 3 people
Affects  Status         Importance  Assigned to   Milestone
MAAS     Fix Released   Medium      Victor Tapia
2.6      Won't Fix      Undecided   Unassigned
2.7      Won't Fix      Undecided   Unassigned
2.8      Fix Committed  Undecided   Unassigned
2.9      Fix Released   Medium      Victor Tapia
3.0      New            Undecided   Unassigned

Bug Description

After enabling the Prometheus metrics endpoint with:

maas admin maas set-config name=prometheus_enabled value=true

Every request allocates memory that is never released. The simplest reproducer is to run curl in a loop in the background; for instance, run this script with nohup:

#!/bin/bash
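# Hammer the metrics endpoint in a tight loop to make the leak visible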
while true; do
        curl http://localhost:5240/MAAS/metrics >/dev/null 2>&1
done

Leaving this script running for a while shows the RSS of the regiond processes growing steadily, and the memory is only released when regiond is restarted (the growth is easier to see and track when there is only one worker).
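
For tracking the growth, here is a minimal sampling sketch, assuming Linux /proc and that the worker processes have "regiond" in their command line (PATTERN and INTERVAL are illustrative names):

#!/usr/bin/env python3
# Minimal sketch: print the RSS of every regiond process every 30s by
# scanning /proc (Linux only). PATTERN and INTERVAL are illustrative.
import time
from pathlib import Path

PATTERN = b"regiond"
INTERVAL = 30  # seconds between samples

def regiond_rss():
    for proc in Path("/proc").glob("[0-9]*"):
        try:
            if PATTERN not in (proc / "cmdline").read_bytes():
                continue
            for line in (proc / "status").read_text().splitlines():
                if line.startswith("VmRSS:"):
                    yield proc.name, line.split()[1]  # value in kB
        except OSError:
            continue  # process exited while being inspected

while True:
    for pid, rss_kb in regiond_rss():
        print(f"pid={pid} rss={rss_kb} kB", flush=True)
    time.sleep(INTERVAL)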

Using objgraph, I could see that dict and MmapedValue objects (the latter from prometheus-client) are the ones being allocated; both counts grow in lockstep with each request:

2021-04-29 15:45:27 regiond: [info] 127.0.0.1 GET /MAAS/metrics HTTP/1.1 --> 200 OK (referrer: -; agent: curl/7.68.0)
2021-04-29 15:45:33 stdout: [info] dict 224666
2021-04-29 15:45:33 stdout: [info] MmapedValue 193362
2021-04-29 15:45:33 stdout: [info] function 39308
2021-04-29 15:45:33 stdout: [info] tuple 32844
2021-04-29 15:45:53 regiond: [info] 127.0.0.1 GET /MAAS/metrics HTTP/1.1 --> 200 OK (referrer: -; agent: curl/7.68.0)
2021-04-29 15:45:59 stdout: [info] dict 224707
2021-04-29 15:45:59 stdout: [info] MmapedValue 193403
2021-04-29 15:45:59 stdout: [info] function 39308
2021-04-29 15:45:59 stdout: [info] tuple 32844
2021-04-29 15:46:45 regiond: [info] 127.0.0.1 GET /MAAS/metrics HTTP/1.1 --> 200 OK (referrer: -; agent: curl/7.68.0)
2021-04-29 15:46:50 stdout: [info] dict 224748
2021-04-29 15:46:50 stdout: [info] MmapedValue 193444
2021-04-29 15:46:50 stdout: [info] function 39308
2021-04-29 15:46:50 stdout: [info] tuple 32844

Revision history for this message
Alberto Donato (ack) wrote :

Hi Victor, could you please provide the changes you made to have objgraph print out that log?

Revision history for this message
Victor Tapia (vtapia) wrote :

Hi Alberto, here's the diff (master):

diff --git i/src/maasserver/prometheus/stats.py w/src/maasserver/prometheus/stats.py
index 7ac3a352c..f01bab714 100644
--- i/src/maasserver/prometheus/stats.py
+++ w/src/maasserver/prometheus/stats.py
@@ -26,6 +26,7 @@ from provisioningserver.prometheus.utils import (
     MetricDefinition,
     PrometheusMetrics,
 )
+import objgraph

 log = LegacyLogger()

@@ -127,6 +128,10 @@ def prometheus_stats_handler(request):
         update_handlers=[update_prometheus_stats],
         registry=prom_cli.CollectorRegistry(),
     )
+
+    objgraph.show_most_common_types(limit=4)
+    objgraph.show_growth(limit=3)
+
     return HttpResponse(
         content=metrics.generate_latest(), content_type="text/plain"
     )
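
For what it's worth, the handler above builds a new PrometheusMetrics with a fresh prom_cli.CollectorRegistry on every request, which would be consistent with the MmapedValue growth seen above. Here is a minimal sketch of the general pattern of building the registry once and reusing it, with a made-up metric for illustration (not necessarily the actual MAAS fix):

# Hypothetical sketch, not the actual MAAS fix: build the registry and
# metric objects once at module level and reuse them for every scrape,
# instead of allocating a fresh CollectorRegistry per request.
from prometheus_client import CollectorRegistry, Counter, generate_latest

_REGISTRY = CollectorRegistry()
_REQUESTS = Counter(
    "example_requests_total",  # made-up metric name for illustration
    "Requests served by this handler",
    registry=_REGISTRY,
)

def metrics_handler():
    # Because _REGISTRY is reused, no new collectors (or MmapedValue
    # instances, in multiprocess mode) are created per request.
    _REQUESTS.inc()
    return generate_latest(_REGISTRY)  # Prometheus text format, as bytes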

Alberto Donato (ack)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
Alberto Donato (ack)
Changed in maas:
assignee: nobody → Victor Tapia (vtapia)
status: Triaged → In Progress
Changed in maas:
milestone: none → next
status: In Progress → Fix Committed
Revision history for this message
Victor Tapia (vtapia) wrote :

I just confirmed that this affects all previous releases down to 2.6; I'll start preparing the backports.

Changed in maas:
milestone: next → 3.0.1
Changed in maas:
milestone: 3.0.1 → 3.2.0-beta1
Changed in maas:
status: Fix Committed → Fix Released