oslo.messaging has possible memory leak

Bug #1475307 reported by gordon chung
This bug affects 1 person
Affects          Status         Importance   Assigned to     Milestone
Ceilometer       Invalid        Critical     Unassigned
oslo.messaging   Fix Released   Critical     Joshua Harlow

Bug Description

running devstack for a few hours with short polling intervals causes collector memory to spike. my current setup is:
- core services (minus neutron)
- standard pipeline (polling every 10s)
- 8 VMs

top (pollsters-no-transform):
15618 gchung 20 0 2511140 1.879g 3460 R 47.6 6.0 17:59.85 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
 2183 gchung 20 0 7361488 5.436g 3372 R 42.9 17.3 36:21.30 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
19797 gchung 20 0 1891324 1.539g 3460 R 33.3 4.9 13:37.85 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
 8678 gchung 20 0 5936620 4.655g 3628 R 28.6 14.9 28:54.98 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
23751 gchung 20 0 1257168 997.9m 3460 R 28.6 3.1 8:14.34 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
27205 gchung 20 0 834808 673932 5500 R 28.6 2.1 2:58.35 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
30884 gchung 20 0 8568116 5.980g 3560 R 28.6 19.1 42:27.71 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
21575 gchung 20 0 350304 57148 6340 S 4.8 0.2 51:14.66 /usr/bin/python /usr/bin/ceilometer-agent-compute --config-file /etc/ceilometer/ceilometer.conf
27101 gchung 20 0 9186532 5.050g 3308 R 29.8 16.1 77:33.10 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
23751 gchung 20 0 1257788 998.2m 3460 S 27.2 3.1 8:15.16 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
27205 gchung 20 0 837064 676044 5500 S 26.5 2.1 2:59.15 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
 2183 gchung 20 0 7363452 5.437g 3372 R 24.8 17.3 36:22.05 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
 8678 gchung 20 0 5938908 4.657g 3628 S 24.2 14.9 28:55.71 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
19797 gchung 20 0 1892548 1.540g 3460 S 21.9 4.9 13:38.51 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
30884 gchung 20 0 8570356 5.983g 3560 R 21.5 19.1 42:28.36 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
21575 gchung 20 0 350304 57148 6340 S 9.3 0.2 51:14.94 /usr/bin/python /usr/bin/ceilometer-agent-compute --config-file /etc/ceilometer/ceilometer.conf
15618 gchung 20 0 2511140 1.879g 3460 S 7.0 6.0 18:00.06 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
27101 gchung 20 0 9186664 5.050g 3308 S 10.9 16.1 77:33.43 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
30884 gchung 20 0 8571212 5.983g 3560 S 9.3 19.1 42:28.64 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
 2183 gchung 20 0 7364148 5.438g 3372 S 8.9 17.3 36:22.32 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
27205 gchung 20 0 837768 676836 5500 S 7.0 2.1 2:59.36 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
 8678 gchung 20 0 5939484 4.658g 3628 S 6.3 14.9 28:55.90 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
19797 gchung 20 0 1892548 1.540g 3460 R 3.3 4.9 13:38.61 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
21557 gchung 20 0 223460 63244 3812 S 3.0 0.2 14:39.41 /usr/bin/python /usr/bin/ceilometer-agent-central --config-file /etc/ceilometer/ceilometer.conf

gordon chung (chungg) wrote :

rough starting memory usage (usually lower):

29566 gchung 20 0 348416 76704 19148 R 81.2 0.2 0:31.13 /usr/bin/python /usr/bin/ceilometer-agent-compute --config-file /etc/ceilometer/ceilometer.conf
29488 gchung 20 0 279712 121848 5672 R 56.2 0.4 0:15.92 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29489 gchung 20 0 276792 118892 5660 S 56.2 0.4 0:15.16 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29487 gchung 20 0 275020 116892 5672 R 50.0 0.4 0:14.73 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29490 gchung 20 0 280984 122976 5672 R 50.0 0.4 0:16.12 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29472 gchung 20 0 198544 56408 8984 S 6.2 0.2 0:02.65 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29510 gchung 20 0 269988 118976 6372 S 6.2 0.4 0:34.27 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
29566 gchung 20 0 348416 76704 19148 S 63.2 0.2 0:33.04 /usr/bin/python /usr/bin/ceilometer-agent-compute --config-file /etc/ceilometer/ceilometer.conf
29487 gchung 20 0 276308 118476 5672 S 19.5 0.4 0:15.32 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29490 gchung 20 0 282944 124824 5672 S 19.5 0.4 0:16.71 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29488 gchung 20 0 280988 123168 5672 S 18.9 0.4 0:16.49 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29489 gchung 20 0 278268 120212 5660 S 16.6 0.4 0:15.66 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29510 gchung 20 0 269988 118976 6372 S 5.6 0.4 0:34.44 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
29510 gchung 20 0 269988 118976 6372 S 6.6 0.4 0:34.64 /usr/bin/python /usr/bin/ceilometer-agent-notification --config-file /etc/ceilometer/ceilometer.co+
29516 gchung 20 0 212908 64660 9092 S 5.3 0.2 0:04.33 /usr/bin/python /usr/bin/ceilometer-agent-central --config-file /etc/ceilometer/ceilometer.conf
29489 gchung 20 0 278268 120212 5660 S 3.0 0.4 0:15.75 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29490 gchung 20 0 282944 124824 5672 S 2.3 0.4 0:16.78 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf
29488 gchung 20 0 280988 123168 5672 S 1.0 0.4 0:16.52 /usr/bin/python /usr/bin/ceilometer-collector --config-file /etc/ceilometer/ceilometer.conf

gordon chung (chungg) wrote :

i'm adding oslo.messaging. it definitely seems related to the 1.17.x release. i downgraded to 1.16.0 and memory seems to be holding steady.

i am up 3MB over an hour with the same 10s polling, 8 VM setup described above.

gordon chung (chungg)
summary: - collector has possible memory leak
+ oslo.messaging has possible memory leak
gordon chung (chungg) wrote :

i think it's this... it must be this... i hope it's this... how can it not be this... it never stops... it should stop right... it just grows... how can it just grow...

https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_executors/impl_pooledexecutor.py#L87

basically i've noticed self._incomplete does nothing but grow. if i run len(self._incomplete) it gets to tens of thousands really easily... and it never seems to shrink... in the collector, another instance (i have single workers) came to life and that started to grow like crazy too.

i'll try with 1.16.0 tag
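
The growth pattern described above can be reproduced in miniature. The sketch below is not the oslo.messaging code; `LeakyTracker`, `CleaningTracker`, and `pending_after_run` are hypothetical names, and the standard-library `ThreadPoolExecutor` stands in for the pooled executor. It only illustrates how a list of submitted futures keeps growing unless each future removes itself when it finishes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LeakyTracker:
    """Hypothetical tracker: remembers every future and never forgets it."""

    def __init__(self):
        self._incomplete = []
        self._lock = threading.Lock()

    def submit(self, pool, fn, *args):
        future = pool.submit(fn, *args)
        with self._lock:
            self._incomplete.append(future)
        return future


class CleaningTracker(LeakyTracker):
    """Same tracker, but each future is dropped from the list once done."""

    def submit(self, pool, fn, *args):
        future = super().submit(pool, fn, *args)
        # Runs immediately if the future has already finished.
        future.add_done_callback(self._forget)
        return future

    def _forget(self, future):
        with self._lock:
            self._incomplete.remove(future)


def pending_after_run(tracker, jobs=1000):
    """Submit `jobs` trivial tasks, wait for all of them, report list size."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i in range(jobs):
            tracker.submit(pool, lambda x: x * 2, i)
    return len(tracker._incomplete)


leaky_size = pending_after_run(LeakyTracker())
clean_size = pending_after_run(CleaningTracker())
print(leaky_size, clean_size)
```

In the leaky variant the list ends up holding one entry per submitted job even though every job has completed, which is the same shape as `len(self._incomplete)` climbing into the tens of thousands.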

Changed in oslo.messaging:
assignee: nobody → Joshua Harlow (harlowja)
status: New → In Progress
gordon chung (chungg)
Changed in ceilometer:
status: New → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/202202
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=02a3a398145c4923915f7eaa5e601d6f970df403
Submitter: Jenkins
Branch: master

commit 02a3a398145c4923915f7eaa5e601d6f970df403
Author: Joshua Harlow <email address hidden>
Date: Wed Jul 15 10:26:06 2015 -0700

    Ensure callback variable capture + cleanup is done correctly

    It appears the callback variable that was being called on
    future done was not the right one, due to the lambda capture
    mechanism referring to a lazy variable which would potentially
    be a different callback by the time the future would finish so
    make sure we capture the right one and ensure the future has access
    to it.

    This adds a helper method that submission will go through
    to ensure that the callback variable is correctly captured
    in the created lambda and also ensures that the incomplete futures
    list is cleaned up (when the future is done).

    The notification listener tests now use eventlet to expose this
    issue, which doesn't occur with the blocking executor.

    Closes-bug: #1474943

    Closes-bug: #1475307

    Change-Id: I23e393d504662532572b5b344b87387be6d7bcb1
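
The lambda capture problem the commit message describes is a general Python behavior: a lambda looks up a loop variable when it is called, not when it is created. The sketch below is a minimal illustration of that behavior, not the code from the fix; `make_late_bound` and `make_early_bound` are hypothetical helper names.

```python
def make_late_bound(callbacks):
    """Each lambda reads `cb` when called, so all see the last callback."""
    wrappers = []
    for cb in callbacks:
        wrappers.append(lambda: cb())
    return wrappers


def make_early_bound(callbacks):
    """A default argument freezes the current `cb` inside each lambda."""
    wrappers = []
    for cb in callbacks:
        wrappers.append(lambda cb=cb: cb())
    return wrappers


callbacks = [lambda: "first", lambda: "second", lambda: "third"]

late = [wrapper() for wrapper in make_late_bound(callbacks)]
early = [wrapper() for wrapper in make_early_bound(callbacks)]

print(late)   # ['third', 'third', 'third']
print(early)  # ['first', 'second', 'third']
```

Routing every submission through one helper function, as the commit message describes, has the same effect as the default-argument trick: the callback becomes a parameter of that call, so each lambda closes over its own value.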

Changed in oslo.messaging:
status: In Progress → Fix Committed
Mehdi Abaakouk (sileht)
Changed in oslo.messaging:
importance: Undecided → High
importance: High → Critical
Changed in oslo.messaging:
milestone: none → 2.0.0
status: Fix Committed → Fix Released