Dashboard memory leaks

Bug #1793411 reported by Xingchao Yu
This bug affects 4 people
Affects: OpenStack Dashboard (Horizon)
Status: Fix Released
Importance: Low
Assigned to: Unassigned

Bug Description

1. Issue description

Recently, we found that the server which hosts the horizon dashboard hit OOM several times, caused by the horizon services. After restarting the dashboard, memory usage goes up very quickly if we access the /project/network_topology/ path.

2. How to reproduce

Log in to the dashboard and go to the 'Network Topology' tab, then leave it there (auto-refresh every 10s by default) and monitor the memory usage on the host.
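
The report does not include a monitoring script; as a rough sketch (the psutil dependency and the uwsgi/horizon process-name filter are assumptions, not part of the original report), the worker memory can be polled like this:

    import time

    import psutil

    while True:
        total_rss = 0
        for proc in psutil.process_iter(["cmdline", "memory_info"]):
            cmdline = " ".join(proc.info["cmdline"] or [])
            mem = proc.info["memory_info"]
            if mem is not None and ("uwsgi" in cmdline or "horizon" in cmdline):
                total_rss += mem.rss
        # resident set size of all matching worker processes, in MiB
        print("horizon RSS: %.1f MiB" % (total_rss / (1024.0 * 1024.0)))
        time.sleep(10)  # matches the page's 10s auto-refresh interval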

3. Versions and Components

Dashboard: Stable/Pike
Server: uWSGI 1.9.17-1
OS: Ubuntu 14.04 trusty
Python: 2.7.6

As the memoized code has changed very little since Pike, you should also be able to reproduce this on the Queens/Rocky releases.

4. The investigation

The root cause of the memory leak is the memoized decorator (horizon/utils/memoized.py), which is used to cache function calls in Horizon.

After disabling it, the memory growth is under control.
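
For illustration only (a simplified sketch, not the actual horizon/utils/memoized.py code): a memoized-style decorator keeps a per-function cache dict for the lifetime of the process, and any cache key that cannot be weakly referenced (strings such as tokens or project IDs, numbers, tuples) is held with a strong reference, so those entries are never evicted:

    import functools
    import weakref


    def simple_memoized(func):
        # lives as long as the process; entries with strong keys never expire
        cache = {}

        @functools.wraps(func)
        def wrapper(*args):
            key_parts = []
            for arg in args:
                try:
                    # weakly referenced arguments can be collected later
                    key_parts.append(weakref.ref(arg))
                except TypeError:
                    # str/int/tuple cannot be weak-referenced: strong reference kept
                    key_parts.append(arg)
            key = tuple(key_parts)
            if key not in cache:
                cache[key] = func(*args)
            return cache[key]

        wrapper.cache = cache  # exposed here only so the example can show growth
        return wrapper


    @simple_memoized
    def lookup(token):
        return {"token": token}  # stands in for an expensive API call


    # every distinct token (e.g. one per user session or project) adds an entry
    # that is never evicted, so memory grows with the number of distinct callers
    for i in range(3):
        lookup("token-%d" % i)
    print(len(lookup.cache))  # -> 3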

The following is a comparison of the memory growth (measured with guppy) for each request to /project/network_topology:

 - original (no code change): 684 KB

 - garbage collection run manually: 185 KB

 - memoized cache disabled: 10 KB
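
The measurement script itself is not included in the report; a minimal sketch of how such per-request numbers can be taken with guppy's hpy() interface (handle_request() is a placeholder standing in for one /project/network_topology request; on Python 3 the package is guppy3) might look like:

    from guppy import hpy


    def handle_request():
        # placeholder for serving one /project/network_topology request
        return [object() for _ in range(1000)]


    heap = hpy()
    heap.setrelheap()      # treat the current heap as the baseline

    result = handle_request()

    print(heap.heap())     # objects allocated since the baseline, grouped by kind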

As we know, memoized uses weakrefs to cache objects. A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, garbage collection is free to destroy the referent and reuse its memory for something else.
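
A minimal demonstration of that behaviour (plain Python, independent of Horizon):

    import weakref


    class Cached(object):
        pass


    obj = Cached()
    ref = weakref.ref(obj)
    print(ref() is obj)   # True: the referent is still alive

    del obj               # drop the only strong reference
    print(ref())          # None: the referent has been collected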

In memory we can indeed see a lot of weakref-related objects; the following is an example:

Partition of a set of 394 objects. Total size = 37824 bytes.
 Index Count % Size % Cumulative % Kind (class / dict of class)
     0 197 50 18912 50 18912 50 _cffi_backend.CDataGCP
     1 197 50 18912 50 37824 100 weakref.KeyedRef

But the rest of the cached objects are not weakrefs. The following is the change in memory objects per /project/network_topology access, with garbage collection run manually:

Partition of a set of 1017 objects. Total size = 183680 bytes.
 Index Count % Size % Cumulative % Referrers by Kind (class / dict of class)
     0 419 41 58320 32 58320 32 dict (no owner)
     1 100 10 23416 13 81736 44 list
     2 135 13 15184 8 96920 53 <Nothing>
     3 2 0 6704 4 103624 56 urllib3.connection.VerifiedHTTPSConnection
     4 2 0 6704 4 110328 60 urllib3.connectionpool.HTTPSConnectionPool
     5 1 0 3352 2 113680 62 novaclient.v2.client.Client
     6 2 0 2096 1 115776 63 OpenSSL.SSL.Connection
     7 2 0 2096 1 117872 64 OpenSSL.SSL.Context
     8 2 0 2096 1 119968 65 Queue.LifoQueue
     9 12 1 2096 1 122064 66 dict of urllib3.connectionpool.HTTPSConnectionPool

Most of them are dicts. The following shows those dicts grouped by class; as you can see, most of them are not weakref objects:

Partition of a set of 419 objects. Total size = 58320 bytes.
 Index Count % Size % Cumulative % Class
     0 362 86 50712 87 50712 87 unicode
     1 27 6 3736 6 54448 93 list
     2 5 1 2168 4 56616 97 dict
     3 22 5 1448 2 58064 100 str
     4 2 0 192 0 58256 100 weakref.KeyedRef
     5 1 0 64 0 58320 100 keystoneauth1.discover.Discover

5. The issue

So the problem is that memoized does not work the way we expect: it allocates memory to cache objects, but some of that memory can never be released.

Xingchao Yu (yuxcer)
description: updated
description: updated
Ivan Kolodyazhny (e0ne)
Changed in horizon:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Adrian Turjak (adriant-y) wrote :

I just did a quick bit of testing to see what functions don't have any weakrefs and therefore can't be cleaned up:
http://paste.openstack.org/show/730680/

This most likely isn't anywhere near a full list; it's just what I found by adding the print, running the tests, and running horizon while looking at some pages.

I also printed the number of items in each function's cache, and they didn't really increase much on refreshes, but logging in as a different user or switching to other projects did increase them. So lots of concurrent users or multiple projects will mean lots of cached keys for very similar things that are never cleared and just continue to grow.
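
As an aside, whether a given cache key can ever be cleaned up this way is easy to check: only objects that support weak references qualify, and the common key ingredients (strings such as tokens, integers, tuples) do not. A small illustration (not from the paste above):

    import weakref


    def can_weakref(obj):
        try:
            weakref.ref(obj)
            return True
        except TypeError:
            return False


    # str, int and tuple keys cannot be weakly referenced, so cache entries
    # built from them are held strongly and never drop out on their own
    for value in ("token-abc123", 42, (1, 2), object()):
        print("%s: %s" % (type(value).__name__, can_weakref(value)))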

Changed in horizon:
assignee: nobody → Radomir Dopieralski (deshipu)
Revision history for this message
Radomir Dopieralski (deshipu) wrote :
Changed in horizon:
assignee: Radomir Dopieralski (deshipu) → nobody
status: Confirmed → New
status: New → Fix Committed
Revision history for this message
Noam Assouline (assoulin) wrote :

Hi @deshipu,

I understand that this fix is for the 'Stein' version, but I'm wondering if there is a fix for 'Queens' and 'Rocky' as well?

I've been trying to integrate the changes to memoized.py and settings.py into the 'Rocky' version, but the issue still exists... (memory usage goes up when using Network Topology, but never goes down).

Is there any way to fix this memory leak issue for 'Queens' and 'Rocky'?

Revision history for this message
Vishal Manchanda (vishalmanchanda) wrote :

As mentioned in the comments above, this is already fixed by https://review.openstack.org/c/614893/, so I am changing the bug status to Fix Released. Regarding the same issue in the Queens and Rocky branches, you can apply the same fix in your environment; I don't have an environment to check whether the above patch works on the old branches. Last but not least, Rocky and Queens are EOL, so I am not sure it makes sense to propose a backport to the old branches now.
Feel free to add your suggestions.

Changed in horizon:
status: Fix Committed → Fix Released
importance: High → Low