vmware: nova compute memory grows continuously with creation and deletions on VMs

Bug #1316433 reported by Ishant Tyagi
Affects                   Status        Importance  Assigned to        Milestone
OpenStack Compute (nova)  Fix Released  Critical    Radoslav Gerganov
  Icehouse                Fix Released  Undecided   Unassigned
VMwareAPI-Team            New           Undecided   Unassigned

Bug Description

nova-compute memory grows to 3 GB in a scaled environment (5000 VMs) where continuous VM operations were executed for 72 hours.

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22181 nova 20 0 3325m 3.0g 6588 R 34.2 19.0 3984:23 nova-compute

To test for the memory leak, we created and deleted the same number of VMs over several iterations. Here is the data:

VM Count  Memory (KB)
0 516528
20 518520
40 518520
60 522192
80 524068
100 526904
0 526904
20 526904
40 526904
60 526904
80 526904
100 529372
0 530972
20 530972
40 530972
60 530972
80 530972
100 533484
0 533484
20 533484
40 533484
60 533484
80 533484
100 535112
0 535112
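
The readings taken at 0 VMs show how much memory is retained after each create/delete cycle. A small sketch, using only the numbers from the table above (the variable names are illustrative), computes the per-cycle difference:

```python
# Memory (KB) reported at VM count 0, taken from the table above:
# the initial reading, then the reading after each full create/delete cycle.
baseline_kb = [516528, 526904, 530972, 533484, 535112]

# Memory retained by each cycle = difference between consecutive baselines.
retained_kb = [after - before
               for before, after in zip(baseline_kb, baseline_kb[1:])]
total_retained_kb = baseline_kb[-1] - baseline_kb[0]

print(retained_kb)        # [10376, 4068, 2512, 1628]
print(total_retained_kb)  # 18584
```

The baseline never drops back to its starting value once the VMs are deleted, which is consistent with a leak.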

Analysis with objgraph showed that suds library objects occupied most of the memory.

Tracy Jones (tjones-i)
Changed in nova:
importance: Undecided → High
Tracy Jones (tjones-i)
Changed in nova:
assignee: nobody → Eric Brown (ericwb)
Eric Brown (ericwb)
Changed in nova:
assignee: Eric Brown (ericwb) → Radoslav Gerganov (rgerganov)
Radoslav Gerganov (rgerganov) wrote :

The reason for the memory leak is that we use suds objects as keys for the _datastore_browser_mapping cache. Suds objects do not implement __eq__ and __hash__ properly for VIM types such as ManagedObjectReference, so we always get a cache miss and _datastore_browser_mapping grows with every created instance.

This is what it looks like after two spawn() operations:

(Pdb) self._datastore_browser_mapping
{(obj){
   value = "datastore-10"
   _type = "Datastore"
 }: (val){
   value = "datastoreBrowser-datastore-10"
   _type = "HostDatastoreBrowser"
 }, (obj){
   value = "datastore-10"
   _type = "Datastore"
 }: (val){
   value = "datastoreBrowser-datastore-10"
   _type = "HostDatastoreBrowser"
 }}

The solution is to use the 'value' property of the MoRef as we do for the _datastore_dc_mapping.
I will submit a patch shortly.
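
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the nova code: FakeMoRef is a hypothetical stand-in for a suds ManagedObjectReference, and the two lookup helpers are illustrative names. It assumes only what the comment states, namely that the key objects define neither __eq__ nor __hash__.

```python
class FakeMoRef:
    """Hypothetical stand-in for a suds ManagedObjectReference.

    It defines neither __eq__ nor __hash__, so Python falls back to
    identity-based comparison and hashing.
    """

    def __init__(self, value, _type):
        self.value = value
        self._type = _type


def lookup_by_object(cache, ds_ref):
    # Leaky pattern: the object itself is the key, so a new instance
    # never matches an existing entry.
    if ds_ref not in cache:
        cache[ds_ref] = "datastoreBrowser-" + ds_ref.value
    return cache[ds_ref]


def lookup_by_value(cache, ds_ref):
    # Proposed pattern: key on the string 'value' property instead.
    if ds_ref.value not in cache:
        cache[ds_ref.value] = "datastoreBrowser-" + ds_ref.value
    return cache[ds_ref.value]


leaky_cache, fixed_cache = {}, {}
for _ in range(2):  # two simulated spawn() operations
    ref = FakeMoRef("datastore-10", "Datastore")  # fresh object each time
    lookup_by_object(leaky_cache, ref)
    lookup_by_value(fixed_cache, ref)

print(len(leaky_cache))  # 2: one entry per spawn for the same datastore
print(len(fixed_cache))  # 1: both spawns share a single entry
```

Keying on the string works because Python strings compare and hash by content, so two references to the same datastore map to the same entry.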

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/97164

Changed in nova:
status: New → In Progress
tags: added: havana-backport-potential icehouse-backport-potential
Gary Kotton (garyk) wrote :

Great find!
Problem only exists in Icehouse.

Changed in nova:
importance: High → Critical
tags: removed: havana-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/97164
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bceb3f96b06ff8a048598724494299cf111bcaf8
Submitter: Jenkins
Branch: master

commit bceb3f96b06ff8a048598724494299cf111bcaf8
Author: Radoslav Gerganov <email address hidden>
Date: Mon Jun 2 10:21:45 2014 +0300

    VMware: Fix memory leaks caused by caches

    Using suds objects as keys for the DatastoreBrowser cache is incorrect
    because they don't implement __eq__ and __hash__ for the VIM types.
    This always results in cache miss and the cache grows with every spawn()
    operation.

    This patch fix this by using the 'value' property (which is string) of
    the MoRef as key.

    Change-Id: I2bcaf87e733d51055566aee41bb0a7e254027ba9
    Closes-Bug: 1316433

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/97707

Ishant Tyagi (ishanttyagi) wrote :

This bug was found in Havana. The cache fixes do not apply to Havana, as that code does not exist there.
The fix does not address the cache miss in https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/vmops.py#L1557. This code exists in Havana and might be the cause of the leak.

Radoslav Gerganov (rgerganov) wrote :

Ishant, thanks for the update. I will prepare another patch and will backport it to havana.

tags: added: havana-backport-potential
Radoslav Gerganov (rgerganov) wrote :

I am not able to reproduce a memory leak on Havana. What Ishant pointed out in comment #6 is not a problem because 'ds_ref' is a string, not a MoRef.

I have been using the default image for DevStack Havana (debian-2.6.32-i686) and the following commands (repeated many times):

  nova boot --flavor m1.nano --image 6be7cf2c-4ff3-4e2b-a6c3-125323724f81 --security-groups default foobar
  nova delete foobar

I didn't observe any leaks with objgraph.

Ishant, could you please tell us which specific suds type you see leaking? What is the output of:

  objgraph.show_growth(limit=20, shortnames=False)

after every, say, 10 operations?
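
For readers without objgraph at hand, the idea behind this kind of growth check can be approximated with the standard library alone. The sketch below is an assumption-laden stand-in, not objgraph's implementation: type_counts, growth, Leaky and operation are hypothetical names, and operation() merely simulates one boot/delete cycle that retains an object.

```python
import gc
from collections import Counter


def type_counts():
    """Count live gc-tracked objects by fully qualified type name."""
    gc.collect()
    return Counter(
        f"{type(o).__module__}.{type(o).__name__}" for o in gc.get_objects()
    )


def growth(before, after, limit=20):
    """Return up to `limit` (type name, increase) pairs, largest first."""
    grown = [(name, after[name] - before[name])
             for name in after if after[name] > before[name]]
    return sorted(grown, key=lambda item: item[1], reverse=True)[:limit]


class Leaky:
    """Hypothetical object type that the simulated operation leaks."""


leaked = []


def operation():
    # Stand-in for one boot/delete cycle; it deliberately retains an object.
    leaked.append(Leaky())


before = type_counts()
for _ in range(10):  # compare snapshots after every 10 operations
    operation()
after = type_counts()

for name, increase in growth(before, after):
    print(name, increase)
```

A type whose count keeps rising across repeated snapshots is a candidate for the leaking object.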

Ishant Tyagi (ishanttyagi) wrote :

Radoslav, I will run the tests again on Havana and will let you know the results.

Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/97707
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4820dbb4fdb39476a9b4dcd8dc42070f69bdd599
Submitter: Jenkins
Branch: stable/icehouse

commit 4820dbb4fdb39476a9b4dcd8dc42070f69bdd599
Author: Radoslav Gerganov <email address hidden>
Date: Mon Jun 2 10:21:45 2014 +0300

    VMware: Fix memory leaks caused by caches

    Using suds objects as keys for the DatastoreBrowser cache is incorrect
    because they don't implement __eq__ and __hash__ for the VIM types.
    This always results in cache miss and the cache grows with every spawn()
    operation.

    This patch fix this by using the 'value' property (which is string) of
    the MoRef as key.

    Closes-Bug: 1316433
    (cherry picked from commit bceb3f96b06ff8a048598724494299cf111bcaf8)

    Conflicts:

     nova/tests/virt/vmwareapi/test_imagecache.py
     nova/tests/virt/vmwareapi/test_vmops.py

    Change-Id: I2bcaf87e733d51055566aee41bb0a7e254027ba9

tags: added: in-stable-icehouse
Chuck Short (zulcss)
tags: removed: icehouse-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-1 → 2014.2