garbage collection problem

Bug #269966 reported by kilian
This bug affects 1 person
Affects: IPython
Status: Fix Released
Importance: High
Assigned to: Fernando Perez

Bug Description

If a variable is created within a script that was executed using the %run command, a reference to it is kept somewhere
that prevents it from being garbage collected after the variable is deleted.

I am not sure whether this is really a bug or just a misunderstanding on my side. However, it would be good if it were possible to delete
a variable created inside a script without deleting the whole namespace with %reset.

This example is from an email thread in ipython-user: http://lists.ipython.scipy.org/pipermail/ipython-user/2008-July/005599.html

Example: create the following script named test_destructor.py and execute it with the IPython %run command:

kilian@chebang:~$ cat test_destructor.py
class C(object):
    def __del__(self):
        print 'deleting object...'

c = C()

kilian@chebang:~$ python test_destructor.py
deleting object...

Now, let's try it in IPython:

In [1]: run test_destructor.py

In [2]: del c

In [3]: import gc

In [4]: gc.collect()
Out[4]: 47

(object still not deleted)

In [5]: %reset
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
deleting object...

Finally!


matthew arnison (mra-cisra) wrote :

I discovered the same issue. I think it is a serious memory leak.

Consider, for example, a script run with %run that creates large arrays or images. In a typical situation, I run the same script over and over as I improve and test it. Because of this bug, each run leaks whatever memory is attached to the objects the script created. This can add up to a large amount of memory in the sort of lengthy interactive session that IPython is so good at encouraging and supporting.

I tracked it down to IPython/Magic.py:1570 (in IPython release 0.8.4), in the magic_run function. Here is an extract:

        if opts.has_key('i'):
            # Run in user's interactive namespace
            prog_ns = self.shell.user_ns
            __name__save = self.shell.user_ns['__name__']
            prog_ns['__name__'] = '__main__'
            main_mod = FakeModule(prog_ns)
        else:
            # Run in a fresh, empty namespace
            if opts.has_key('n'):
                name = os.path.splitext(os.path.basename(filename))[0]
            else:
                name = '__main__'
            main_mod = FakeModule()
            prog_ns = main_mod.__dict__
            prog_ns['__name__'] = name
            # The shell MUST hold a reference to main_mod so after %run exits,
            # the python deletion mechanism doesn't zero it out (leaving
            # dangling references)
            self.shell._user_main_modules.append(main_mod)

The last line appends the script's module, and with it the script's namespace, to a list. The list is only cleaned up if the user runs the %reset command. But that is overkill: generally I want to keep the prompt's namespace, and I am happy for the old script namespace to be overwritten when I re-run a script.

I found that commenting out the last line fixes the problem. I disagree with the comment above the last line.

Fernando Perez (fdo.perez) wrote :

This is indeed a problem; however, disabling the above line is not a viable solution. If you comment out that last line, you get this behavior:

In [9]: cat tclass.py
"""Simple script to instantiate a class for testing %run"""

class foo: pass

def f():
    return foo()

x = f()

In [10]: run tclass

In [11]: x
Out[11]: <__main__.foo instance at 0x7fc79a42a638>

In [12]: f()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)

/home/fperez/ipython/repo/trunk-dev/IPython/tests/tclass.py in <module>()
----> 1
      2
      3
      4
      5

/home/fperez/ipython/repo/trunk-dev/IPython/tests/tclass.py in f()
      4
      5 def f():
----> 6     return foo()
      7
      8 x = f()

TypeError: 'NoneType' object is not callable

If that main module isn't kept alive somewhere, it later becomes impossible to use certain things defined in the script (such as instantiating its classes).
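The failure above can be reproduced without IPython. In CPython 2.x, deallocating a module object cleared its namespace, replacing every global except __builtins__ with None, while functions defined in the script still point at that same dictionary. Python 3.4 and later no longer do this when a module is garbage collected, so this Python 3 sketch simulates the old teardown by hand. The wipe loop is an approximation of the 2.x behavior, not IPython code.

```python
import types

SCRIPT = """
class foo:
    pass

def f():
    return foo()

x = f()
"""

main_mod = types.ModuleType("__main__")
exec(SCRIPT, main_mod.__dict__)
# %run copies the script's names into the interactive namespace.
user_ns = {k: v for k, v in vars(main_mod).items() if not k.startswith("__")}

# Simulate CPython 2.x module teardown: every global except __builtins__
# is replaced by None (Python 3.4+ no longer does this on its own).
for name in list(main_mod.__dict__):
    if name != "__builtins__":
        main_mod.__dict__[name] = None

print(user_ns["x"])  # the instance created during the run is still fine
try:
    user_ns["f"]()   # f looks up `foo` in the wiped module namespace
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object is not callable
```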

I still don't have a good solution for this, I'm afraid. Any ideas are welcome.

Changed in ipython:
assignee: nobody → fdo.perez
importance: Undecided → High
status: New → Confirmed
matthew arnison (mra-cisra) wrote :

Ah now I understand better why that line is there. Thank you for explaining it.

I don't really understand the error but I can see that the error could easily be triggered in normal use of %run.

As a workaround, maybe you could add a magic function that clears the namespace cache? At the moment that's not possible without clearing the prompt namespace as well. The documentation for the new magic function could point out the risk of this error occurring.

But ideally there would be some passive method for avoiding the memory leak.

OK, after some digging I see that the problem is that the function f() keeps a reference to the global namespace of the script it was defined in (its func_globals), and by now every name in that namespace has been set to None:

In [65]: f.func_globals
Out[65]:
{'__builtins__': {'ArithmeticError': <type 'exceptions.ArithmeticError'>,
                  'AssertionError': <type 'exceptions.AssertionError'>,
                  'AttributeError': <type 'exceptions.AttributeError'>,

-<snip>-

 '__doc__': None,
 '__file__': None,
 '__nonzero__': None,
 'f': None,
 'foo': None,
 'x': None}

In [67]: f.func_globals['foo'] is None
Out[67]: True
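The same lookup can be checked in Python 3, where func_globals is spelled __globals__. A function's globals are the defining module's dictionary itself, not a snapshot, so wiping or patching the module namespace changes what the function sees. A minimal sketch, using a throwaway module named "script" as an assumption:

```python
import types

mod = types.ModuleType("script")
exec("class foo:\n    pass\n\ndef f():\n    return foo()\n", mod.__dict__)
f = mod.f

print(f.__globals__ is mod.__dict__)  # True: a shared reference, not a copy
mod.__dict__["foo"] = None            # wipe one name in the module namespace
print(f.__globals__["foo"])           # None: f sees the change immediately
```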

Maybe the prompt namespace copy of foo can be grafted onto f, something like this:

In [68]: f.func_globals['foo'] = foo

In [69]: f()
Out[69]: <__main__.foo instance at 0x7d9fbc0>

Perhaps this could be automated by searching for functions in the script namespace.
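Matthew's grafting idea could be automated roughly as follows. graft_globals is a hypothetical helper, not an IPython function: for every plain function in the interactive namespace, it refills names that its globals have lost to None, using the interactive namespace's values. Because Python 3.4+ does not clear collected modules, the sketch imitates the 2.x-style wipe by hand.

```python
import types


def graft_globals(user_ns):
    """Hypothetical helper: for each function in user_ns, copy back names
    that were wiped to None in its globals. Returns the number patched."""
    patched = 0
    for obj in list(user_ns.values()):
        if not isinstance(obj, types.FunctionType) or obj.__globals__ is user_ns:
            continue
        g = obj.__globals__
        for name, value in user_ns.items():
            if name in g and g[name] is None and value is not None:
                g[name] = value
                patched += 1
    return patched


# Simulated scenario: run a script in a module, copy its names, wipe the module.
mod = types.ModuleType("__main__")
exec("class foo:\n    pass\n\ndef f():\n    return foo()\n", mod.__dict__)
user_ns = {k: v for k, v in vars(mod).items() if not k.startswith("__")}
for name in list(mod.__dict__):
    if name != "__builtins__":
        mod.__dict__[name] = None  # imitate old CPython module teardown

print(graft_globals(user_ns))  # 2 : 'foo' and 'f' restored in the shared dict
print(user_ns["f"]())          # works again
```

One limitation of this approach: it only restores names that still exist in the interactive namespace. Anything deleted at the prompt stays None in the function's globals.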

Fernando Perez (fdo.perez) wrote : Re: [Bug 269966] Re: garbage collection problem

On Thu, Mar 12, 2009 at 7:46 PM, matthew arnison <email address hidden> wrote:
> Ah now I understand better why that line is there. Thank you for
> explaining it.
>
> I don't really understand the error but I can see that the error could
> easily be triggered in normal use of %run.
>
> As a work around, maybe you could add a magic function which clears the
> namespace cache? At the moment that's not possible without clearing the
> prompt namespace as well. The documentation for the new magic function
> could point out the risk of this error occurring.
>
> But ideally there would be some passive method for avoiding the memory
> leak.

Don't worry. I finally found a full, robust and fully automatic solution.

I'll commit the code tomorrow once I finish documenting it and adding tests.

Thanks for all the feedback!

Cheers,

f

Fernando Perez (fdo.perez) wrote :
Changed in ipython:
status: Confirmed → Fix Committed
matthew arnison (mra-cisra) wrote :

I read over your patch, and the main functional change, as I see it, is that the cache of old modules is now a dict instead of a list. The dict is keyed by the %run script path, so only one module copy is cached per script path.

Nice solution. Thanks for continuing to share your work on IPython.

Fernando Perez (fdo.perez) wrote :

On Sun, Mar 15, 2009 at 4:07 PM, matthew arnison <email address hidden> wrote:
> I read over your patch, and the main functional change, as I see it, is
> that the cache of old modules is now a dict instead of a list. The dict is
> keyed by the %run script path. This means that only 1 module copy is
> cached per script path.

Yup :) It's the dead-simple, obvious solution in retrospect, but it
took me a while to realize that. I had a really nice 'aha' moment
when I understood it, and it's a great little example of the power of
Python's data structures.

In fact, the original fix was simply to change this line in iplib (around line 345):
       self._user_main_modules = []
to:
       self._user_main_modules = {}

and then in %run, instead of:
       self.shell._user_main_modules.append(main_mod)
use:
       self.shell._user_main_modules[os.path.abspath(main_mod.__file__)] = main_mod

So the whole patch, at its core, was 2 lines. I ended up coding a
fancier version, with a real API to access this so that magics don't
go mucking around with private internals of the shell object, but in
the end this was the whole idea, 2 lines :)
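The two-line idea can be sketched in Python 3 as below. MainModCache and its methods are hypothetical names standing in for a small API around the cache, not IPython's actual interface. The point is that keying the cache by the script's absolute path means a re-run replaces, and therefore releases, the previous module, so at most one module per script stays alive.

```python
import gc
import os
import types


class MainModCache:
    """Hypothetical wrapper: keeps at most one %run module per script path."""

    def __init__(self):
        self._modules = {}

    def cache(self, main_mod):
        # Re-running the same script overwrites (and drops) the old module.
        self._modules[os.path.abspath(main_mod.__file__)] = main_mod

    def clear(self):
        # Frees cached modules without touching the user namespace.
        self._modules.clear()

    def __len__(self):
        return len(self._modules)


destroyed = []


class Big:
    """Stands in for a large array or image created by the script."""

    def __init__(self, tag):
        self.tag = tag

    def __del__(self):
        destroyed.append(self.tag)


cache = MainModCache()
for i in range(3):  # run the "same script" three times
    main_mod = types.ModuleType("__main__")
    main_mod.__file__ = "test_destructor.py"
    main_mod.Big = Big
    exec(f"data = Big('run{i}')", main_mod.__dict__)
    cache.cache(main_mod)
del main_mod
gc.collect()
print(len(cache))  # 1
print(destroyed)   # ['run0', 'run1']
```

Only the latest run's data survives. Calling cache.clear() would free that as well, which is the kind of targeted cleanup suggested earlier in the thread, without wiping the prompt's namespace.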

I think I'll write up a little blog post about this. It's a good
illustration of the (well known) fact that thinking clearly about your
data structures can save you a lot of time, and of how naturally Python
exposes data structures that are well adapted to many problems.

> Nice solution. Thanks for continuing to share your work on IPython.

My pleasure. Thanks for the feedback!

f

Changed in ipython:
status: Fix Committed → Fix Released
