python-poppler doesn't close files

Bug #316722 reported by A.G. Nienhuis on 2009-01-13
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Poppler Python Bindings
Gian Mario Tagliaretti
python-poppler (Ubuntu)

Bug Description

This app crashes with "too many open files" after a while:

import os
import poppler

for i in range(1000000):
    uri = "file://" + os.path.abspath("test.pdf")
    doc = poppler.document_new_from_file(uri, None)

Is there a way to close a document?

A.G. Nienhuis (a-g-nienhuis) wrote :

Neither 'del doc' nor 'gc.collect()' helps.

Gian Mario Tagliaretti (gianmt) wrote :

I've been looking into this one for a long time with no real solution, but I feel the GC could be the one to blame. Did you investigate further?

I did investigate further and did some testing:

Current release:

- all python objects get destroyed automatically
- files stay open
- memory leak per document_new_from_file(): 40 kB (for a 300 kB pdf file)
- gc.collect() does nothing

If you call g_object_unref(hash(doc)) after each call to
document_new_from_file() the problems go away:

- all python objects get destroyed automatically
- files are closed
- memory leak per document_new_from_file(): 70 bytes (for a 300 kB pdf file)
- gc.collect() does nothing
- doc is completely usable after g_object_unref(...)

BTW: hash(obj) returns the pointer to the gobject of a PyGobject.

Here is a simple test case:

import os
import poppler
from ctypes import *

glib = CDLL("")

uri = "file://" + os.path.abspath("test.pdf")
for i in xrange(1000000):
    print i
    doc = poppler.document_new_from_file(uri, None)
    # Workaround described above (uncomment to try it):
    # glib.g_object_unref(hash(doc))

On Fri, Apr 10, 2009 at 11:23 PM, Gian Mario Tagliaretti
<email address hidden> wrote:
> I've been looking into this one for a long time with no real solution, but I
> feel the GC could be the one to blame. Did you investigate further?

Gian Mario Tagliaretti (gianmt) wrote :

I have filed a bug with a patch that adds a poppler.Document.release() method that will solve this issue; hopefully it will get into poppler-0.12. Valgrind shows no complaints:

==6750== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1248 from 9)
==6750== malloc/free: in use at exit: 3,470,220 bytes in 11,812 blocks.
==6750== malloc/free: 54,593 allocs, 42,781 frees, 9,141,246 bytes allocated.
==6750== For counts of detected errors, rerun with: -v
==6750== searching for pointers to 11,812 not-freed blocks.
==6750== checked 3,999,188 bytes.
==6750== LEAK SUMMARY:
==6750== definitely lost: 452 bytes in 15 blocks.
==6750== possibly lost: 43,208 bytes in 71 blocks.
==6750== still reachable: 3,426,560 bytes in 11,726 blocks.
==6750== suppressed: 0 bytes in 0 blocks.

how did you measure the leaks?

A.G. Nienhuis (a-g-nienhuis) wrote :

> how did you measure the leaks?

Leave it running for a few minutes while looking at gnome-system-monitor

Gian Mario Tagliaretti (gianmt) wrote :

I'm still waiting for my patch to be pushed into poppler itself, let's keep our fingers crossed :)

Changed in poppler-python:
assignee: nobody → Gian Mario Tagliaretti (gianmt)
status: New → In Progress
Changed in poppler-python:
importance: Undecided → Medium
milestone: none → development

Ehm? Why is a patch to poppler needed at all?
(this is also basically the same as #509408, but that is known ...)

As far as I can tell, the only issue is that poppler-python is missing the
  (caller-owns-return #t)
hint all over the place in the poppler.defs file. This means that while the Python object is destroyed, the C object stays alive and keeps the file open. If the hint is added to the relevant functions, the C object will be destroyed too, and the file is closed just fine.
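
The ownership problem described above can be pictured with a small toy model. This is only a sketch in plain Python, not the code that the binding generator actually emits: the class and function names (CObject, Wrapper, new_document) are hypothetical, and it assumes that the wrapper takes its own reference on the C object and that the hint makes the binding drop the reference returned by the constructor.

```python
class CObject(object):
    """Toy stand-in for a reference-counted C object that owns an open file."""

    def __init__(self):
        self.refcount = 1      # the constructor hands one reference to its caller
        self.file_open = True

    def ref(self):
        self.refcount += 1

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.file_open = False  # the finalizer closes the file


class Wrapper(object):
    """Toy stand-in for the Python wrapper; it holds its own reference."""

    def __init__(self, cobj):
        cobj.ref()
        self.cobj = cobj

    def destroy(self):
        self.cobj.unref()


def new_document(caller_owns_return):
    """Hypothetical binding function; the flag models the defs-file hint."""
    cobj = CObject()
    wrapper = Wrapper(cobj)
    if caller_owns_return:
        cobj.unref()  # give up the constructor's reference; the wrapper keeps the only one
    return wrapper, cobj


def file_open_after_wrapper_destroyed(caller_owns_return):
    wrapper, cobj = new_document(caller_owns_return)
    wrapper.destroy()
    return cobj.file_open


print(file_open_after_wrapper_destroyed(False))
print(file_open_after_wrapper_destroyed(True))
```

In this model, without the hint the count never reaches zero once the wrapper goes away, which matches the observation that the Python objects are destroyed while the files stay open.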

I would say the following functions will at least need the hint:
 * poppler_document_new_from_file
 * poppler_document_new_from_data
 * poppler_document_get_page (should fix #509408)
 * poppler_document_get_page_by_label
 * poppler_document_find_dest
 * poppler_document_get_form_field
 * poppler_index_iter_copy
 * poppler_index_iter_get_action

I have not done any extensive testing, but I was able to open, and get the number of pages of, several thousand PDF files with this change.

And likely the thumbnail getter too. Not sure how exactly the GList* handling is done by the binding generator.

alexandervdm (alexandervdm) wrote :

I just compiled the latest python-poppler revision including the solution presented by BenjaminBerg (Thanks!) and can confirm it is working. Both my own application and the demo program in the bug description show no increased memory usage, and the /proc/{id}/fd folder shows the objects are successfully destroyed.

Hopefully this can be accepted into trunk and distro-repositories soon.
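
For anyone who wants to repeat the file-descriptor check mentioned above, here is a minimal sketch. It is Linux-specific, the helper name count_open_fds is made up for illustration, and it opens os.devnull instead of a PDF so that it runs without python-poppler installed.

```python
import os


def count_open_fds(pid="self"):
    """Return the number of entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir("/proc/%s/fd" % pid))


before = count_open_fds()
handle = open(os.devnull)   # stand-in for opening a document
during = count_open_fds()
handle.close()
after = count_open_fds()

print("before=%d during=%d after=%d" % (before, during, after))
```

Calling the helper before and after the loop from the bug description should show whether the count keeps growing.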

tags: added: patch
Changed in python-poppler (Ubuntu):
status: New → Triaged
logari81 (logari81) wrote :

Are there any objections to the solution proposed in comment #7? I can confirm that the patch from comment #8 solves all memleaks that I have encountered (see bug #509408). Who is responsible for accepting and applying this patch into the upstream project?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-poppler - 0.12.1-8

python-poppler (0.12.1-8) unstable; urgency=low

  * uploading to unstable

 -- Andrea Gasparini <email address hidden> Sun, 29 Apr 2012 17:51:08 +0200

Changed in python-poppler (Ubuntu):
status: Triaged → Fix Released