python-poppler doesn't free pages

Bug #509408 reported by Yannick Voglaire
This bug affects 4 people
Affects: Poppler Python Bindings
Status: Confirmed
Importance: High
Assigned to: Gian Mario Tagliaretti

Bug Description

Hi,

This app (modified from the one in bug #316722, "python-poppler doesn't close files") uses more and more memory as it runs:

---------------------------
import os
import poppler
from ctypes import *

glib = CDLL("libgobject-2.0.so")

uri = "file://" + os.path.abspath("test.pdf")
doc = poppler.document_new_from_file(uri, None)

for i in range(1000000):
 page = doc.get_page(0)
---------------------------

For a 24 KB PDF file, it ends up using 173 MB of RAM.
Unreferencing with glib helps, but not as much as we would like: with this app

---------------------------
import os
import poppler
from ctypes import *

glib = CDLL("libgobject-2.0.so")

uri = "file://" + os.path.abspath("test.pdf")
doc = poppler.document_new_from_file(uri, None)

for i in range(1000000):
 page = doc.get_page(0)
 glib.g_object_unref(hash(page))
 del page
---------------------------

we "only" end up with 108 MB used. (More precisely, to get these numbers, I just added at the end the lines
import time
time.sleep(5)
and checked the RES memory usage in "top". The VIRT column gives similar results.)
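
For anyone who would rather read the number from inside the script than from "top", here is a minimal sketch using the standard `resource` module. The helper names `peak_rss_kb` and `measure` are made up for this example, and it reports peak resident memory (kilobytes on Linux), which is close to but not the same thing as the RES column.

```python
import resource


def peak_rss_kb():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure(fn, iterations):
    """Call fn() `iterations` times and return the growth in peak RSS."""
    before = peak_rss_kb()
    for _ in range(iterations):
        fn()
    return peak_rss_kb() - before
```

With the test script above, the loop could be replaced by something like `print(measure(lambda: doc.get_page(0), 1000000))`, assuming `doc` was opened as shown.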

This really seems to be due to python-poppler and not poppler: I ran the corresponding C code (derived from test-poppler-glib.cc in the poppler source), using only poppler-glib, and
1) without "g_object_unref (G_OBJECT (page));", the memory usage grows steadily (peaking at around 70 MB);
2) with "g_object_unref (G_OBJECT (page));", the memory usage stays constant throughout the execution, so in this case it completely solves the problem.

---------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <poppler.h>

#define FAIL(msg) \
 do { fprintf (stderr, "FAIL: %s\n", msg); exit (-1); } while (0)

int main (int argc, char *argv[])
{
  PopplerDocument *document;
  PopplerPage *page;
  GError *error;

  if (argc != 3)
    FAIL ("usage: test-poppler-glib file://FILE PAGE");

  g_type_init ();

  error = NULL;
  document = poppler_document_new_from_file (argv[1], NULL, &error);
  if (document == NULL)
    FAIL (error->message);

  for (gint i=0; i<=1000000; i++)
  {
    g_print("%d", i);
    page = poppler_document_get_page_by_label (document, argv[2]);
    g_object_unref (G_OBJECT (page));
  }

  g_object_unref (G_OBJECT (document));

  return 0;
}
---------------------------
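
The ownership rule that the C test relies on (each call hands the caller a reference that has to be dropped with g_object_unref) can be modelled in a few lines of Python. This is only a toy model: `ToyPage` and `run` are hypothetical names, nothing here calls poppler, and it says nothing about where python-poppler itself holds on to memory.

```python
class ToyPage:
    """Stand-in for a refcounted page object (hypothetical, not poppler API)."""
    live = 0  # number of pages currently allocated

    def __init__(self):
        self.refcount = 1  # the caller owns the initial reference
        ToyPage.live += 1

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            ToyPage.live -= 1  # the real library would free memory here


def run(iterations, unref):
    """Fetch a page `iterations` times; return how many stay allocated."""
    ToyPage.live = 0
    for _ in range(iterations):
        page = ToyPage()   # models poppler_document_get_page_by_label()
        if unref:
            page.unref()   # models g_object_unref (G_OBJECT (page))
    return ToyPage.live


print(run(1000, unref=False))  # 1000
print(run(1000, unref=True))   # 0
```

Without the unref, every iteration leaves one page allocated, which matches case 1) above; with it, nothing accumulates, as in case 2).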

I ran these tests with poppler-0.12.0 and python-poppler-0.10.0 on Ubuntu 9.10.

Revision history for this message
Yannick Voglaire (yannickv) wrote :

In the first Python snippet, lines 3 to 5 (the ctypes import and the glib handle) are of course not needed.

Revision history for this message
Yannick Voglaire (yannickv) wrote :

Sorry for testing on an old version (the default in Ubuntu 9.10).
I just tested with python-poppler-0.12.1 (but still poppler-0.12.0) with the same results.

Revision history for this message
Gian Mario Tagliaretti (gianmt) wrote :

Yannick,

thanks for taking the time to report this bug. I think the cause is the same as the one explained in #316722: the bindings shouldn't be responsible for cleaning up the page and the document. For example, in pygtk we never clean up after widgets that are no longer used, or in similar situations; the library should do it.

I haven't had the time (yet) to look at how poppler-glib itself works. I only proposed an API to close the documents, but I still don't think it is the right solution. I will look into it as soon as I can.

Changed in poppler-python:
assignee: nobody → Gian Mario Tagliaretti (gianmt)
importance: Undecided → High
milestone: none → development
status: New → Confirmed
Revision history for this message
alexandervdm (alexandervdm) wrote :

Any news regarding this bug? My application renders a newly created PDF every x seconds, so my stake in getting this solved is probably higher than that of others. I wasn't able to find your proposal for a document.release() function (described in #316722) on the poppler bugzilla or mailing lists. An update on this bug's status would be very much appreciated, thanks in advance.

Revision history for this message
Gian Mario Tagliaretti (gianmt) wrote : Re: [Bug 509408] Re: python-poppler doesn't free pages

On Thu, Apr 8, 2010 at 3:42 PM, alexandervdm <email address hidden> wrote:

Hi Alexander,

> Any news regarding this bug? My application renders a newly created pdf
> every x seconds, so my stake in getting this solved is probably higher
> than that of others.

Actually, I never got an answer from the poppler devs.

> I wasn't able to find your proposal for a
> document.release() function (described in #316722) on the poppler
> bugzilla/mailing lists. An update on this bug's status would be very much
> appreciated, thanks in advance.

Here is the bug
https://bugs.freedesktop.org/show_bug.cgi?id=21970

cheers
--
Gian Mario Tagliaretti
GNOME Foundation member
<email address hidden>

description: updated
Revision history for this message
alexandervdm (alexandervdm) wrote :

The solution in #316722 should fix this issue as well.
