pdftk fails with "output" option

Bug #779908 reported by Stefan Löffler
This bug affects 11 people
Affects           Status     Importance  Assigned to         Milestone
gcj-4.5 (Ubuntu)  Confirmed  Undecided   Unassigned
pdftk (Ubuntu)    Confirmed  Undecided   Johann Felix Soden

Bug Description

Binary package hint: pdftk

When invoking pdftk with `pdftk a.pdf output /tmp/b.pdf` (I also tried `pdftk a.pdf cat output /tmp/b.pdf`), pdftk fails with the following messages:

Unhandled Java Exception:
Unhandled Java Exception:
java.lang.NullPointerException
   at gnu.gcj.runtime.NameFinder.lookup(libgcj.so.11)
   at java.lang.VMThrowable.getStackTrace(libgcj.so.11)
   at java.lang.Throwable.getStackTrace(libgcj.so.11)
   at java.lang.Throwable.stackTraceString(libgcj.so.11)
   at java.lang.Throwable.printStackTrace(libgcj.so.11)
   at java.lang.Throwable.printStackTrace(libgcj.so.11)

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: pdftk 1.44-1
ProcVersionSignature: Ubuntu 2.6.38-8.42-generic 2.6.38.2
Uname: Linux 2.6.38-8-generic i686
Architecture: i386
Date: Mon May 9 14:51:36 2011
ProcEnviron:
 LANGUAGE=de_DE:en
 PATH=(custom, user)
 LANG=de_AT.utf8
 LC_MESSAGES=de_DE.UTF-8
 SHELL=/bin/bash
SourcePackage: pdftk
UpgradeStatus: Upgraded to natty on 2011-04-28 (11 days ago)

Stefan Löffler (st.loeffler) wrote :
Johann Felix Soden (johfel) wrote :

Thank you for taking the time to report this bug.

It seems that a double exception occurs: first something unknown throws an exception, then the stack tracer fails while printing it.

To track the bug down, here are two questions:
1) Have you tried different input PDF files?
2) Does something like `pdftk a.pdf dump_data output /tmp/a.txt` work?

Stefan Löffler (st.loeffler) wrote :

1) Yes, I have tried with different files, in particular: one generated by gnuplot, one from the internet, and one hand-written according to the examples in the PDF specification.

2) dump_data works, with and without output, and even when there is no "info dictionary", as is the case for the hand-written file.

FWIW, I ran `pdftk poppler-data.pdf output /tmp/a.pdf verbose` and got the following (not sure if it helps):
Command Line Data is valid.

Input PDF Filenames & Passwords in Order
( <filename>[, <password>] )
   poppler-data.pdf

The operation to be performed:
   filter - Apply 'filters' to a single, input PDF based on output args.
      (When the operation is omitted, this is the default.)

The output file will be named:
   /tmp/a.pdf

Output PDF encryption settings:
   Output PDF will not be encrypted.

No compression or uncompression being performed on output.

Creating Output ...
Unhandled Java Exception:
Unhandled Java Exception:
java.lang.NullPointerException
   at gnu.gcj.runtime.NameFinder.lookup(libgcj.so.11)
   at java.lang.VMThrowable.getStackTrace(libgcj.so.11)
   at java.lang.Throwable.getStackTrace(libgcj.so.11)
   at java.lang.Throwable.stackTraceString(libgcj.so.11)
   at java.lang.Throwable.printStackTrace(libgcj.so.11)
   at java.lang.Throwable.printStackTrace(libgcj.so.11)

Fabian Kretzschmar (fabian-kretzschmar) wrote :

A similar error here:

pdftk org.pdf output test.pdf
works fine.

pdftk org.pdf cat 1-3 output test.pdf verbose:
Command Line Data is valid.

Input PDF Filenames & Passwords in Order
( <filename>[, <password>] )
   org.pdf

The operation to be performed:
   cat - Catenate given page ranges into a new PDF.

The output file will be named:
   test.pdf

Output PDF encryption settings:
   Output PDF will not be encrypted.

No compression or uncompression being performed on output.

Creating Output ...
   Adding page 1 X0X from org.pdf
Unhandled Java Exception:
Unhandled Java Exception:
java.lang.NullPointerException
   at gnu.gcj.runtime.NameFinder.lookup(libgcj.so.11)
   at java.lang.VMThrowable.getStackTrace(libgcj.so.11)
   at java.lang.Throwable.getStackTrace(libgcj.so.11)
   at java.lang.Throwable.stackTraceString(libgcj.so.11)
   at java.lang.Throwable.printStackTrace(libgcj.so.11)
   at java.lang.Throwable.printStackTrace(libgcj.so.11)

If I run pdfposter on org.pdf first, pdftk works. Are there any ideas on how to check what causes this error?

Johann Felix Soden (johfel) wrote :

Thanks for the answers and the new report!

Using the current Ubuntu Live CD, I sadly could not reproduce this bug. Only when the output file is the same as the input file but given with a different path (`pdftk /tmp/x.pdf output /tmp/./x.pdf`) does pdftk crash in the way described here, which seems to be another (maybe related) bug.
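
For anyone scripting around pdftk, a caller could guard against the same-file case before invoking it. The following is only a sketch under assumptions: the class `SamePathCheck` and the method `sameTarget` are made-up names, nothing like this is known to exist in pdftk, and the comparison is purely textual (symbolic links and hard links are not resolved).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for a wrapper script; not part of pdftk.
final class SamePathCheck {
    // Compares two path strings after making them absolute and
    // removing "." and ".." segments. Links are NOT resolved.
    static boolean sameTarget(String input, String output) {
        Path in = Paths.get(input).toAbsolutePath().normalize();
        Path out = Paths.get(output).toAbsolutePath().normalize();
        return in.equals(out);
    }

    public static void main(String[] args) {
        // The two spellings from the comment above name the same file.
        System.out.println(sameTarget("/tmp/x.pdf", "/tmp/./x.pdf")); // prints true
        System.out.println(sameTarget("/tmp/x.pdf", "/tmp/y.pdf"));   // prints false
    }
}
```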

I made a version of pdftk containing more debugging information; stack-trace printing should work better with it. The PPA can be found at https://launchpad.net/~johfel/+archive/pdftk . Please also install the libgcj11-dbg package. Thanks!

Changed in pdftk (Ubuntu):
assignee: nobody → Johann Felix Soden (johfel)
Stefan Löffler (st.loeffler) wrote :

Thanks for providing the debugging packages.

I just found out that on my amd64 system at home, pdftk (same version) works as expected. So I have to wait until next week to try on the i386 system at work.

It seems to me, however, that this can't be a bug in the pdftk core code. Instead, it looks like a problem either in the i386 package or one of its dependencies/libraries...

Changed in pdftk (Ubuntu):
status: New → Confirmed
status: Confirmed → Incomplete
Stefan Löffler (st.loeffler) wrote :

I finally got around to running the debug pdftk. Here's the output:

$ pdftk base14-fonts.pdf output /tmp/a.pdf verbose
Command Line Data is valid.

Input PDF Filenames & Passwords in Order
( <filename>[, <password>] )
   base14-fonts.pdf

The operation to be performed:
   filter - Apply 'filters' to a single, input PDF based on output args.
      (When the operation is omitted, this is the default.)

The output file will be named:
   /tmp/a.pdf

Output PDF encryption settings:
   Output PDF will not be encrypted.

No compression or uncompression being performed on output.

Creating Output ...
Unhandled Java Exception:
Unhandled Java Exception:
java.lang.NullPointerException
   at gnu.gcj.runtime.NameFinder.lookup(NameFinder.java:201)
   at java.lang.VMThrowable.getStackTrace(natVMThrowable.cc:44)
   at java.lang.Throwable.getStackTrace(Throwable.java:524)
   at java.lang.Throwable.stackTraceString(Throwable.java:419)
   at java.lang.Throwable.printStackTrace(Throwable.java:365)
   at java.lang.Throwable.printStackTrace(Throwable.java:354)

$ apt-cache policy pdftk
pdftk:
  Installed: 1.44-1withdbg2
  Candidate: 1.44-1withdbg2
  Version table:
 *** 1.44-1withdbg2 0
        500 http://ppa.launchpad.net/johfel/pdftk/ubuntu/ natty/main i386 Packages
        100 /var/lib/dpkg/status
     1.44-1 0
        500 http://gd.tuwien.ac.at/opsys/linux/ubuntu/archive/ natty/universe i386 Packages

$ apt-cache policy libgcj11-dbg
libgcj11-dbg:
  Installed: 4.5.2-8ubuntu1
  Candidate: 4.5.2-8ubuntu1
  Version table:
 *** 4.5.2-8ubuntu1 0
        500 http://gd.tuwien.ac.at/opsys/linux/ubuntu/archive/ natty/main i386 Packages
        100 /var/lib/dpkg/status

Stefan Löffler (st.loeffler) wrote :

FWIW: Trying to debug the problem, I built pdftk from the source package myself (`apt-get source pdftk`, `make -f Makefile.Debian`).
The resulting stack trace was a little more helpful:
$ ./pdftk base14-fonts.pdf output a.pdf verbose
Command Line Data is valid.

Input PDF Filenames & Passwords in Order
( <filename>[, <password>] )
   base14-fonts.pdf

The operation to be performed:
   filter - Apply 'filters' to a single, input PDF based on output args.
      (When the operation is omitted, this is the default.)

The output file will be named:
   a.pdf

Output PDF encryption settings:
   Output PDF will not be encrypted.

No compression or uncompression being performed on output.

Creating Output ...
Unhandled Java Exception:
java.lang.ArrayIndexOutOfBoundsException: 4
   at java.text.SimpleDateFormat.formatWithAttribute(SimpleDateFormat.java:793)
   at java.text.SimpleDateFormat.format(SimpleDateFormat.java:845)
   at java.text.DateFormat.format(DateFormat.java:419)
   at com.lowagie.text.Document.addCreationDate(pdftk)
   at com.lowagie.text.pdf.PdfDocument.<init>(pdftk)
   at com.lowagie.text.pdf.PdfStamperImp.<init>(pdftk)

Ultimately, I tracked the problem down to java/com/lowagie/text/Document.java, line 644, in the addCreationDate() function. There, a new SimpleDateFormat object is created with the format string "EEE MMM dd HH:mm:ss zzz yyyy" and no locale. When its format method is subsequently applied to "new Date()" (which is valid), the exception above occurs. If the "MMM" part is removed, everything succeeds as it should. I also tried explicitly specifying the en_US locale, but to no avail (I'm using de_AT myself).
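
The call described above can be sketched roughly as follows. Only the pattern string is taken from this report; the class and method names (`CreationDateSketch`, `formatDefaultLocale`, `formatWithLocale`) are made up for illustration and are not the iText or pdftk code. On a current OpenJDK both variants are expected to format without error; the crash reported here was seen with gcj's runtime.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical reproduction helper; not the code from Document.java.
final class CreationDateSketch {
    // Pattern quoted in the report for addCreationDate().
    static final String PATTERN = "EEE MMM dd HH:mm:ss zzz yyyy";

    // No explicit locale: day and month names come from the runtime's default locale.
    static String formatDefaultLocale(Date when) {
        return new SimpleDateFormat(PATTERN).format(when);
    }

    // Explicit locale, the second variant the reporter tried.
    static String formatWithLocale(Date when, Locale locale) {
        return new SimpleDateFormat(PATTERN, locale).format(when);
    }

    public static void main(String[] args) {
        Date now = new Date();
        System.out.println("default locale: " + formatDefaultLocale(now));
        System.out.println("Locale.US:      " + formatWithLocale(now, Locale.US));
    }
}
```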

Stefan Löffler (st.loeffler) wrote :

After searching for the ArrayIndexOutOfBoundsException, I found https://bugs.launchpad.net/ubuntu/+source/gcj-4.4/+bug/487922 which seems quite closely related. However, natty comes with gcj-4.5, not 4.4, and apparently a fix for that problem was released. So either that fix got discarded somewhere on the way to natty, or it no longer fixes this issue.

Stefan Löffler (st.loeffler) wrote :

After some additional checking, I found that the patch mentioned in comment #9 is indeed applied - so LANG=C is set. So this isn't it. Strangely, during my tests I seem to remember that I got de_AT (i.e., German) output when printing the dates to the command line (for debugging purposes). So maybe LANG=C doesn't kick in, at least not in SimpleDateFormat?

Johann Felix Soden (johfel) wrote :

It seems that the gcj-4.5 Java library is initialized before pdftk can set LANG=C, so only setting LANG=C in the environment before starting pdftk seems to work. However, with gcj-4.6 (Debian sid) the workaround seems to work again.

The error in the gcj Java runtime library can easily be reproduced with the attached small Java program.

It seems that all month abbreviations are empty for LANG=de_DE, and that formatting crashes for LANG=de_AT between April and December, which is clearly a bug in gcj.

 $ gcj -C TestDateFormat.java
 $ LANG=de_DE gij TestDateFormat
 So. 29 12:57:20 MESZ 2011
 => empty Month abbreviation

 $ LANG=de_AT gij TestDateFormat
 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4
   at java.text.SimpleDateFormat.formatWithAttribute(SimpleDateFormat.java:793)
   at java.text.SimpleDateFormat.format(SimpleDateFormat.java:845)
   at java.text.DateFormat.format(DateFormat.java:419)
   at TestDateFormat.main(TestDateFormat.java:12)
  => crash

 $ LANG=de_AT faketime "2010-3-1" gij TestDateFormat
 Mo. M�r 01 00:00:00 GMT+01:00 2010
 => between January and March it "works", but with wrong encoding

 $ LANG=C gij TestDateFormat
 Sun May 29 12:54:42 GMT+02:00 2011
 => works anytime
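
The attachment itself is not reproduced on this page. As a rough stand-in, the following sketch checks every month instead of only the current date, which would show the "empty" and "crash" cases above without needing faketime. It is an assumption-laden illustration, not the TestDateFormat program: the class `MonthAbbrevCheck` and its method `check` are invented names, and it sticks to older Java syntax on the assumption that this keeps it buildable with gcj.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Locale;

// Hypothetical diagnostic; not the TestDateFormat attachment.
final class MonthAbbrevCheck {
    // Formats the 1st of each month of 2011 with "MMM" and records the outcome.
    static List<String> check(Locale locale) {
        List<String> results = new ArrayList<String>();
        SimpleDateFormat fmt = new SimpleDateFormat("MMM", locale);
        for (int month = Calendar.JANUARY; month <= Calendar.DECEMBER; month++) {
            Calendar cal = new GregorianCalendar(2011, month, 1);
            try {
                String abbrev = fmt.format(cal.getTime());
                results.add(abbrev.length() == 0 ? "<empty>" : abbrev);
            } catch (ArrayIndexOutOfBoundsException e) {
                // The exception type seen in the stack traces in this report.
                results.add("<crash: " + e.getMessage() + ">");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Uses the runtime's default locale, so LANG=... on the command line matters.
        Locale current = Locale.getDefault();
        System.out.println(current + ": " + check(current));
    }
}
```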

Changed in pdftk (Ubuntu):
status: Incomplete → Confirmed
sdaau (sd-imi) wrote :

Don't know if this is the same bug - but what pisses me off is this:

$ pdftk "my_big_file.pdf" output "/tmp/a.pdf" verbose
Error: Failed to open PDF file:
   my_big_file.pdf
Errors encountered. No output created.
Done. Input errors, so no output created.

Hello - give me some "verbose" please... Something ! :( WHAT are these "Errors encountered."?????

And it is the debug ppa:johfel/pdftk version I installed :( And

LANG=C pdftk "my_big_file.pdf" output "/tmp/a.pdf" verbose

apparently makes no difference here..

If anyone has any idea how to make pdftk spit out the actual error it is experiencing, I'd appreciate that...

Cheers!

baumtopf (baumtopf) wrote :

Hello

I have this problem:
I tried to work with pdftk and PDF Chain, but I often get this error message:

baumtopf@ubuntu:~$ pdftk eee.pdf fff.pdf ggg.pdf cat output merged2.pdf
Unhandled Java Exception:
Unhandled Java Exception:
java.lang.NullPointerException
   at gnu.gcj.runtime.NameFinder.lookup(NameFinder.java:201)
   at java.lang.VMThrowable.getStackTrace(natVMThrowable.cc:44)
   at java.lang.Throwable.getStackTrace(Throwable.java:524)
   at java.lang.Throwable.stackTraceString(Throwable.java:419)
   at java.lang.Throwable.printStackTrace(Throwable.java:365)
   at java.lang.Throwable.printStackTrace(Throwable.java:354)

and I cannot open the merged PDF file; instead this error message appears:

"PDF document is damaged"

Later I tried this:
baumtopf@ubuntu:~$ LANG=C pdftk eee.pdf fff.pdf ggg.pdf cat output merged2.pdf
Unhandled Java Exception:
Unhandled Java Exception:
java.lang.NullPointerException
   at gnu.gcj.runtime.NameFinder.lookup(NameFinder.java:201)
   at java.lang.VMThrowable.getStackTrace(natVMThrowable.cc:44)
   at java.lang.Throwable.getStackTrace(Throwable.java:524)
   at java.lang.Throwable.stackTraceString(Throwable.java:419)
   at java.lang.Throwable.printStackTrace(Throwable.java:365)
   at java.lang.Throwable.printStackTrace(Throwable.java:354)

I refer to this link:

https://bugs.launchpad.net/ubuntu/+source/gcj-4.4/+bug/487922

But this error message always appears.
And the links for downloading pdftk from jaunty don't work:

http://de.archive.ubuntu.com/ubuntu/pool/universe/p/pdftk/pdftk_1.41-3ubuntu1_i386.deb
http://de.archive.ubuntu.com/ubuntu/pool/universe/p/pdftk/pdftk_1.41-3ubuntu1_amd64.deb

I also searched Google for the packages pdftk_1.41-3ubuntu1_i386.deb and pdftk_1.41-3ubuntu1_amd64.deb, but I could not find any downloads of them.

Please help, what can I do?

Regards, Juergen

Johann Felix Soden (johfel) wrote :

Thanks for all the testing and the helpful reports!

I hope I have now found the reason why the patch stopped working:
Since Ubuntu Natty (11.04), the language is no longer set only via the LANG environment variable but additionally via LC_MESSAGES, which pdftk does not override yet.

So please check whether
 LC_MESSAGES=C pdftk ...
or
 LC_MESSAGES= pdftk ...
works.

Stefan Löffler (st.loeffler) wrote :

Overriding LC_MESSAGES did not have the desired effect for me. However, it did work when I cleared LC_ALL, as in
   LC_ALL= pdftk...

I did a bit more research, and it seems this was set in /etc/default/locale. After clearing that and filling it again via the UI "Language" dialog of Ubuntu, LC_ALL is no longer set in that file (or elsewhere, for that matter; `locale` shows LC_ALL as empty), and pdftk works smoothly now. Thanks for the work.
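
That clearing LC_ALL helped while LC_MESSAGES=C did not is consistent with the usual POSIX lookup order: a non-empty LC_ALL overrides every LC_* category variable, and those in turn override LANG. A minimal sketch of that lookup follows. The class `LocalePrecedence` and method `effective` are hypothetical names, the LC_ALL value in `main` is invented (this report does not say what it was set to), and the sketch models neither the glibc-specific LANGUAGE variable nor how gcj actually reads the environment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the POSIX locale variable precedence.
final class LocalePrecedence {
    // Returns the value that wins for one category, e.g. "LC_MESSAGES" or "LC_TIME".
    static String effective(Map<String, String> env, String category) {
        String[] order = {"LC_ALL", category, "LANG"};
        for (String name : order) {
            String value = env.get(name);
            // Unset and empty are treated the same way.
            if (value != null && value.length() > 0) {
                return value;
            }
        }
        return "C"; // fallback used by glibc when nothing is set
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<String, String>();
        env.put("LANG", "de_AT.utf8");
        env.put("LC_MESSAGES", "C");
        env.put("LC_ALL", "de_AT.UTF-8"); // invented value for illustration
        System.out.println(effective(env, "LC_MESSAGES")); // prints de_AT.UTF-8

        env.put("LC_ALL", ""); // same effect as "LC_ALL= pdftk ..."
        System.out.println(effective(env, "LC_MESSAGES")); // prints C
    }
}
```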

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gcj-4.5 (Ubuntu):
status: New → Confirmed
Scott T Rogers (scott-t-rogers) wrote :

I just ran into this error today using Ubuntu 12.04 with all patches and updates:

pdftk A=my_input.pdf cat A0-2 output my_output.pdf verbose
Command Line Data is valid.

Input PDF Filenames & Passwords in Order
( <filename>[, <password>] )
   my_input.pdf

The operation to be performed:
   cat - Catenate given page ranges into a new PDF.

The output file will be named:
   my_output.pdf

Output PDF encryption settings:
   Output PDF will not be encrypted.

No compression or uncompression being performed on output.

Creating Output ...
   Adding page 0 X0X from my_input.pdf
Unhandled Java Exception:
Unhandled Java Exception:
java.lang.NullPointerException
   at gnu.gcj.runtime.NameFinder.lookup(libgcj.so.12)
   at java.lang.Throwable.getStackTrace(libgcj.so.12)
   at java.lang.Throwable.stackTraceString(libgcj.so.12)
   at java.lang.Throwable.printStackTrace(libgcj.so.12)
   at java.lang.Throwable.printStackTrace(libgcj.so.12)

Jorge Sivil (jorgex0-o) wrote :

I just had a similar problem with Java exceptions.

My PDF was a single-image PDF coming from an ActiveX plugin that scans and converts to PDF.

The first problem was that PDFTK 1.44 hung on dump_data. That was fixed by appending a 0x0A byte after the final 0x0D.

Then I tried to update the info and got exceptions.

I noticed that info object number 7 was not closed by endobj, but that was not the problem. While looking at it, I compared it with a PDF produced by passing the failing PDF through exiftool and adding an Author key (a file PDFTK was able to update), and saw that the working file used 0x0A instead of 0x0D as the separator between lines.

So I replaced the 0x0D bytes in and around object number 7 (the info object, i.e. the metadata), and I was able to update it with PDFTK without exceptions.

Later, I took the original PDF again and just ran this: `sed -i 's/\x0D/\x0A/' 5372.pdf`

Then I tried dump_data and update_info, and both were completely successful.
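
Before rewriting bytes in a PDF, it may help to check which line endings the file actually uses. The sketch below only counts them and changes nothing; `LineEndingCount` and `count` are made-up names. Two caveats: the counts include any 0x0D or 0x0A bytes that happen to occur inside binary streams, and a sed `s` command without the `g` flag replaces only the first match on each line it reads.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical read-only diagnostic; it never modifies the file.
final class LineEndingCount {
    // Returns {bare CR (0x0D), bare LF (0x0A), CRLF pairs}.
    static int[] count(byte[] data) {
        int cr = 0, lf = 0, crlf = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this pair
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        return new int[] {cr, lf, crlf};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LineEndingCount <file.pdf>");
            return;
        }
        int[] c = count(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("bare CR: " + c[0] + ", bare LF: " + c[1] + ", CRLF: " + c[2]);
    }
}
```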

Hope this can help someone.

Alexander Baron (thealexbaron) wrote :

I did not mean to change to "Fix Released". Just trying to understand the UI.

Any updates on this bug for anyone? LC_ALL was already blank for my user.

Changed in pdftk (Ubuntu):
status: Confirmed → Fix Released
William Grant (wgrant)
Changed in pdftk (Ubuntu):
status: Fix Released → Confirmed
Hugh Parker (hcp) wrote :

Alex Baron - you asked for an update: I'm still getting this same problem, sadly.
