open office pdf has invalid characters

Bug #807409 reported by madtom1999
This bug affects 1 person
Affects                  Status         Importance   Assigned to          Milestone
libitext-java (Ubuntu)   New            Undecided    Unassigned
    Nominated for Lucid by Rolf Leggewie
pdftk (Ubuntu)           Fix Released   Undecided    Johann Felix Soden
    Nominated for Lucid by Rolf Leggewie

Bug Description

I have been trying to use pdftk to append some files to a small PDF generated by OpenOffice.org 3.2 Writer.
The program fails with:
gnu.xml.dom.DomDOMException: That character is not permitted.
More Information: þÿOpenOffice.org 3.2; modified using iText 2.1.7 by 1T3XT
   at gnu.xml.dom.DomDocument.checkChar(libgcj.so.10)
   at gnu.xml.dom.DomDocument.checkChar(libgcj.so.10)
   at gnu.xml.dom.DomDocument.createTextNode(libgcj.so.10)
   at com.lowagie.text.xml.xmp.XmpReader.setNodeText(itext-2.1.7.jar.so)
   at com.lowagie.text.xml.xmp.XmpReader.replace(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.close(itext-2.1.7.jar.so)

I can reproduce this error with most OpenOffice generated PDF files but not those from other sources.

penalvch (penalvch) wrote :

madtom1999, thank you for taking the time to report this bug and helping to make Ubuntu better. Please execute the following command in a terminal, as it will automatically gather debugging information:
apport-collect 807409
When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

Could you please attach the files that when appended by pdftk cause this problem?

affects: openoffice.org (Ubuntu) → pdftk (Ubuntu)
Changed in pdftk (Ubuntu):
status: New → Incomplete
madtom1999 (tompotts) wrote :

I've found that any PDF made by OO 3.2 that I try to attach files to causes problems.
I've attached a PDF. Use any file for ooerror.odt and run pdftk as follows:
pdftk ooerror.pdf attach_files ooerror.odt output oo.pdf
Unhandled Java Exception:
gnu.xml.dom.DomDOMException: That character is not permitted.
More Information: þÿOpenOffice.org 3.2; modified using iText 2.1.7 by 1T3XT
   at gnu.xml.dom.DomDocument.checkChar(libgcj.so.10)
   at gnu.xml.dom.DomDocument.checkChar(libgcj.so.10)
   at gnu.xml.dom.DomDocument.createTextNode(libgcj.so.10)
   at com.lowagie.text.xml.xmp.XmpReader.setNodeText(itext-2.1.7.jar.so)
   at com.lowagie.text.xml.xmp.XmpReader.replace(itext-2.1.7.jar.so)
   at com.lowagie.text.pdf.PdfStamperImp.close(itext-2.1.7.jar.so)

penalvch (penalvch) wrote :

madtom1999, could you please also attach the ooerror.odt file?

Launchpad Janitor (janitor) wrote :

[Expired for pdftk (Ubuntu) because there has been no activity for 60 days.]

Changed in pdftk (Ubuntu):
status: Incomplete → Expired
penalvch (penalvch) wrote :

madtom1999, we are closing this bug report because it lacks the information we need to investigate the problem, as described in previous comments #1 & #3. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in pdftk (Ubuntu):
status: Expired → Invalid
madtom1999 (tompotts) wrote :

My apologies - I didn't receive the mail informing me the file was required...see attachment

penalvch (penalvch) wrote :

madtom1999, this issue is unreproducible in LibreOffice Writer, in both Ubuntu 11.04 and 11.10 via the Terminal:

cd ~/Desktop && wget https://bugs.launchpad.net/ubuntu/+source/pdftk/+bug/807409/+attachment/2371063/+files/ooerror.odt && unoconv -f pdf ooerror.odt && pdftk ooerror.pdf attach_files ooerror.odt output oo.pdf

No errors. Does LibreOffice work for you? If using Lucid or Maverick feel free to type at the Terminal:

sudo add-apt-repository ppa:libreoffice/ppa && sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get -y install libreoffice-writer

Also, if this is still a problem, please perform the action requested in comment #1.

lsb_release -rd
Description: Ubuntu 11.04
Release: 11.04

apt-cache policy libreoffice-writer
libreoffice-writer:
  Installed: 1:3.3.3-1ubuntu2
  Candidate: 1:3.3.3-1ubuntu2
  Version table:
 *** 1:3.3.3-1ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-updates/main i386 Packages
        100 /var/lib/dpkg/status
     1:3.3.2-1ubuntu4 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

apt-cache policy unoconv
unoconv:
  Installed: 0.3-6
  Candidate: 0.3-6
  Version table:
 *** 0.3-6 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/universe i386 Packages
        100 /var/lib/dpkg/status

apt-cache policy pdftk
pdftk:
  Installed: 1.44-1
  Candidate: 1.44-1
  Version table:
 *** 1.44-1 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/universe i386 Packages
        100 /var/lib/dpkg/status

lsb_release -rd
Description: Ubuntu oneiric (development branch)
Release: 11.10

apt-cache policy libreoffice-writer
libreoffice-writer:
  Installed: 1:3.4.2-2ubuntu3
  Candidate: 1:3.4.2-2ubuntu3
  Version table:
 *** 1:3.4.2-2ubuntu3 0
        500 http://us.archive.ubuntu.com/ubuntu/ oneiric/main i386 Packages
        100 /var/lib/dpkg/status

apt-cache policy unoconv
unoconv:
  Installed: 0.4-1
  Candidate: 0.4-1
  Version table:
 *** 0.4-1 0
        500 http://us.archive.ubuntu.com/ubuntu/ oneiric/universe i386 Packages
        100 /var/lib/dpkg/status

apt-cache policy pdftk
pdftk:
  Installed: 1.44-3
  Candidate: 1.44-3
  Version table:
 *** 1.44-3 0
        500 http://us.archive.ubuntu.com/ubuntu/ oneiric/universe i386 Packages
        100 /var/lib/dpkg/status

Changed in pdftk (Ubuntu):
status: Invalid → Incomplete
Johann Felix Soden (johfel) wrote :

I can reproduce this bug with pdftk 1.41+dfsg-11 but not with the current pdftk 1.44-3 version.
The probable reason is that pdftk 1.44-*, like the 1.41-* versions, uses its own iText library (itext-paulo), in contrast to pdftk 1.41+dfsg-*, which uses libitext-java.

Changed in pdftk (Ubuntu):
status: Incomplete → Confirmed
Johann Felix Soden (johfel) wrote :

The bug is in libitext-java (iText 2.1.7) and affects only the 1.41+dfsg-* versions of pdftk.

libitext-java changes the metadata of the output PDF file (which its license requires). It adds the text "modified using iText 2.1.7 [...]" to the producer field.

The metadata in a PDF file can additionally be stored as XMP (Extensible Metadata Platform).

OpenOffice seems to add XMP by default, LibreOffice only when asked to (e.g. by using PDF/A-1a output).
Both write the normal metadata using UTF-16.
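
To check whether a particular PDF carries an XMP packet at all, a rough heuristic is to scan the raw bytes for the XMP packet header. This is only a sketch: it assumes the metadata stream is stored uncompressed and unencrypted, which is common but not guaranteed, and the function name has_xmp_packet is made up for illustration.

```python
def has_xmp_packet(pdf_bytes: bytes) -> bool:
    """Heuristically report whether raw PDF bytes contain an XMP packet.

    Assumption: the XMP metadata stream is stored uncompressed, so the
    packet header appears literally in the file. A compressed or
    encrypted metadata stream would not be detected by this check.
    """
    return b"<?xpacket begin=" in pdf_bytes


def file_has_xmp_packet(path: str) -> bool:
    """Convenience wrapper that reads the file at 'path' in binary mode."""
    with open(path, "rb") as fh:
        return has_xmp_packet(fh.read())
```

Calling file_has_xmp_packet("ooerror.pdf") on the attached file would show whether it falls into the "with XMP" case described here.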

libitext-java converts the metadata to UTF-8 but does not remove the UTF-16 BOM. Without XMP data this only leads to a wrongly displayed producer entry; with XMP data it makes the XML writer code fail.
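
The effect can be illustrated in a few lines of Python. This is a sketch, not iText's code: it assumes the producer string is stored as UTF-16BE with a leading BOM (bytes FE FF) and that the faulty path maps every byte to one character, which is what a Latin-1 decode does. The helper names are hypothetical. Under that assumption the text starts with "þÿ", as in the error message above, and also contains NUL characters, which the XML 1.0 "Char" production does not allow.

```python
import codecs


def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string, honouring a UTF-16BE BOM if present.

    Without a BOM the bytes are PDFDocEncoding; Latin-1 is used here as a
    simplifying stand-in, which is not exact for every byte value.
    """
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return raw.decode("latin-1")


# Producer value as a UTF-16BE string with BOM (assumed layout).
raw = codecs.BOM_UTF16_BE + "OpenOffice.org 3.2".encode("utf-16-be")

naive = raw.decode("latin-1")        # one byte -> one character
fixed = decode_pdf_text_string(raw)  # BOM recognised and stripped

print(repr(naive[:6]))                       # BOM bytes appear as 'þÿ', followed by NULs
print(all(is_xml10_char(c) for c in naive))  # False: NUL is not an XML 1.0 Char
print(all(is_xml10_char(c) for c in fixed))  # True
print(fixed)                                 # OpenOffice.org 3.2
```

The characters "þ" and "ÿ" are themselves legal in XML, so in this sketch it is the embedded NULs from the undecoded UTF-16 that an XML character check would reject. Decoding with the BOM taken into account avoids both the stray prefix and the NULs.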

Changed in pdftk (Ubuntu):
assignee: nobody → Johann Felix Soden (johfel)
status: Confirmed → Fix Released