PDF import changes font case randomly

Bug #199689 reported by jbeale
Affects: Inkscape
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Filename of downloaded executable: Inkscape0803061655.7z
Version: Inkscape 0.46+devel, built Mar 6 2008. Running on Windows XP SP2, 2 GB RAM.

Problem: The imported PDF file shows some letters that should be capitals as lower-case letters (the spacing is also off, which is probably related). The file looks fine in Adobe Acrobat 7.0 Pro. Screenshots of the same PDF page in both Inkscape 0.46 and Acrobat 7 are attached, showing the difference. The PDF file type is %PDF-1.4.

The problem PDF contains these embedded fonts:
EdwardianScriptITC (Subset)
GoudyOldStyleT-Bold-SC700 (Subset)
GoudyOldStyleT-Italic (Subset)
GoudyOldStyleT-Regular (Subset)
GoudyOldStyleT-Regular-SC700 (Subset)

I can upload the problem PDF somewhere if it stays private. Not sure my client would be cool with having it world-readable.

Revision history for this message
jbeale (beale) wrote :

Note: Ghostview 4.8 / Ghostscript 8.54, a version from 2006, also renders the PDF document correctly.

Revision history for this message
jbeale (beale) wrote :

Another note: the original PDF has several pages. I exported just the example page (p.4) into another PDF file from Acrobat by using "print to Adobe PDF". The new single-page PDF file now shows no embedded fonts. Acrobat 7 renders it correctly, gs 8.54 renders it correctly, but Inkscape renders it with the wrong fonts (looks like everything becomes Arial, I think). Screenshot from Inkscape enclosed.

Revision history for this message
Mark Everitt (mark-s-everitt) wrote : A (sort of) solution

One workaround I've found for this issue (I googled it a while back) is to have Ghostscript convert the text to outlines. My understanding is that a certain set of fonts is supposed to be available in every PDF viewer, so these are often not embedded in the document itself. This trick tells Ghostscript to turn the text into curves and hard-code them into your document. The side effect is that you can no longer select text, but at least it looks identical.

gs -sDEVICE=pswrite -dNOCACHE -sOutputFile=nofont-MyDocument.ps -q -dBATCH -dNOPAUSE MyDocument.pdf -c quit
ps2pdf nofont-MyDocument.ps nofont-MyDocument.pdf

Replace MyDocument.pdf with your PDF or PS document. I've not tried this on Windows, but it works on Linux and OS X.
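
For anyone who wants to script the workaround, the two commands can be generated per file. The following is only a sketch: the helper name `outline_commands` is made up for illustration, the flags are copied from the comment above (with the batch switch in Ghostscript's upper-case spelling, `-dBATCH`), and the `pswrite` device is taken on trust from that comment and may not be available in every Ghostscript release.

```python
import shlex
from pathlib import Path


def outline_commands(source: str) -> list[list[str]]:
    """Build the two argv lists of the workaround for one input file.

    Hypothetical helper: it only assembles the commands, it does not run them.
    """
    src = Path(source)
    ps_out = src.with_name(f"nofont-{src.stem}.ps")
    pdf_out = src.with_name(f"nofont-{src.stem}.pdf")
    gs_cmd = [
        "gs", "-sDEVICE=pswrite", "-dNOCACHE",
        f"-sOutputFile={ps_out}",
        "-q", "-dBATCH", "-dNOPAUSE", str(src), "-c", "quit",
    ]
    ps2pdf_cmd = ["ps2pdf", str(ps_out), str(pdf_out)]
    return [gs_cmd, ps2pdf_cmd]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in outline_commands("MyDocument.pdf"):
        print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the printed commands look right for the local Ghostscript installation.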

Revision history for this message
jazzynico (jazzynico) wrote :

Not reproduced with build 20932, Windows XP SP3.
Tested with a Scribus-generated PDF embedding the GoudyOldStyleT-Regular and GoudyOldStyleT-Italic fonts as fonts (not vectors).
The fonts are converted to a default Sans font (close to Bitstream Vera Sans, but not exactly), but capital letters are not changed into lower case.
Could you confirm you still have this bug, and add a failing PDF file?

Changed in inkscape:
status: New → Incomplete
Revision history for this message
jbeale (beale) wrote :

I no longer have the original PDF from my original bug submission a year ago. However, the attached simple example PDF shows problems with EdwardianScriptITC font, when imported into Inkscape 0.46 (built April 1 2008).

The GoudyOldStyleT-Regular font renders OK this time, although it did not in the original example and I'm not sure what's different. The current example PDF was generated by MS Word using Adobe PDF as a printer device.

Revision history for this message
jazzynico (jazzynico) wrote :

Strange font behavior confirmed with build 21024, Windows XP SP3.

Both fonts are installed on my machine, but:
1. EdwardianScriptITC text isn't rendered correctly. The text value (in the XML editor) is "hÇáxÜ cÜÉzÜtÅÅ"...
2. Font attributes are OK (font-family:Edwardian Script ITC;-inkscape-font-specification:EdwardianScriptITC), but they are shown as Sans in the font list (in the font tool control bar).
3. Selecting text with the text tool generates a console warning message:
** (inkscape.exe:24532): WARNING **: Family name Sans does not have an entry in the font lister.

Changed in inkscape:
status: Incomplete → Confirmed
Revision history for this message
jbeale (beale) wrote :

The problem of scrambled letters, at least with "Edwardian Script ITC", is not limited to Inkscape. A simple test PDF generated by Ghostscript using that font displays fine in Adobe Acrobat (Windows XP), but highlighting and copying the text in the PDF viewer and then pasting it into Notepad gives the mangled string "Y|Üáà átÅÑÄx áxÇàxÇvxA fxvÉÇw tààxÅÑàA", which is exactly how it appears in Inkscape after PDF import. See attached files.

Revision history for this message
jbeale (beale) wrote :

Here is the PDF file using "Edwardian Script ITC" font, shown in the screen capture on my previous comment.

Revision history for this message
jbeale (beale) wrote :

I thought this might be a bug in Ghostscript, which generated the above PDF ( http://launchpadlibrarian.net/25800877/test3.pdf ), so I filed it as a Ghostscript bug. According to Ken Sharp's reply, it seems to be a nonstandard encoding problem with the font; see below:

---------
http://bugs.ghostscript.com/show_bug.cgi?id=690440
<email address hidden> changed:

Status: RESOLVED
Resolution: INVALID

------- Additional Comments From <email address hidden> 2009-04-21 10:50 -------
The font in question is a TrueType font embedded as a subset without a ToUnicode
CMap, and using a custom encoding. For example /Y (capital Y) is encoded at
position 1. In addition the glyph names in the encoding are not what one would
expect, I would expect to see /F, /i, /r, /s, /t and so on. Instead I see /Y /bar
/Udieresis /aacute etc.

So there is no Unicode information, and the encoding is non standard. In this
case Acrobat falls back to translating the glyph names into their ASCII
equivalents (when possible). Using the Encoding to map from the character codes
to the glyph names we see that we get /Y /bar /Udieresis /aacute /agrave /space
/aacute /t and so on, which matches what you get when you copy and paste.

It's impossible to tell from the PDF file why the file was created this way, one
would have to guess that the file was created from a PostScript file which had
re-encoded the font like this, so that the PDF file had to be made the same way.

I don't see a bug here, possibly (given that the PDF file was created by GS
8.63) there is a bug in pdfwrite which caused the encoding oddness, but that
can't be determined without seeing the PostScript file.
---
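
The fallback described in the reply can be reproduced in a few lines. This is a sketch under stated assumptions, not a model of how Acrobat or Inkscape is implemented: the glyph-name table is a tiny hand-written excerpt, and apart from /Y at position 1 (stated in the reply) the character codes in `ENCODING` are hypothetical, chosen only so that the glyph names come out in the order quoted above.

```python
# Hand-written excerpt of glyph-name -> character mappings (illustration only).
GLYPH_TO_CHAR = {
    "Y": "Y",
    "bar": "|",
    "Udieresis": "\u00dc",  # Ü
    "aacute": "\u00e1",     # á
    "agrave": "\u00e0",     # à
    "space": " ",
    "t": "t",
}

# Custom encoding of the subset font: character code -> glyph name.
# Only "Y at position 1" comes from the reply; codes 2-7 are assumed.
ENCODING = {
    1: "Y", 2: "bar", 3: "Udieresis", 4: "aacute",
    5: "agrave", 6: "space", 7: "t",
}


def extract_text(codes):
    """Map codes to glyph names, then glyph names to characters.

    With no ToUnicode CMap, the glyph names are all a reader has to go on,
    so misleading names produce misleading text.
    """
    chars = []
    for code in codes:
        glyph_name = ENCODING.get(code)
        chars.append(GLYPH_TO_CHAR.get(glyph_name, "\ufffd"))
    return "".join(chars)


# Codes that would yield /Y /bar /Udieresis /aacute /agrave /space /aacute /t.
print(extract_text([1, 2, 3, 4, 5, 6, 4, 7]))
```

The result is the first eight characters of the string pasted into Notepad earlier in this report, which is consistent with the text being decoded from the glyph names rather than from the glyph shapes.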

Revision history for this message
jbeale (beale) wrote :

The PDF was generated from Microsoft Office Word 2003 (11.8169.8172) SP3 printing to CutePDF ( http://www.cutepdf.com/products/cutepdf/Writer.asp#download ). I assume CutePDF pretends to be a plain-vanilla PS printer device, and it then uses PS2PDF with Ghostscript as a backend to write out the PDF file. I assume there is a temporary PS file generated somewhere in the process, but CutePDF deletes it after use.

I'm ready to believe that MS Word does any kind of nasty thing when subsetting the font, especially given it's just printing, and doesn't realize it's actually making a PDF file.

Revision history for this message
jazzynico (jazzynico) wrote :

I've tried with OpenOffice.org 3 (and Edwardian Script ITC), and the generated PDF doesn't show those strange strings.
Don't know which backend it uses, but I guess our bug isn't caused by Inkscape itself.

Revision history for this message
jbeale (beale) wrote :

Confirmed here as well. OpenOffice 3 "File->Export as PDF..." works OK. However, print to PS printer and then convert via pdfwrite in Ghostscript, or print to CutePDF (basically same thing) gives garbled text, due to a font subset with nonstandard character encoding. This problem is specific to some fonts: a more standard one ("Times New Roman") works fine even via CutePDF, etc.

Bottom line: OpenOffice PDF export generates better quality PDF than print-to-PDF from either OO or MS Word. I guess it is not too surprising.

Revision history for this message
jbeale (beale) wrote :

To follow up, I was able to import the PDF generated by OO3 PDF Export correctly in "Inkscape 0.46+devel r21167, built Apr 17 2009" with the "Replace PDF fonts by closest-named installed fonts" checkbox selected in "PDF Import Settings". The imported PDF renders correctly, and the text could be selected and edited. Success!

The earlier version (Inkscape 0.46, built Apr 1 2008) rendered the text in the wrong font (Arial?), because it failed to load the font "EdwardianScriptITC" when it really wanted "Edwardian Script ITC".
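
The name mismatch is easy to see with a small comparison. The sketch below is not Inkscape's actual matching code; the function names are invented for illustration, and the approach (compare names after dropping case, spaces and punctuation) is just one simple way to treat a PDF font name and an installed family name as the same font.

```python
import re


def normalize_font_name(name: str) -> str:
    """Lower-case a font name and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_installed_family(pdf_font_name, installed_families):
    """Return the installed family whose normalized name equals the PDF font name.

    Hypothetical helper: returns None when nothing matches, which is where
    an importer would have to fall back to a default font.
    """
    wanted = normalize_font_name(pdf_font_name)
    for family in installed_families:
        if normalize_font_name(family) == wanted:
            return family
    return None


installed = ["Arial", "Edwardian Script ITC", "Times New Roman"]
print(find_installed_family("EdwardianScriptITC", installed))
```

An exact string comparison between "EdwardianScriptITC" and "Edwardian Script ITC" fails, which fits the fallback to a default font seen in the older build.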

Revision history for this message
su_v (suv-lp) wrote :

> I was able to import the PDF generated by OO3 PDF Export
> correctly in "Inkscape 0.46+devel r21167, built Apr 17 2009"

@jbeale - can this report be closed as 'Fix Released' (Milestone 0.47), or do you still have the same or similar issue(s) in the current stable version (0.48) or recent development builds?

Changed in inkscape:
status: Confirmed → Incomplete
Revision history for this message
jbeale (beale) wrote :

No issues with viewing and editing my OO3 sample PDF using the current stable version, Inkscape 0.48.0 r9654
As far as I am concerned, the bug is fixed so please close the report, thanks!

Revision history for this message
jbeale (beale) wrote : Re: [Bug 199689] Re: PDF import changes font case randomly

I have no issues with viewing and editing my OO3 sample PDF using the current stable version, Inkscape 0.48.0 r9654
As far as I am concerned, the bug is fixed so please close the report, thanks!

-John Beale

Revision history for this message
su_v (suv-lp) wrote :

Thank you for testing and confirming that the issue is solved.

Changed in inkscape:
milestone: none → 0.47
status: Incomplete → Fix Released