ghostscript fails to correctly substitute cidf fonts

Bug #1438494 reported by Bill
This bug affects 2 people
Affects: poppler-data (Ubuntu)
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

I reported this bug first with ghostscript (http://bugs.ghostscript.com/show_bug.cgi?id=695874), but it is apparently a packaging issue.

The general issue is that when a .pdf uses a CIDFont that is not embedded in the document, ghostscript stops processing the file at that font (see the file attached to the ghostscript bug). The packaging issue is that the default Ubuntu cidfmap file lists CIDFonts that are not installed on the system.

System info is:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.10
Release: 14.10
Codename: utopic

$ ghostscript --version
9.14

cliddell (cjl) wrote :

The root problem here is that the Ubuntu package contains cidfmap mappings that provide substitutions for various CIDFonts that may not be embedded in incoming files. But the Ghostscript package does not depend on the package(s) containing those font files, so those font files are often not available.

Historically, Ghostscript has assumed that such "system level" configuration was correct, and did minimal error checking on it. Thus, by the time Ghostscript realises the font files are not available, it is too late to recover gracefully, and we have to error out.

The most recent Ghostscript releases are more rigorous in that area, and should cope better.

Nevertheless, it would be preferable if the fonts referenced in the mappings were made dependencies of the Ghostscript package.

*Or* remove those mappings altogether, as they are much less relevant since we now have a built-in CIDFont substitution in Ghostscript, using DroidSansFallback.ttf.

A final suggestion would be to remove the mappings from the default Ghostscript package, and rely on the DroidSansFallback.ttf substitution, and move the existing mappings to a separate package which holds the mapping configuration, and depends on the packages containing the relevant font files.

I feel the last suggestion would be the most desirable, since the DroidSansFallback.ttf substitution will work just fine for the vast majority of people, who don't need to be forced to install a load of KANJI fonts, while still allowing flexibility for those who genuinely need more accurate CIDFont substitution than simply falling back to DroidSansFallback.ttf.

Finally, the "generic" mappings such as "Adobe-Identity" and "Adobe-Japan1" should be removed altogether as, except in very rare circumstances, those should all fall through to the DroidSansFallback.ttf substitution.

I can provide a complete list of those "generic" mappings if required.

Chris

Till Kamppeter (till-kamppeter) wrote :

Chris, thank you for the info.

I would be very grateful for further help. Which font mapping file do I have to remove or move out into a separate package? Which font mappings (in which files?) do I have to remove altogether? In the many years I have been packaging Ghostscript, I have made almost no changes to the included font mappings or fonts, as they usually worked, so I do not have much experience in modifying font mappings.

cliddell (cjl) wrote :

Apologies, Till, for the delayed reply - I *thought* replying on a bug also subscribed me to it, but clearly not! (I have subscribed now).

There is a bit of guesswork here, as I don't fully understand the file locations.

We are mainly concerned with the cidfmap file. Now, there is a set of cidfmap files in "/etc/ghostscript/cidfmap.d/" and those (it appears) are used by the "/usr/sbin/update-gsfontmap" script (poor name, as it adds to the confusion that Fonts and CIDFonts are the same thing!), to update the *actual* cidfmap which is in "/var/lib/ghostscript/fonts/cidfmap". It is not at all clear to me how the "update-gsfontmap" script gets run - possibly only as a package post-install step?

The files in "/etc/ghostscript/cidfmap.d/" are as follows (file name + TTF font(s) referenced):

90gs-cjk-resource-cns1.conf - ukai.ttc, uming.ttc
90gs-cjk-resource-gb1.conf - ukai.ttc, uming.ttc
90gs-cjk-resource-japan1.conf - fonts-japanese-mincho.ttf, fonts-japanese-gothic.ttf
90gs-cjk-resource-japan2.conf - ttf-japanese-mincho.ttf, ttf-japanese-gothic.ttf
90gs-cjk-resource-korea1.conf - NanumMyeongjo.ttf, NanumBarunGothic.ttf, NanumBarunGothicBold.ttf, NanumGothic.ttf

NOTE: there is some inconsistency (possibly bitrot) there with "fonts-japanese-*.ttf" used in one file and "ttf-japanese-*.ttf" used in another - clearly the same font, but likely different "generations" of name.
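
A quick way to see which mappings point at fonts that are not installed is to scan the generated cidfmap for /Path entries and test each path. The following is only a minimal sketch in Python: it assumes the records use the dictionary form documented for Ghostscript's cidfmap (`/Name << ... /Path (...) ... >> ;`), it ignores alias records, and the helper names (`path_entries`, `missing_fonts`) are made up for illustration.

```python
import os
import re

# Matches one dictionary-style cidfmap record, e.g.
#   /Name << /FileType /TrueType /Path (/some/font.ttf) /CSI [(Japan1) 6] >> ;
# Alias records (/Alias /Target ;) carry no /Path and are skipped.
RECORD = re.compile(r"/(\S+)\s*<<(.*?)>>\s*;", re.DOTALL)
PATH = re.compile(r"/Path\s*\(([^)]*)\)")


def strip_comments(text):
    """Drop PostScript '%' comments (naive: assumes no '%' inside strings)."""
    return "\n".join(line.split("%", 1)[0] for line in text.splitlines())


def path_entries(text):
    """Return (cidfont_name, font_path) pairs found in cidfmap text."""
    entries = []
    for name, body in RECORD.findall(strip_comments(text)):
        match = PATH.search(body)
        if match:
            entries.append((name, match.group(1)))
    return entries


def missing_fonts(text, exists=os.path.exists):
    """Return the entries whose /Path does not exist on this system."""
    return [(name, path) for name, path in path_entries(text) if not exists(path)]


if __name__ == "__main__":
    # Location of the generated cidfmap as named in this thread.
    cidfmap = "/var/lib/ghostscript/fonts/cidfmap"
    if os.path.exists(cidfmap):
        with open(cidfmap, encoding="utf-8", errors="replace") as handle:
            for name, path in missing_fonts(handle.read()):
                print(f"{name}: {path} is missing")
```

Any entry this reports is a mapping Ghostscript may try to use for a file that is not present. Relative /Path values, if any exist, would need extra handling that this sketch does not attempt.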

My two alternate solutions are that the Ghostscript package should be augmented to include the fonts listed above (with the names and paths updated to reflect the current directory tree etc) as dependencies, thus they always get installed with Ghostscript.

*Or* to split off (I *think*) the files in "/etc/ghostscript/cidfmap.d/" into something like a "ghostscript-cjk-cidfonts" package, which has those fonts listed above as dependencies (again with names and paths revised for a modern system).

Till Kamppeter (till-kamppeter) wrote :

The update-gsfontmap script is indeed run in a post-install script, namely that of the ghostscript package. It should get run on every change in the directories /etc/ghostscript/cidfmap.d/ and /etc/ghostscript/fontmap.d/, as packages other than ghostscript, for example font packages, could drop files there. Indeed, the files in /etc/ghostscript/cidfmap.d/ come from the poppler-data package. I could not determine which package(s) hold the actual fonts, though.

Till Kamppeter (till-kamppeter) wrote :

The Ubuntu font packages providing the needed CJK fonts (leaving aside the redundant/obsolete names in 90gs-cjk-resource-japan2.conf) are:

fonts-arphic-ukai:
/usr/share/fonts/truetype/arphic/ukai.ttc

fonts-arphic-uming:
/usr/share/fonts/truetype/arphic/uming.ttc

fonts-takao-pgothic:
/usr/share/fonts/truetype/takao-gothic/TakaoPGothic.ttf
/usr/share/fonts/truetype/fonts-japanese-gothic.ttf -> /etc/alternatives/fonts-japanese-gothic.ttf -> /usr/share/fonts/truetype/takao-gothic/TakaoPGothic.ttf

fonts-hanazono:
/usr/share/fonts/truetype/hanazono/HanaMinA.ttf
/usr/share/fonts/truetype/fonts-japanese-mincho.ttf -> /etc/alternatives/fonts-japanese-mincho.ttf -> /usr/share/fonts/truetype/hanazono/HanaMinA.ttf

fonts-nanum:
/usr/share/fonts/truetype/nanum/NanumGothic.ttf
/usr/share/fonts/truetype/nanum/NanumBarunGothicBold.ttf
/usr/share/fonts/truetype/nanum/NanumMyeongjo.ttf
/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf

So splitting the fontmap files out of poppler-data and letting the new binary package depend on the above listed packages should fix this bug.
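
The fonts-japanese-*.ttf names in the listing above are symlinks through /etc/alternatives, so whether they resolve depends on which font package is installed. As a rough illustration only, the Python sketch below follows such a chain hop by hop and reports whether the final target exists; `resolve_chain` is a hypothetical helper, and the two link paths are copied from the listing.

```python
import os

# Symlinked names taken from the package listing in this thread.
LINKS = [
    "/usr/share/fonts/truetype/fonts-japanese-gothic.ttf",
    "/usr/share/fonts/truetype/fonts-japanese-mincho.ttf",
]


def resolve_chain(path):
    """Return every hop from path to its final target, following symlinks."""
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative link targets are resolved against the link's directory;
        # os.path.join leaves absolute targets unchanged.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in seen:  # guard against symlink loops
            break
        seen.add(path)
        chain.append(path)
    return chain


if __name__ == "__main__":
    for link in LINKS:
        chain = resolve_chain(link)
        status = "ok" if os.path.exists(chain[-1]) else "missing"
        print(" -> ".join(chain), f"[{status}]")
```

On a system without the relevant font package, the chain is just the link name itself and the status is "missing", which is the situation the cidfmap entries currently run into.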

Changed in ghostscript (Ubuntu):
status: New → Triaged
affects: ghostscript (Ubuntu) → poppler-data (Ubuntu)
Till Kamppeter (till-kamppeter) wrote :

Ghostscript will fall back to DroidSansFallback.ttf, but the question is whether Poppler will do so, too.

cliddell (cjl) wrote :

Two things:

1) I *really* don't understand why Ghostscript configuration files are being installed by poppler. It would be worth finding out how (and even if) poppler actually uses them, because I rather feel poppler and Ghostscript configurations *should* be separate. For example, if I get time, I'll probably be tweaking the capabilities of cidfmap at some point, which could, potentially, break poppler's use of these files.

2) the question of whether poppler will fall back to some other substitute CIDFont is moot since, if poppler *does* use those configuration files, it won't (normally) find the font files they reference anyway. So even if poppler does use them, splitting them off into a separate package and fixing the dependencies will work better for poppler, too.

Thorsten (thorstenr-42) wrote :

Does this bug also occur if there are no embedded fonts in a PDF? I ask because my Canon scanner seems to produce PDFs without embedded fonts; at least pdffonts shows no fonts:
$ pdffonts SCN_0002.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------

And I get a very similar error when trying to reduce the file size with Ghostscript:

$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -sOutputFile=out.pdf SCN_0002.pdf
GPL Ghostscript 9.16 (2015-03-30)
Copyright (C) 2015 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
   **** Error reading a content stream. The page may be incomplete.
   **** File did not complete the page properly and may be damaged.

   **** This file had errors that were repaired or ignored.
   **** The file was produced by:
   **** >>>> MP540 series <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

With the minimal gs version from http://www.ghostscript.com/download/gsdnld.html it works flawlessly.

I attached a PDF created by my scanner to this comment.

(I'm using Ubuntu 15.10.)

cliddell (cjl) wrote :

Your file contains neither fonts nor CIDFonts, it is simply one big image. Whilst it is common practice for scanner produced PDF to use OCR to overlay the scanned image with non-marking characters (obviously, being non-marking, the actual font used does not really matter), this file does not do so. I'd guess that's because either the OCR function was disabled, or it simply could not recognise the handwritten characters.

Anyway, as our 9.16 release fails with the same error as you saw (i.e. the error is not caused by the Ubuntu packaging), but the 9.18 release (which I assume is what you tested) works without error, I had a hunt, and found that the fix is this one:
http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=668406a5

I would suggest, if you want the maintainer to pull in this patch, you *may* want to open a new bug report, referencing the above commit.

Thorsten (thorstenr-42) wrote :

Thanks a lot! I created a new bug report, which is hopefully accurate enough: https://bugs.launchpad.net/ubuntu/+source/ghostscript/+bug/1525225

wang haisheng (edwin-uestc) wrote :

➜ example git:(master) ✗ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
➜ pdf2xml-viewer git:(master) ✗ pdftohtml
pdftohtml version 0.41.0
Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC

poppler-data is already the newest version (0.4.7-7).

➜ example git:(master) ✗ pdffonts test.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
OCVNVZ+KaiTi_GB2312                  TrueType          WinAnsi          yes yes yes     19  0
JSRZNG+SimSun                        TrueType          WinAnsi          yes yes yes      8  0

➜ example git:(master) ✗ pdftohtml -c -hidden -enc UTF-8 -xml test.pdf test-utf8.xml
Page-1

I could not get the correct Chinese characters in the output.

The test file is here:
link: https://pan.baidu.com/s/1dFiSrDn
password: ai5u
