Significant Differences in exported PDF File Size from 0.46 to 0.47pre3/4

Bug #469180 reported by ManuP
This bug affects 2 people
Affects: Inkscape
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

I export SVG files to PDF from the command line with the -A option. With Inkscape 0.46 my PDF files were always approx. 600 KB. After I switched to 0.47pre3, the same SVG file produces a PDF of 2 MB. Same in pre4.
People on IRC told me this is because of embedded fonts, but the properties of both PDF files show the same embedded fonts!

Since I got the same result with 0.46 while using less hard disk space, I would like to have that behaviour back in 0.47. If you need to create hundreds of these PDF files, you are glad if they take less space, especially if you need to upload them to or download them from a server.

To my mind, a file which is larger than another but "contains" the same is worse than the smaller one. What do you think?
Can you provide a command-line switch to use the export settings of 0.46?

I am using Windows XP; if you wish, I can test on Debian Linux.

su_v (suv-lp)
tags: added: cairo exporting
removed: export large
su_v (suv-lp) wrote :

Please attach a sample SVG and PDF (0.46 and 0.47pre) file. This is important to figure out whether some hidden bug in Inkscape or maybe cairo is triggered by your specific document structure. IMHO it is unusual even in 0.47pre that a PDF file with text objects converted to paths is far smaller than the same PDF exported with text as text (cf. the conversation on IRC).

ManuP (web-zocker) wrote :

I created a sample file which might not be minimal, but which shows what I mean.
The file sample0.46.svg is converted with "C:\Programme\Inkscape0.46\inkscape.exe" -f "D:\tmp\tmp\sample0.46.svg" -A "D:\tmp\tmp\sample0.46.pdf" to sample0.46.pdf. The resulting file size is 21 KB.

The file sample0.47pre3.svg is converted with "C:\Programme\Inkscape0.47pre3\inkscape.exe" -f "D:\tmp\tmp\sample0.47pre3.svg" -A "D:\tmp\tmp\sample0.47pre3.pdf" to sample0.47pre3.pdf. The resulting file size is 143 KB. That is about a factor of 7, which is even more than in my original case.

I hope this helps,
for further information, feel free to ask.
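
The two exports above can be scripted so that the size comparison is repeatable. This is only a minimal Python sketch, assuming the `-f` and `-A` options quoted in this comment; the helper names `build_export_command`, `export_pdf` and `size_ratio` are made up for illustration and are not part of Inkscape:

```python
import os
import subprocess


def build_export_command(inkscape_exe, svg_path, pdf_path):
    """Return the argument list for the export call used in this report."""
    # -f selects the input file and -A the PDF output, as in the commands above.
    return [inkscape_exe, "-f", svg_path, "-A", pdf_path]


def export_pdf(inkscape_exe, svg_path, pdf_path):
    """Run one export and return the size of the resulting PDF in bytes."""
    subprocess.run(build_export_command(inkscape_exe, svg_path, pdf_path), check=True)
    return os.path.getsize(pdf_path)


def size_ratio(old_size, new_size):
    """How many times larger the new file is compared with the old one."""
    return new_size / old_size


# Sizes reported above: 21 KB (0.46) and 143 KB (0.47pre3).
print(round(size_ratio(21, 143), 1))  # prints 6.8
```

Calling `export_pdf` once per Inkscape installation and feeding both sizes into `size_ratio` reproduces the "about factor 7" figure for the sample files.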

ManuP (web-zocker) wrote :

sample svg file for 0.47pre3

ManuP (web-zocker) wrote :

resulting pdf from 0.46

ManuP (web-zocker) wrote :

resulting pdf from 0.47pre3

I forgot to mention that the problem persists in 0.47pre4.

su_v (suv-lp) wrote :

attaching PDF file exported with Inkscape 0.46+devel r22547 on OS X 10.5.8 (cairo 1.8.8)
file size: 7'703 bytes, even smaller than your 0.46 PDF file (!)

Seems like a win32-only issue, maybe related to the cairo issue revealed in bug #271695:
«It looks like cairo is embedding a new font for each string of text drawn by inkscape. The cairo freetype font backend has some code that checks if the font face is the same as a previously used font face and merges them together when embedding. The win32 font backend is not as clever.»

tags: added: regression win32
su_v (suv-lp) wrote :

The original SVG file uses 'font-family:Bitstream Vera Sans', but comparing the exported PDFs shows a difference in the embedded fonts between PDF(win32) and PDF(osx):

win32: Arial, TrueType(CID), Identity-H
osx: BitstreamVeraSans, TrueType(CID), Identity-H

ManuP (web-zocker) wrote :

I repeated the test since I forgot to assign a font to the text in the SVG file.
I have now chosen the Tahoma font; the issue persists.

ManuP (web-zocker) wrote :

second pdf with 0.47pre3

ManuP (web-zocker) wrote :

and the source file

su_v (suv-lp) wrote :

attaching PDF file exported with Inkscape 0.46+devel r22547 on OS X 10.5.8 (cairo 1.8.8)
file size: 10'255 bytes

Adrian Johnson (ajohnson-redneon) wrote :

Acroread seems to automatically strip repeated embedded fonts from the show fonts dialog but pdffonts (from the poppler package) shows the problem.

> pdffonts sample0.46.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
Arial                                CID TrueType      yes no  yes      5  0
Arial                                CID TrueType      yes no  yes      6  0

> pdffonts sample0.47pre3.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
Arial                                CID TrueType      yes no  yes      5  0
Arial                                CID TrueType      yes no  yes      6  0
Arial                                CID TrueType      yes no  yes      7  0
Arial                                CID TrueType      yes no  yes      8  0
Arial                                CID TrueType      yes no  yes      9  0
Arial                                CID TrueType      yes no  yes     10  0
Arial                                CID TrueType      yes no  yes     11  0
Arial                                CID TrueType      yes no  yes     12  0
Arial                                CID TrueType      yes no  yes     13  0
Arial                                CID TrueType      yes no  yes     14  0
Arial                                CID TrueType      yes no  yes     15  0
Arial                                CID TrueType      yes no  yes     16  0
Arial                                CID TrueType      yes no  yes     17  0
Arial                                CID TrueType      yes no  yes     18  0
Arial                                CID TrueType      yes no  yes     19  0
Arial                                CID TrueType      yes no  yes     20  0
Arial                                CID TrueType      yes no  yes     21  0

It is the same issue as bug 271695. Inkscape 0.47 seems to be creating a new cairo_font_face more frequently than in 0.46. But in any case it is a cairo bug. Cairo should be merging multiple uses of the same font into the same subset.
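
Counting the repeated entries by hand gets tedious for larger documents. The following is a rough Python sketch that tallies font entries in pdffonts output, assuming the seven-column layout shown in this comment (name, type, emb, sub, uni, object number, generation); other pdffonts versions may print additional columns, and `count_embedded_fonts` is a name invented here:

```python
from collections import Counter


def count_embedded_fonts(pdffonts_output):
    """Count how often each (name, type) pair appears in pdffonts output."""
    counts = Counter()
    for line in pdffonts_output.splitlines():
        # Split from the right: emb, sub, uni, object number, generation.
        # Whatever remains on the left is the font name plus its type.
        fields = line.rsplit(None, 5)
        if len(fields) != 6:
            continue  # prompt lines, blank lines
        if not (fields[4].isdigit() and fields[5].isdigit()):
            continue  # header and separator rows have no numeric object ID
        counts[" ".join(fields[0].split())] += 1
    return counts


sample = """\
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
Arial CID TrueType yes no yes 5 0
Arial CID TrueType yes no yes 6 0
"""
print(count_embedded_fonts(sample))
```

Run against the two listings above, this reports 2 entries of "Arial CID TrueType" for the 0.46 file and 17 for the 0.47pre3 file.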

su_v (suv-lp) wrote :

reported upstream with cairo:
Bug 24849 – Inkscape PS/EPS/PDF files exported on win32 have too many fonts embedded:
<https://bugs.freedesktop.org/show_bug.cgi?id=24849>

su_v (suv-lp) wrote :

Setting status to 'Confirmed' in Inkscape because «(…). It may be possible for inkscape to do a better job of using pango and cairo to draw text». For details see this comment by Adrian Johnson <https://bugs.launchpad.net/inkscape/+bug/271695/comments/42>.

Changed in inkscape:
importance: Undecided → Medium
status: New → Confirmed
summary: - Significant Differences in exported File Size from 0.46 to 0.47pre3/4
+ Significant Differences in exported PDF File Size from 0.46 to
+ 0.47pre3/4
su_v (suv-lp) wrote :

@Adrian - pdffonts used on the PDFs generated on osx reveals that the original font name 'Tahoma' in sample2 is not kept. Is this to be expected or does it reveal a font configuration issue on osx?

LeWitt:bug suv$ pdffonts 469180-sample0.46+devel-r22547-osx.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
BitstreamVeraSans                    CID TrueType      yes no  yes      5  0
LeWitt:bug suv$
LeWitt:bug suv$ pdffonts 469180-sample2-0.46+devel-r22547-osx.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
CairoFont-0-0                        CID TrueType      yes no  yes      5  0
LeWitt:bug suv$

Fontbook Font Info for 'Bitstream Vera Sans':
PostScript name BitstreamVeraSans-Roman
      Full name Bitstream Vera Sans
         Family Bitstream Vera Sans
          Style Roman
           Kind TrueType
        Version Release 1.10
       Location /Users/suv/Library/Fonts/Vera.ttf
    Unique name Bitstream Vera Sans
 Copy protected No
     Embeddable Yes

Fontbook Font Info for 'Tahoma':
PostScript name Tahoma
      Full name Tahoma
         Family Tahoma
          Style Regular
           Kind TrueType
        Version Version 5.01.2x
       Location /Library/Fonts/Tahoma.ttf
    Unique name Microsoft Tahoma Regular
 Copy protected No
     Embeddable Yes

Adrian Johnson (ajohnson-redneon) wrote :

There are a couple of reasons why cairo would use the name "CairoFont-x-y" in a subsetted font:
1) error while subsetting the font
2) error getting the font name

Cairo subsets fonts by stripping out unused glyphs. The used glyphs are kept in their original form, so all hinting is preserved. As this requires detailed knowledge of the font format to go through the font and pick out the required parts, it can happen that cairo finds something it doesn't know how to deal with, either due to a bug in the font or because of a part of the font spec that cairo has not implemented. In this case a fallback font is embedded with the name CairoFont-x-y.

A fallback font is created by obtaining the paths for each used glyph from FreeType or Windows and building a new font whose glyphs are made from these paths. This has the downside of losing all the hinting, but it is almost always guaranteed to work: if you can see the font on the screen, FreeType/Windows must be able to obtain the glyph paths, and turning these paths into a font is not very hard to do.

Fallback fonts are always embedded in PDF as CID CFF ("CID Type 0C" in pdffonts). Since your Tahoma font is embedded as TrueType, the subsetting worked but getting the font name out of the font failed.

See also this message explaining how to interpret the font type displayed by pdffonts on cairo generated PDFs:
http://lists.cairographics.org/archives/cairo/2007-September/011496.html
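
The rule of thumb above can be written down as a small lookup. This is a hedged Python sketch of how one might read a single pdffonts row for a cairo-generated PDF, based only on the explanation in this comment; `diagnose_cairo_font` is a made-up helper, not anything provided by cairo or poppler:

```python
def diagnose_cairo_font(name, font_type):
    """Interpret one pdffonts row (name and type columns) of a cairo PDF."""
    if not name.startswith("CairoFont-"):
        # The original font name survived, so subsetting and the
        # name lookup both worked.
        return "subset of the original font, name preserved"
    if font_type == "CID Type 0C":
        # Fallback fonts are always embedded as CID CFF, so a generic
        # name with this type points at a subsetting error.
        return "probably a fallback font built from glyph paths (hinting lost)"
    # Generic name, but the font kept its original format: the subset
    # worked and only the font name could not be read.
    return "subset of the original font, but reading the font name failed"


print(diagnose_cairo_font("CairoFont-0-0", "CID TrueType"))
print(diagnose_cairo_font("BitstreamVeraSans", "CID TrueType"))
```

The first call corresponds to the Tahoma sample and the second to the Bitstream Vera Sans sample from the OS X exports.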

su_v (suv-lp) wrote :

JFTR - On 5/11/09 12:51, Adrian Johnson wrote:
> ~suv wrote:
>> I tested all system-installed fonts with Inkscape 0.47pre4. Exported to
>> PDF, 'Tahoma' seems the only one that pdffonts lists as embedded
>> 'CairoFont' fallback font.
>
> I've found the problem. Fonts can contain a Mac platform name (the font
> name in MacRoman encoding) and/or a Windows platform name (the font name
> in unicode encoding). Cairo only uses the Mac name. Tahoma only contains
> the Windows name. I'll fix cairo sometime to use the Windows name as well.
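
To illustrate the name lookup described in the quoted mail, here is a minimal Python sketch that prefers the Mac platform name and falls back to the Windows one. It is not cairo's code: `pick_font_name` and the `(platform_id, raw_bytes)` record shape are assumptions made for this example, and the caller is assumed to have already extracted the records for one name ID from the font's 'name' table. The platform IDs (1 for Macintosh, 3 for Windows) and the UTF-16BE encoding of Windows names follow the TrueType/OpenType 'name' table layout.

```python
MAC_PLATFORM = 1      # Macintosh platform ID, names in MacRoman
WINDOWS_PLATFORM = 3  # Windows platform ID, names in UTF-16BE


def pick_font_name(records):
    """Return the decoded font name from a list of (platform_id, raw_bytes).

    The Mac name is tried first; if the font only carries a Windows
    name (as Tahoma does, according to the mail above), that one is used.
    """
    decoders = ((MAC_PLATFORM, "mac_roman"), (WINDOWS_PLATFORM, "utf_16_be"))
    for platform_id, codec in decoders:
        for record_platform, raw in records:
            if record_platform == platform_id:
                return raw.decode(codec)
    return None  # no usable name record found


# A font that only has a Windows platform name still yields a name.
print(pick_font_name([(WINDOWS_PLATFORM, "Tahoma".encode("utf_16_be"))]))
```

With a lookup that stops after the Mac records, the same input would produce no name at all, which matches the 'CairoFont-0-0' entry seen for Tahoma.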

Linking this report as duplicate to bug #271695 “Kerned text embeds too many fonts and blows up the file size in EPS/PS”:
<https://bugs.launchpad.net/inkscape/+bug/271695>.
