Symbol font character for A0-FF generally wrong

Bug #948245 reported by David Mathog
This bug affects 1 person
Affects    Status      Importance  Assigned to   Milestone
Inkscape   Incomplete  Low         David Mathog

Bug Description

Windows XP SP3, trunk (last week) and 0.48.2

Entering characters with Symbol font for values <A0 results in the display of the correct character.

Entering characters with values A0 to FF generally results in the wrong character being displayed. I have not been able to figure out the pattern: some show the right glyph, most don't. The value entered seems to be stored correctly. For instance, ^U B3 appears as a "superscript 3" when it should be "greater than or equal to". However, save as EMF, import into PowerPoint, and the expected ">=" character appears.
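
The mismatch described above can be illustrated with a small lookup. This is only a sketch, not code from Inkscape or Pango: the helper name `symbolToUnicode` and the table are hypothetical, and the table covers just the four Symbol codes mentioned in this report. Read directly as Unicode code points, B3 is SUPERSCRIPT THREE and A5 is YEN SIGN, which is consistent with the wrong glyphs reported here, whereas the Symbol font is expected to show the characters listed below.

```cpp
#include <cstdint>
#include <map>

// Partial, illustrative table: Symbol font code -> Unicode code point of the
// glyph the Symbol font is expected to show for that code.
static const std::map<std::uint8_t, char32_t> kSymbolToUnicode = {
    {0x61, 0x03B1},  // GREEK SMALL LETTER ALPHA
    {0x62, 0x03B2},  // GREEK SMALL LETTER BETA
    {0xA5, 0x221E},  // INFINITY
    {0xB3, 0x2265},  // GREATER-THAN OR EQUAL TO
};

// Hypothetical helper: returns the Unicode code point for a Symbol code,
// or 0 when the code is not in the partial table above.
char32_t symbolToUnicode(std::uint8_t code) {
    auto it = kSymbolToUnicode.find(code);
    return it == kSymbolToUnicode.end() ? 0 : it->second;
}
```

With this table, `symbolToUnicode(0xB3)` yields 0x2265, the code point one would enter directly to get ">=" in a Unicode font.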

The character shown can change between Inkscape versions. In last week's trunk, Symbol A5 shows up as what looks like a times symbol (an X), whereas in 0.48.2 it is a capital Y with two horizontal lines through it.

Tags: fonts win32

David Mathog (mathog) wrote :

Screen shot, last week's trunk, showing what happens for two characters. The "should be" column uses Arial and other Unicode code points; the "shows as" column uses Symbol with the given character code.

David Mathog (mathog) wrote :

This may be a related problem. In order to try to debug this issue an EMF was prepared (attached) with the character symbols for three rows of symbol font, starting at A0, B0, and C0. Inkscape crashes when it tries to open it.

David Mathog (mathog) wrote :

Yuck, GIMP 2.6.10 on the same PC has the same font issues as Inkscape 0.48.2. Pango issue?

David Mathog (mathog) wrote :

Managed to import rows A, B, C into Inkscape using WMF (which did not blow up, whereas EMF did). Comparison of screen shots of this in Inkscape vs. PowerPoint attached. (The PowerPoint view is what it should look like.)

David Mathog (mathog) wrote :

Modified test SVG, with rows A, B, C of Symbol and the same in Arial. The Arial font characters are shown properly, but the Symbol font ones do not match the font table.

David Mathog (mathog) wrote :

Screen shot of modified test file

su_v (suv-lp)
tags: added: fonts win32
su_v (suv-lp) wrote :

> In last week's trunk symbol A5 shows up as what looks like a times
> symbol (an X), whereas in 0.48.2 it is a capital Y with two horizontal
> lines through it.

Just for the record: attaching screenshots with 0.48.2 and current trunk on OS X Lion.

Installed 'Symbol' font:
  /System/Library/Fonts/Symbol.ttf
  © 1990-99 Apple Computer Inc. © 1990-91 Bitstream Inc.

-> the observed changes on Windows are possibly related to r10742 (to address bug #165665).

su_v (suv-lp) wrote : Re: [Bug 948245] Re: Symbol font character for A0-FF generally wrong

Attaching the first line of text in the sample SVG file
'symbol_font_bug2.svg' as seen in the preview of the stock font selector
of GTK+ (2.24.10) from gtk-demo, once with the font 'Symbol' and once
with the font 'Arial'.

Since the font chooser previews the text with the same symbols as
Inkscape renders it on-canvas, I'd say this is an issue outside of Inkscape.

David Mathog (mathog) wrote :

Seems to be a Pango problem. I wrote some little test programs based on "Hello World", with some UTF-8 characters replacing part of the string, and built them against devlibs. The Cairo version rendered properly using the Symbol font. Oddly, the Pango version not only messed up the UTF-8 characters (with the substitutions noted in post 6 above) but also failed to render any of the other Latin characters as Greek (r stayed r, not rho, for instance). So not only does the problem seem to be in Pango, but Pango seems to do slightly different things in Inkscape and in the test program.

The attached zip has Pango, Cairo, and GTK test programs (there is also one in there called just "test", which won't build; not sure why).

David Mathog (mathog) wrote :

Hmm, output from the Pango tests looks the same for "Symbol", "Wingdings", and "Bookshelf Symbol 7". I think it must be ignoring these and silently falling back to some default font.

David Mathog (mathog) wrote :

Test SVG referenced in the next post. It consists of the following characters in Symbol format and displays as described:

hex  desired glyph  displayed glyph          displayed font           displayed glyph # (hex)
61   lc alpha       lc alpha                 SymbolMT                 44
62   lc beta        lc beta                  SymbolMT                 45
a5   infinity       X (like a times symbol)  SymbolMT                 75
b3   ge (>=)        superscript 3            BitstreamVeraSans-Roman  F2

David Mathog (mathog) wrote :

Debugging print commands were placed in various locations in the libnrtype code, as described in the attached file (see the "in..." lines).

The test file of the preceding post was opened, and then Inkscape closed. Nothing else was done.

One thing I am sure of - this line in FontInstance.cpp

    if ( theFace ) {
        FT_Select_Charmap(theFace,ft_encoding_unicode) && FT_Select_Charmap(theFace,ft_encoding_symbol);
    }

is not doing anything. In the test it was removed and replaced with consecutive attempts to set Unicode, then Apple Roman, then Symbol. I also tested forcing Apple Roman or Symbol, and not setting any charmap at all. (There were hex dumps of the charmap before and after; these have been edited out of the attachment for clarity.) The screen output never changed. Pango is evidently overriding or ignoring the FreeType charmap.

Also, ft_encoding_symbol sets the charmap in the PUA region 0xF020-0xF0FF, whereas Apple Roman sets it in 0x20-0xFF. Other than the offset, the maps are the same. Since I already have code to thunk Symbol characters down from the PUA to the Latin area on EMF input, I have been using Apple Roman for most of these tests. (Not that it mattered, in the end.) (Yes, I tested it with Unicode in the PUA range, and those characters were invisible.)
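
The "thunk down from the PUA" step mentioned above amounts to subtracting a fixed offset. The following is a minimal sketch under that assumption; the function name is hypothetical and this is not the actual EMF input code.

```cpp
#include <cstdint>

// Map a code point from the symbol-charmap PUA window (0xF020-0xF0FF) down to
// the 0x20-0xFF range; every other code point is returned unchanged.
// The name is hypothetical and not taken from the Inkscape source.
constexpr std::uint32_t thunkSymbolFromPua(std::uint32_t cp) {
    return (cp >= 0xF020 && cp <= 0xF0FF) ? cp - 0xF000 : cp;
}
```

For example, the PUA code point 0xF0B3 maps to 0xB3, while a code point outside the window, such as 0x41, passes through untouched.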

The first thing we see in the test is that font_instance::InitTheFace is called 3 times with SymbolMT. I'm not sure what calls it. All the calls are for Symbol. (Note: the "charmap <X>" lines document each call to select_charmap, each with a different charmap, with the return status shown. 0 is success.)

This is then followed by several more calls to InitTheFace from Layout::Calculator::_buildPangoItemizationForPara. The first invocation calls it once with SymbolMT, and the second calls it 3 times with BitstreamVeraSans-Roman.

This is then followed by two more calls to InitTheFace from who knows where, one for SymbolMT and one for BitstreamVeraSans-Roman.

Anyway, by this time Pango has done something bizarre to the simple 4-character string, having mapped it onto two fonts and having changed the glyphs for the two above 0x80. This can be seen in the LoadGlyph calls.

There are then a whole lot more calls to Layout::bounds - 8 sets of 4 (1 per glyph). This follows the set of 4 involved
in the actions of the previous paragraph.

Bottom line:

1. InitTheFace seems to be called many, many more times than is needed.
2. Pango is ignoring Freetype's Charmap and coming up with its own.
3. LoadGlyph works as it should; it is only called once per glyph.
4. Layout::bounds is also being called many more times than is needed. (9 times per string)

also...

5. The lines of code cited above use a deprecated flag to set the charmap (not that it matters, as it turns out). They should use FT_ENCODING_APPLE_ROMAN, FT_ENCODING_UNICODE.
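
The cited FontInstance.cpp line relies on FT_Select_Charmap returning 0 on success, so the `&&` only tries the second encoding when the first one fails. The "try Unicode, then Apple Roman, then Symbol" experiment described earlier can be written more explicitly. The sketch below models that logic without linking FreeType: the `Encoding` enum and `selectFirstCharmap` are hypothetical stand-ins, and real code would pass FT_ENCODING_UNICODE, FT_ENCODING_APPLE_ROMAN and FT_ENCODING_MS_SYMBOL (the current name for ft_encoding_symbol) to FT_Select_Charmap.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for FreeType's encoding tags (an assumption for this sketch).
enum class Encoding { Unicode, AppleRoman, MsSymbol };

// Tries each encoding in order and stops at the first one the selector accepts.
// `select` follows the FT_Select_Charmap convention: 0 means success.
// Returns the index of the encoding that was selected, or -1 if none was.
int selectFirstCharmap(const std::function<int(Encoding)>& select,
                       const std::vector<Encoding>& preferred) {
    for (std::size_t i = 0; i < preferred.size(); ++i) {
        if (select(preferred[i]) == 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Returning the index makes it possible to log which charmap was actually chosen, which the original one-line `&&` expression hides.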

I have no experience with this sort of code, but it looks seriously broken to me, with routines being called repeatedly, presumably because the results are not being stored. Or because the code is just wrong. There must be a performance penalty for all of these calls, and this was on an example with just one line of 4 characters!
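
If the repeated calls really do come from results not being stored, which is only a guess here, the usual remedy is to cache the result per font. The sketch below shows that general pattern; `FaceInfo` and `FaceCache` are hypothetical names and do not reflect how libnrtype is actually organized.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the result of an expensive per-font setup step.
struct FaceInfo {
    std::string name;
};

class FaceCache {
public:
    // Returns the cached entry for `fontName`, initializing it on first use only.
    const FaceInfo& get(const std::string& fontName) {
        auto it = cache_.find(fontName);
        if (it == cache_.end()) {
            ++initCalls_;  // counts how often the expensive path actually runs
            it = cache_.emplace(fontName, FaceInfo{fontName}).first;
        }
        return it->second;
    }

    int initCalls() const { return initCalls_; }

private:
    std::map<std::string, FaceInfo> cache_;
    int initCalls_ = 0;
};
```

With such a cache, asking for SymbolMT three times and BitstreamVeraSans-Roman once would run the expensive initialization twice rather than four times.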

David Mathog (mathog) wrote :

This is a similar analysis to the preceding one, but after the input file has been changed from Symbol to Arial for all characters. Again, the test file was just opened and closed. It has the same general pattern, with what looks like lots of extra calls.

David Mathog (mathog) wrote :

The debug_small.txt file in post 14 had some BEFORE and AFTER lines in it. Ignore them; they were associated with the charmap dump and should (also) have been edited out.

Kris (kris-degussem)
Changed in inkscape:
status: New → In Progress
importance: Undecided → Low
assignee: nobody → David Mathog (mathog)
su_v (suv-lp) wrote :

@David - could you please comment on the current status of this report?
(Was it addressed with the merge of your branch in revision 12488, or - if not - do you plan to continue working on it?)

Changed in inkscape:
status: In Progress → Incomplete
David Mathog (mathog) wrote :

The issue has not been resolved per se. I am not sure that it needs to be.

The issue is that Symbol is not a Unicode font. If one uses the ^U method to enter B3 for the Symbol "greater than or equal to", it falls over to some other font, possibly Arial, and produces a "superscript 3" glyph (Unicode code point B3).

Inkscape has Symbol as a supported font. (It isn't really, though; only the first 128 values are.) On import from EMF, for instance, such characters are common. However, since libunicode-convert was added to lp988601, these are automatically converted to a font other than Symbol and a Unicode code point. So even though the Symbol font is still selectable in Inkscape, there is no reason to use it anymore, at least not as far as going into or out of EMF is concerned. Instead one would use ^U2265 to enter the "greater than or equal to" code point. If this drawing were exported to EMF, it would (optionally) be converted to Symbol B3 on the fly; if it stays in SVG, it keeps using the Unicode value.

I imagine the issue may still come up in the context of other imports/exports, for instance from CGM or PDF.
