trunk: kerning of Telugu fonts is broken (rev >= 12488)

Bug #1282968 reported by వీవెన్ on 2014-02-21
This bug affects 3 people
Assigned to: David Mathog

Bug Description

Inkscape 0.48+devel r13012 on Windows 7 machine

Telugu fonts are not kerned in Windows dev builds. I haven't had a chance to check on GNU/Linux yet.

Attached SVG file contains the word "Inkscape" written in Telugu as a test case.

వీవెన్ (veeven) wrote :

This is how r13012 renders it.

వీవెన్ (veeven) wrote :

Here is the rendering from the latest stable release, Inkscape 0.48.4 r9939.

Please note that this is not fully correct either: the ZWNJ character should not have produced a space.

వీవెన్ (veeven) wrote :

This is the expected rendering.

su_v (suv-lp) wrote :

Reproduced on OS X 10.7.5. The trunk rendering on OS X looks slightly different from the image exported with the Windows dev build, though; see the attached screenshot with stable (left) and trunk (right).

Lohit Telugu font downloaded from here:

tags: added: regression text
Changed in inkscape:
milestone: none → 0.91
status: New → Confirmed
su_v (suv-lp) wrote :

Based on tests with archived builds:
- not reproduced with rev <= 12487
- reproduced with rev >= 12488
this regression was introduced with the merge of the EMF/WMF branch:

summary: - kerning of Telugu fonts is broken
+ trunk: kerning of Telugu fonts is broken (rev >= 12488)
Changed in inkscape:
importance: Undecided → Medium
jazzynico (jazzynico) wrote :

Reproduced on Windows XP, Inkscape trunk revision 13064.
Not reproduced with revision 12541 (though not perfect either; see attachment).

Changed in inkscape:
status: Confirmed → Triaged
su_v (suv-lp) wrote :

@David Mathog - any chance you could take a closer look at this report (bug #1304602 is likely a duplicate, or at least closely related) to figure out what changed with the merge of the EMF/WMF branch in rev 12488?

David Mathog (mathog) wrote :

I will have a look anyway, but please tell me exactly which font this is and where to obtain it. Problems with fonts often come down to hidden font substitutions, so I want to be sure that my test system has exactly the same "Telugu" as the OP uses. (At the moment there is no font by that name on any of my systems.)

Questions about that language:

1. Is that language Right to Left or Left to Right?
2. Does it have characters that modify or overwrite other characters without advancing the position?

The EMF/WMF branch should not in general have affected font rendering from an SVG file, but I recall that at the same time I also fixed problems with Right to Left languages and with modifiers (Hebrew vowels), and that work might have caused this. It did affect kerning, because Hebrew vowels are written over the primary character and the kerning code had to be modified to recognize which Unicode characters called for an advance and which didn't. Off the top of my head, and looking at the first PNG, where the characters seem to overwrite each other, that is most likely the issue here. (Different advance/no-advance rules for this language, or possibly some other special subclass of Unicode codes for those letters?)
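To make the "advance / no advance" distinction concrete, here is a minimal sketch, assuming a hard-coded table of nonspacing marks. `is_no_advance_mark` and `run_advance` are hypothetical names, not Inkscape's actual code, and only two illustrative ranges are listed (Combining Diacritical Marks and the Hebrew points sheva through meteg); a real implementation would look up the Unicode general category instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if the code point is a nonspacing mark that is
// drawn over the preceding base character without advancing the pen.
// Only two illustrative ranges are covered here (assumption, not a full table).
bool is_no_advance_mark(std::uint32_t cp) {
    if (cp >= 0x0300 && cp <= 0x036F) return true;  // Combining Diacritical Marks
    if (cp >= 0x05B0 && cp <= 0x05BD) return true;  // Hebrew points, sheva .. meteg
    return false;
}

// Total advance of a run: marks contribute nothing, everything else its width.
double run_advance(const std::vector<std::uint32_t>& cps,
                   const std::vector<double>& widths) {
    assert(cps.size() == widths.size());
    double total = 0.0;
    for (std::size_t i = 0; i < cps.size(); ++i)
        if (!is_no_advance_mark(cps[i])) total += widths[i];
    return total;
}
```

A table like this says nothing about Telugu: a Telugu vowel sign falls outside both ranges, so its width is added like any base character's.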

David Mathog (mathog) wrote :

Also, will the OP please save that test phrase into an EMF file (for instance, from PowerPoint: select, Save as Picture, format EMF)? Then, if it looks OK in Windows Preview or when read back into PowerPoint, read it into Inkscape and see what it does. The modifications for "Mark no advance" characters are in libTERE, which the EMF/WMF importers use to reassemble incoming text into editable text (as opposed to an assemblage of placed text snippets that cannot be edited as a whole). I'm wondering whether the final SVG produced that way will have the same issue.

I think this note in Layout-TNG-Compute.cpp is probably near the source of the problem:

                    /* Notes as of 4/29/13. Pango_shape is not generating English language ligatures, but it is generating
                    them for Hebrew (and probably other similar languages). In the case observed 3 unicode characters (a base
                    and 2 Mark, nonspacings) are merged into two glyphs (the base + first Mn, the 2nd Mn). All of these map
                    from glyph to first character of the log_cluster range. This destroys the 1:1 correspondence between
                    characters and glyphs. A big chunk of the conditional code which immediately follows this call
                    is there to clean up the resulting mess.

If I had to bet, it would be that the rules to clean up the mess are different for Hebrew and whatever languages use the Telugu font.
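The loss of the 1:1 character/glyph correspondence described in that comment can be illustrated with a small sketch. It assumes Pango's `PangoGlyphString` convention that `log_clusters[i]` holds the byte offset of the start of the cluster glyph `i` belongs to; `Cluster` and `group_clusters` are hypothetical names, not Inkscape code.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <vector>

struct Cluster {
    int byte_start;           // first byte of the cluster in the UTF-8 text
    int byte_end;             // one past the last byte of the cluster
    std::vector<int> glyphs;  // indices of all glyphs mapping to this cluster
};

// Group glyphs by the cluster start they point at. Using an ordered map means
// right-to-left runs (where log_clusters decreases) are handled the same way.
std::vector<Cluster> group_clusters(const std::vector<int>& log_clusters, int text_len) {
    std::map<int, std::vector<int>> by_start;  // ordered by byte offset
    for (int g = 0; g < static_cast<int>(log_clusters.size()); ++g)
        by_start[log_clusters[g]].push_back(g);

    std::vector<Cluster> out;
    for (auto it = by_start.begin(); it != by_start.end(); ++it) {
        auto next = std::next(it);
        int end = (next == by_start.end()) ? text_len : next->first;
        out.push_back({it->first, end, it->second});
    }
    return out;
}
```

For the Hebrew case in the comment (a base plus two marks, 6 bytes, shaped into 2 glyphs), `log_clusters` would be `{0, 0}`: one cluster covering bytes 0 to 6 and owning both glyphs, with no way to tell which glyph came from which character.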

David Mathog (mathog) wrote :

This patch seems to fix it. I have not committed it because it needs to be tested on other languages. It seems to be OK with the single Telugu word in the test case, English, Hebrew, and mixed English and Hebrew, but it would not surprise me if this change does something odd in yet another language. The OP should examine an extensive section of Telugu text to see if it all looks OK.

The root of the issue is the need to kern text in languages that hang modifying glyphs on a base character glyph. When Pango processes a UTF-8 string it returns a series of glyphs that may not be in the same order as the UTF-8. This first showed up in Hebrew, where the vowel glyphs were sometimes placed before the base character glyph. In Hebrew each "logical cluster" that Pango returned contained a single base character and multiple modifiers, all with zero width, in indeterminate order. This broke kerning elsewhere in Inkscape. To make it work, the glyphs were sorted by width so that the base character came first and all the modifiers followed. (The order of the modifiers was irrelevant.)
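A minimal sketch of the width sort described above, assuming each cluster is a plain vector of glyphs with an advance width. `Glyph` and `base_first` are hypothetical names; this only restates the idea and is not the actual Layout-TNG-Compute.cpp code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Glyph {
    unsigned id;   // glyph index in the font
    double width;  // advance width; 0 for Hebrew vowel points
};

// Within one logical cluster, move the single nonzero-width base glyph to the
// front. std::stable_sort keeps the zero-width modifiers in their original
// relative order, although that order does not matter for Hebrew.
void base_first(std::vector<Glyph>& cluster) {
    std::stable_sort(cluster.begin(), cluster.end(),
                     [](const Glyph& a, const Glyph& b) { return a.width > b.width; });
}
```

This works only because a Hebrew cluster has exactly one glyph with nonzero width.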

Telugu has logical clusters of many characters. The example has 5 glyphs from 27 UTF-8 bytes and 7 Unicode code points. One of the code points is a zero width non-joiner (ZWNJ), which prevents two other code points from merging into a ligature. It does not show up in the final glyph list; all the others do. They all have nonzero width, even though some of them modify base characters much like the vowels do in Hebrew.
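The three counts (bytes, code points, glyphs) differ because Telugu letters take 3 bytes each in UTF-8 and the ZWNJ is one code point that yields no glyph. The sketch below decodes UTF-8 into code points, assuming well-formed input; `decode_utf8` is a hypothetical helper, and the test string (KA, VIRAMA, ZWNJ) is a short illustration rather than the attached test case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint32_t ZWNJ = 0x200C;  // ZERO WIDTH NON-JOINER

// Decode UTF-8 into code points. Assumes well-formed input (no validation);
// a truncated trailing sequence is simply dropped.
std::vector<std::uint32_t> decode_utf8(const std::string& s) {
    std::vector<std::uint32_t> cps;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        int len = c < 0x80 ? 1 : (c >> 5) == 0x6 ? 2 : (c >> 4) == 0xE ? 3 : 4;
        if (i + static_cast<std::size_t>(len) > s.size()) break;
        // Payload bits of the lead byte, then 6 bits from each continuation byte.
        std::uint32_t cp = len == 1 ? c : len == 2 ? (c & 0x1F) : len == 3 ? (c & 0x0F) : (c & 0x07);
        for (int k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        cps.push_back(cp);
        i += static_cast<std::size_t>(len);
    }
    return cps;
}
```

U+0C15 (KA), U+0C4D (VIRAMA) and U+200C (ZWNJ) take 9 bytes but only 3 code points.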

When the Telugu example hit the sort section that allowed Hebrew kerning it scrambled the glyph order. That was the first bug. The second bug was that there was an implicit assumption elsewhere about how to convert logical clusters into spans which was dropping some advance values for Telugu. The patch corrects (hopefully) both of these issues.
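The first bug can be reproduced with the width sort alone: a Telugu cluster whose glyphs all have nonzero width gets reordered by size. The guard below is a hypothetical illustration of one way to avoid that (only sort clusters that look like one base glyph plus zero-width marks). It is an assumption for illustration and not the patch attached to this report; `sort_by_width`, `looks_like_base_plus_marks` and `reorder_cluster` are made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Glyph {
    unsigned id;   // glyph index in the font
    double width;  // advance width
};

// The Hebrew-style sort: widest glyph first, ties keep their order.
void sort_by_width(std::vector<Glyph>& cluster) {
    std::stable_sort(cluster.begin(), cluster.end(),
                     [](const Glyph& a, const Glyph& b) { return a.width > b.width; });
}

// True when at most one glyph in the cluster advances the pen.
bool looks_like_base_plus_marks(const std::vector<Glyph>& cluster) {
    auto spacing = std::count_if(cluster.begin(), cluster.end(),
                                 [](const Glyph& g) { return g.width != 0.0; });
    return spacing <= 1;
}

// Hypothetical guarded version: clusters with several spacing glyphs
// (as in the Telugu example) keep Pango's shaping order.
void reorder_cluster(std::vector<Glyph>& cluster) {
    if (looks_like_base_plus_marks(cluster)) sort_by_width(cluster);
}
```

With widths 6, 4, 8 the unguarded sort yields the order 3, 1, 2 (scrambled), while the guarded version leaves it as 1, 2, 3 and still fixes a Hebrew cluster.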

David Mathog (mathog) on 2014-08-13
Changed in inkscape:
assignee: nobody → David Mathog (mathog)
వీవెన్ (veeven) wrote :

Sorry, I could not follow up. From #11, it seems @mathog figured out the issue.

> The OP should examine an extensive section of Telugu text to see if it all looks OK.

OP means original poster? Is there any way I can test with your patch applied? A binary for Windows 7 would be great.

From #10:
> Also, will the OP please save that test phrase into an EMF file (for instance, from Powerpoint, select, save as picture, format EMF). Then if it looks OK in windows Preview, or when read back in by PowerPoint, read it into Inkscape and see what it does.

Reading the EMF file back worked fine in PowerPoint, Office Picture Manager and Paint. In Inkscape I only got a white "path" object.

వీవెన్ (veeven) wrote :

Here are several lines of Telugu text that have issues when entered into Inkscape devel.

వీవెన్ (veeven) wrote :

Here is the expected rendering of the text in HTML in comment #13.

su_v (suv-lp) on 2014-11-06
Changed in inkscape:
milestone: 0.91 → 0.91.1
jazzynico (jazzynico) wrote :

Patch from comment #11 tested successfully on Crunchbang Waldorf, Inkscape 0.91.x rev. 13647.
The patch also fixes Bug #1382747 (Complex script vowel signs mangled) and Bug #1304602 (Sara Am ( U+0E33 Unicode ) is misbehavior).

Changed in inkscape:
status: Triaged → In Progress
tags: removed: telugu
jazzynico (jazzynico) wrote :

Patch committed in the trunk (rev. 13729) so that we can test it properly before backporting to the 0.91.x branch.

su_v (suv-lp) wrote :

Fix for this report (bug #1282968), bug #1304602 and bug #1382747 confirmed with Inkscape 0.91+devel r13729 on OS X 10.7.5 (on-canvas text rendering with X11- as well as Quartz-backend).

A detail I noticed with regard to this report (about kerning of Telugu fonts): the incorrect spacing noted in comment #3 (and rendered in current stable 0.48.5), "ZWNJ character should not have produced space", is also present with 0.91+devel r13729. This probably should be tracked separately (if confirmed as a bug in Inkscape and not in one of the external libraries used by Inkscape).

jazzynico (jazzynico) wrote :

~suv> "ZWNJ character should not have produced space" is also present with 0.91+devel r13729. This probably should be tracked separately.

Already tracked in Bug #168673 "Support for ZWNJ"?

jazzynico (jazzynico) wrote :

> Already tracked in Bug #168673 "Support for ZWNJ"?

And Bug #1362366 "Support ZWNJ (u200c) on Inkscape text" (probably duplicates).

su_v (suv-lp) on 2014-11-27
tags: added: backport-proposed
ScislaC (scislac) wrote :

Backported to 0.91.x in r13666.

tags: removed: backport-proposed
Changed in inkscape:
milestone: 0.91.1 → 0.91
su_v (suv-lp) on 2014-11-29
Changed in inkscape:
status: In Progress → Fix Committed
Bryce Harrington (bryce) on 2015-02-23
Changed in inkscape:
status: Fix Committed → Fix Released
su_v (suv-lp) wrote :

Follow-up report (AFAICT new regression):
- Bug #1425387 “Incorrect vowel & consonant position while typing in Thai”

poju (popjussi) wrote :

The difference in spacing between #3 and #4 was caused by this code:

                                if ((new_glyph.width == 0) && (para.pango_items[unbroken_span.pango_item_index].font))
                                    new_glyph.width = new_span.font_size * para.pango_items[unbroken_span.pango_item_index].font->Advance(unbroken_span.glyph_string->glyphs[glyph_index].glyph, false);

with this comment

                                // for some reason pango returns zero width for invalid glyph characters (those empty boxes), so go to freetype for the info

I commented out that part and got the attached result. I am not sure why it needs to go to FreeType for this, but the advance width FreeType reports is then used in place of Pango's zero width, hence the extra space. Could someone explain why the code needs to handle invalid glyph characters rather than fixing the font? What kind of invalid glyph is this?
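One possible narrowing of that fallback, sketched under assumptions: only go to the font for an advance when Pango has flagged the glyph as unknown (the "empty box" case the comment mentions), rather than for every zero-width glyph. The constants mirror the values of `PANGO_GLYPH_EMPTY` and `PANGO_GLYPH_UNKNOWN_FLAG` from Pango's public headers; `glyph_width` and the `font_advance` callback are hypothetical stand-ins for the Inkscape code quoted above. Whether the ZWNJ actually reaches this code as an unflagged zero-width glyph has not been verified here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

constexpr std::uint32_t kGlyphEmpty = 0x0FFFFFFF;        // same value as PANGO_GLYPH_EMPTY
constexpr std::uint32_t kGlyphUnknownFlag = 0x10000000;  // same value as PANGO_GLYPH_UNKNOWN_FLAG

// Hypothetical replacement for the fallback: keep Pango's width when it is
// nonzero, and ask the font backend only for glyphs flagged as unknown.
double glyph_width(std::uint32_t glyph, double pango_width, double font_size,
                   const std::function<double(std::uint32_t)>& font_advance) {
    if (pango_width != 0.0) return pango_width;
    if (glyph == kGlyphEmpty) return 0.0;  // intentionally invisible glyph
    if (glyph & kGlyphUnknownFlag) return font_size * font_advance(glyph);
    return 0.0;  // a genuine zero-width glyph keeps zero width
}
```

With this rule a zero-width glyph that Pango did not flag as unknown would no longer pick up a FreeType advance, which is the extra space observed above.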
