Comment 11 for bug 1282968

David Mathog (mathog) wrote :

This patch seems to fix it. I have not committed it because it needs to be tested on other languages. It seems to be OK with the single Telugu word in the test case, English, Hebrew, and mixed English and Hebrew, but it would not surprise me if this change does something odd in yet another language. The OP should examine an extensive section of Telugu text to see if it all looks OK.

The root of the issue is the need to be able to kern text in languages that hang modifying glyphs on a base character glyph. When Pango processes a UTF-8 string it returns a series of glyphs that may not be in the same order as the UTF-8. This first showed up in Hebrew, where the vowel glyphs were sometimes being placed sequentially before the base character glyph. In Hebrew each "logical cluster" that Pango returned contained a single base character and multiple modifiers, all modifiers having zero width. Their order was indeterminate. This broke kerning elsewhere in Inkscape. To make it work, the glyphs were sorted by width so that the base character came first and all the modifiers followed. (The order of the modifiers was irrelevant.)
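To illustrate the idea (this is not the Inkscape code, just a minimal Python sketch), sorting a cluster by descending width puts the single wide base glyph first and leaves the zero-width modifiers behind it. The `Glyph` class, its field names, and the width values are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    name: str     # illustrative label only, not a Pango field
    width: float  # horizontal advance of the glyph

def base_first(cluster):
    # Sort by descending width. sorted() is stable, so glyphs of
    # equal width (the zero-width modifiers) keep their relative order.
    return sorted(cluster, key=lambda g: -g.width)

# A Hebrew-like cluster as it might come back: a modifier ahead of the base.
cluster = [
    Glyph("vowel_point", 0.0),
    Glyph("base_letter", 9.5),
    Glyph("dagesh", 0.0),
]
ordered = base_first(cluster)
names = [g.name for g in ordered]
```

This only works because exactly one glyph in the cluster has a nonzero width.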

Telugu has logical clusters of many characters. The example has 5 glyphs from 27 UTF-8 bytes and 7 Unicode code points. One of the Unicode values is a "zero width non-joiner", which prevents two Unicode values from merging into a ligature. It does not show up in the final glyph list. All the others do. They all have nonzero width, even though some of them modify base characters much like the vowels do in Hebrew.
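The difference between bytes, code points, and what is left once the zero width non-joiner is dropped can be checked in a few lines of Python. The string below is a short made-up Telugu sequence, not the word from the test case, and counting the actual glyphs would require running the text through Pango, which this sketch does not do.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER

# Hypothetical sample: KA, VIRAMA, ZWNJ, SSA (not the bug's test word).
text = "\u0c15\u0c4d" + ZWNJ + "\u0c37"

code_points = len(text)                 # Python str length counts code points
utf8_bytes = len(text.encode("utf-8"))  # each of these code points takes 3 bytes

# The ZWNJ only influences shaping; it contributes no glyph of its own.
visible = [c for c in text if c != ZWNJ]
```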

When the Telugu example hit the sort section that allowed Hebrew kerning, the sort scrambled the glyph order. That was the first bug. The second bug was an implicit assumption elsewhere about how to convert logical clusters into spans, which was dropping some advance values for Telugu. The patch corrects (hopefully) both of these issues.
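Both failure modes can be shown with a small Python sketch. This is not the patch itself: the guard in `reorder_cluster` (only sort when at most one glyph has nonzero width) and the summing in `cluster_advance` are assumptions about one way to avoid the two problems, and the glyph names and widths are invented.

```python
def sort_by_width(cluster):
    # The Hebrew-era behavior: widest glyph first (stable for ties).
    return sorted(cluster, key=lambda g: -g[1])

def reorder_cluster(cluster):
    # Hypothetical guard: leave the order alone when more than one
    # glyph has nonzero width, since width no longer identifies the base.
    nonzero = sum(1 for _, width in cluster if width != 0)
    if nonzero > 1:
        return list(cluster)
    return sort_by_width(cluster)

def cluster_advance(cluster):
    # Sum every glyph's advance instead of assuming only one matters.
    return sum(width for _, width in cluster)

# Telugu-like cluster: every glyph has nonzero width (values made up).
telugu = [("g1", 6.0), ("g2", 11.0), ("g3", 4.0)]
# Hebrew-like cluster: one base glyph plus zero-width modifiers.
hebrew = [("vowel", 0.0), ("base", 9.5)]

scrambled = sort_by_width(telugu)   # the first bug: order changes
kept = reorder_cluster(telugu)      # guarded version keeps the order
fixed_hebrew = reorder_cluster(hebrew)
total = cluster_advance(telugu)
```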