Comment 8 for bug 1627523

David Mathog (mathog) wrote :

I wish I had time to work on this, but I don't.

If somebody else wants to go for it, here is my fading recollection of the root of many of these problems. Somewhere in the Pango pipeline, at around pango_shape(), there is a step that breaks a Unicode string down into not just a series of glyphs but also a series of logical clusters. The cluster information is stored in a second array with one entry per glyph (like 1,2,3,3,3,4,5..., where the 3,3,3 indicates that the 3rd, 4th, and 5th glyphs are in the same logical cluster):

https://developer.gnome.org/pango/stable/pango-Glyph-Storage.html
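
To make the cluster array concrete, here is a minimal Python sketch that groups glyph indices by the example array above. It does not call Pango; the function name group_glyphs_by_cluster is hypothetical, and the only assumption is that glyphs of one cluster sit next to each other in the array. (In Pango's own glyph string the entries are, to the best of my knowledge, byte offsets into the text rather than a running count, but consecutive equal values mark one cluster either way.)

```python
from itertools import groupby


def group_glyphs_by_cluster(log_clusters):
    """Return one list of glyph indices per run of equal cluster ids.

    Hypothetical helper for illustration only; it assumes that all glyphs
    belonging to a cluster are adjacent in the array.
    """
    indexed = enumerate(log_clusters)
    return [
        [glyph_index for glyph_index, _ in run]
        for _, run in groupby(indexed, key=lambda pair: pair[1])
    ]


# The example array from the comment: glyphs 3, 4 and 5 (indices 2, 3, 4)
# share a cluster.
print(group_glyphs_by_cluster([1, 2, 3, 3, 3, 4, 5]))
```

Each inner list is one logical cluster, so code that walks this structure handles a base glyph and its attached marks as a unit.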

For European languages glyphs are pretty much 1:1 with the logical clusters, but that is very much not so for other languages. Further along, Inkscape makes an implicit assumption that the glyphs all play by the same rules, that is, that the logical clustering can be ignored. Problems follow. I'm pretty sure I broke some of the Indic languages in an attempt to make the vowels and diacriticals in Hebrew go where they should, by rearranging them within a cluster. That is bug #1282968. The correct fix would have been to retain the cluster information longer, so that later code wasn't trying to place glyphs by sequentially offsetting each one by the width of the previous glyph. The same is true in lots of places: kerning, character selection, and character formatting should all be done on the logical cluster, not the glyph. Any code that works with a line of text converted to just a series of glyphs isn't quite right for these other languages.
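
As a rough illustration of working per cluster instead of per glyph, the sketch below computes a start position and width for each logical cluster from per-glyph advances. It is not Inkscape's or Pango's code: cluster_extents is a hypothetical name, the advances are made-up integers, and the sketch assumes left-to-right text with each cluster stored as a contiguous run.

```python
def cluster_extents(advances, log_clusters):
    """Return (cluster_id, x_start, width) for each run of glyphs sharing an id.

    Hypothetical helper: assumes left-to-right layout and that the glyphs of
    a cluster are adjacent in both input sequences.
    """
    extents = []
    x = 0
    for advance, cluster_id in zip(advances, log_clusters):
        if extents and extents[-1][0] == cluster_id:
            # Same cluster as the previous glyph: widen it, keep its start.
            prev_id, start, width = extents[-1]
            extents[-1] = (prev_id, start, width + advance)
        else:
            # First glyph of a new cluster starts at the current pen position.
            extents.append((cluster_id, x, advance))
        x += advance
    return extents


# A base glyph with two zero-advance marks, followed by a plain glyph.
print(cluster_extents([10, 0, 0, 8], [0, 0, 0, 1]))
```

With positions kept at this level, operations such as selection or kerning can address the cluster as a whole and leave the placement of marks inside it to the shaper.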

I believe Pango has functions for finding the appropriate offsets of clusters, but Inkscape may not be using them in all locations.

A possibly related/confounding issue concerns inconsistent use of Unicode's "Mark, Nonspacing" category. See the discussion in bug #500343 at post 5.