TNG-compute splitting a span that need not be split

Bug #1169345 reported by David Mathog
This bug affects 1 person
Affects: Inkscape
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

The attached SVG example contains mixed English and Hebrew in the order "English1 Hebrew English2".
Somewhere in Layout-TNG-Compute.cpp an error occurs during the span calculation, and 4 spans are created instead of the expected 3. The first two are correct, but the 3rd is split into two pieces: the space that immediately follows the Hebrew phrase, then the rest of the English. Oddly enough, this happens when the SVG file is opened, yet the problem is not visible when the XML is inspected. It only shows up when the SVG is saved to another format (in my case EMF), where 4 spans are found instead of 3. I was able to trace the problem back to Layout-TNG-Compute.cpp and to see that it happens when the SVG file is loaded, but those code sections are so complex that I could not work out what was going wrong where.

The inverse test, "Hebrew1 English Hebrew2" has the exact same problem, with the space between "English" and "Hebrew2" breaking off into its own tspan, for no good reason.

David Mathog (mathog) wrote :
jazzynico (jazzynico)
tags: added: text
jazzynico (jazzynico) wrote :

The file provided in comment #1 seems correct with trunk revision 12387. The text is not split at all, and I can't find a way to reproduce the issue. Could you please provide precise steps to reproduce it?

Changed in inkscape:
status: New → Incomplete
David Mathog (mathog) wrote :

The file of comment #1 is correct; the problem is what Inkscape does with it when exporting to other formats. This is one way to reproduce it (it could probably be done with trunk too, if you export to a format where you can see the details of the text records within):

1. Start Inkscape (lp988601 branch).
2. Open ehe2.svg.
3. Save as ehe2.emf. (This might work with trunk on Windows, but it will not on other platforms.)
4. Using reademf from libUEMF and extract from drm_tools (both on SourceForge), look at how ehe2.emf was written. The EMF driver writes text records one at a time as they are sent in, so if there is a separate piece, it is because Inkscape fed it in that way.

reademf ehe2.emf | extract -if emrtext -ifonly -mt -dl '<>' -fmt '([2])'
( Eng )
(שָׁלוֺם עוֺלָם)
( )
(Eng2)
( שָׁלוֺם עוֺלָם )
(English)
( )
(שָׁלוֺם עוֺלָם)

The contents of each () show what Inkscape fed into the EMF driver.
The 3rd and 7th lines contain spaces that should be attached to the string on one side or the other. This is the problem.
Look at how the first transition from English to Hebrew is handled: the space after "Eng" goes with that chunk. However, the space after the Hebrew is all by itself instead of being attached to the chunk before or after it. The bug is that for a string "L1a L2 L1b", where L1a and L1b are strings in one language and L2 is a string in another, the span calculation emits:

"L1a " "L2" " " "L1b"

it should emit some variant on

"L1a " "L2 " "L1b"

or

"L1a" " L2" " L1b"

It does not really matter where the space goes, but it should not be floating around by itself.

su_v (suv-lp)
tags: added: emf exporting
Changed in inkscape:
status: Incomplete → New
David Mathog (mathog) wrote :

The example SVG listed above looks really strange now for some reason (there is a huge gap after the first Hebrew character in "Shalom"). Attached is another version I just made which doesn't do that.

This problem with the extra space being generated is still present in revision 13915. If this file is exported to EMF and then passed through libUEMF's "reademf" program, there are 4 fields for each line instead of the expected 3, the extra one being just a single space.

David Mathog (mathog) wrote :

Ah, I see what the problem was with ehe2.svg. It used Century Schoolbook L (on Linux), which has a problem with Hebrew characters. Change everything to Ezra SIL SR (a real Hebrew font) and it looks OK. Most computers are not going to have Ezra SIL SR, of course, which is why Century Schoolbook must have been used.

Download source for the Ezra SIL fonts:
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=silhebrunic2
