intltool-extract does not decode XML special entities

Bug #1191978 reported by Eloi Rivard
This bug affects 1 person
Affects   Status      Importance  Assigned to  Milestone
intltool  Incomplete  Undecided   Unassigned

Bug Description

Hi,
This fails on Fedora 18 with intltool 0.50.2.
Take a simple test case:

= foobar.xml ===================================
<?xml version="1.0" encoding="UTF-8"?>
<_foo>&gt;</_foo>
=============================================

Run:
intltool-extract --type="gettext/xml" foobar.xml
cat foobar.xml.h

And see:

= foobar.xml.h ==================================
char *s = N_("&gt;");
==============================================

The ">" character should have been decoded before being exported to the .h file.

I could fix it by replacing "add_message($lookup);" with "add_message(entity_decode($lookup));" at line 416. See the attached patch.
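The proposed change lives in intltool-extract's Perl code. As an illustration only (this is not intltool's code), the Python sketch below shows the effect of decoding the extracted text before it is written to the header. The names `decode_entities` and `header_line` are hypothetical, and the C escaping is deliberately minimal. Decoding happens in a single left-to-right pass, so a source that literally contains "&amp;gt;" yields the text "&gt;" rather than ">".

```python
import re

# Entities predefined by XML 1.0, plus decimal and hexadecimal character references.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}
ENTITY_RE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z]+);")


def decode_entities(text):
    """Hypothetical helper: decode XML entity references in a single pass."""
    def replace(match):
        name = match.group(1)
        if name.startswith("#x"):
            return chr(int(name[2:], 16))
        if name.startswith("#"):
            return chr(int(name[1:]))
        # Unknown named entities (e.g. ones declared in a DTD) are left as-is.
        return PREDEFINED.get(name, match.group(0))

    return ENTITY_RE.sub(replace, text)


def header_line(msgid):
    """Hypothetical helper: render a msgid like the line in foobar.xml.h.

    Only backslashes and double quotes are escaped for the C string literal.
    """
    escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
    return f'char *s = N_("{escaped}");'


raw = "&gt;"  # text of <_foo> as currently extracted, entity still encoded
print(header_line(raw))                   # char *s = N_("&gt;");
print(header_line(decode_entities(raw)))  # char *s = N_(">");
```

Only the predefined entities and character references are handled here; entities declared in a DTD would need the DTD, which is why unknown names are passed through unchanged.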

Eloi Rivard (azmeuk) wrote :
dobey (dobey) wrote :

I don't think we should be decoding these entities. The translations will be merged back into the XML file or built into separate XML files, so they must be properly encoded in those files. If we were to simply decode every entity, we would then need to maintain some sort of map, specific to the XML file each string was read from, to be able to re-encode the entities when we generate the translated files. I don't think it makes sense for intltool to be doing that work.

If there is a way we can automatically get it right every time, without maintaining such a mapping, then it might be acceptable.
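The mapping concern can be illustrated with a small Python sketch (not intltool code; `round_trip` is a hypothetical name). Several source spellings of the same character decode to one string, so the original spelling cannot be recovered from a decoded msgid alone. A fixed escaper such as `xml.sax.saxutils.escape` always emits one canonical spelling; the result parses back to the same text, but it is not necessarily byte-identical to the source.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def round_trip(source_fragment):
    """Parse a text fragment, then re-encode the decoded text with a fixed escaper."""
    decoded = ET.fromstring(f"<_foo>{source_fragment}</_foo>").text
    return decoded, escape(decoded)


# Four legal spellings of ">" in XML character data.
for spelling in [">", "&gt;", "&#62;", "&#x3E;"]:
    decoded, reencoded = round_trip(spelling)
    # Every spelling decodes to ">" and is re-encoded as "&gt;".
    print(f"{spelling!r} -> {decoded!r} -> {reencoded!r}")
```

Whether re-encoding to a canonical spelling is acceptable for the merge step is exactly what the comment above leaves open; the sketch only shows what is and is not preserved.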

Changed in intltool:
status: New → Incomplete
Eloi Rivard (azmeuk) wrote :

In which cases are those translations merged back into the XML file?

The project I contribute to that needs this fix is GNU Denemo. Tags are parsed with libxml and then translated with the _() function. Libxml automatically unescapes special characters, so in this case a translation is looked up for ">", not "&gt;". I think most XML parsers unescape entities when they parse files; am I wrong?
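A minimal Python sketch of the lookup mismatch, assuming Python's expat-based ElementTree as a stand-in for libxml (both report the text content of `<_foo>&gt;</_foo>` as ">"). The two dictionaries are hypothetical stand-ins for compiled catalogs, and `translate` only mimics gettext's fallback of returning the msgid when no translation exists.

```python
import xml.etree.ElementTree as ET

SOURCE = b'<?xml version="1.0" encoding="UTF-8"?>\n<_foo>&gt;</_foo>'

# Hypothetical catalogs keyed by msgid, standing in for a compiled .mo file.
catalog_undecoded = {"&gt;": "TRANSLATED"}  # msgid as extracted today
catalog_decoded = {">": "TRANSLATED"}       # msgid with entities decoded


def translate(catalog, msgid):
    """Mimic gettext: return the translation, or the msgid itself if missing."""
    return catalog.get(msgid, msgid)


parsed_text = ET.fromstring(SOURCE).text  # the parser already turned "&gt;" into ">"
print(repr(parsed_text))                          # '>'
print(translate(catalog_undecoded, parsed_text))  # lookup misses, prints >
print(translate(catalog_decoded, parsed_text))    # lookup hits, prints TRANSLATED
```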

Also, I don't think translation catalogs should be polluted with escaped characters. They can seem cryptic to translators, who probably don't need to know whether the strings they translate come from XML or another data format.
