Part of a translated substring is matched and translated again by a subsequent translation rule

Bug #1867069 reported by Guido Longoni on 2020-03-11
This bug affects 2 people
Affects: Gufw
Importance: Undecided
Assigned to: Unassigned

Bug Description

Symptoms:
In Italian, "OUT" is translated as "IN USCITA". The substring "IN" of that translation is then translated again as "IN ENTRATA". The resulting rule reads "IN ENTRATA USCITA", roughly "INCOMING OUTGOING", which is wrong and misleading, especially for an unsuspecting user.

Analysis:
File gufw/gufw/view/gufw.py
Lines 712-734

The string in the "translated_rule" variable is rewritten by a chain of replace() calls: the output of one replace() becomes the input of the next, so text that has already been translated is also compared against the search string of every later replace(). This causes the bug.
Initially I experimented a bit and tried to apply all the translations at once, but some of them actually RELY on overlap between the previously translated text and the next search string: each search string is surrounded by spaces (I suspect the intention is to match only whole words) and the replacement string contains the same spaces. When two adjacent words have to be translated separately, there is only one space between them, and both search strings need it.
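
The chained behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the table below is hypothetical and is not copied from gufw.py, whose actual strings and ordering may differ.

```python
# Hypothetical translation table (search string, replacement).
# Assumption: the real code applies similar space-padded replacements in sequence.
TRANSLATIONS = [
    (" OUT ", " IN USCITA "),
    (" IN ", " IN ENTRATA "),
]


def translate_chained(rule):
    """Apply each replace() to the output of the previous one."""
    translated_rule = rule
    for search, replacement in TRANSLATIONS:
        translated_rule = translated_rule.replace(search, replacement)
    return translated_rule


# The " IN " produced by the first rule is matched again by the second rule.
print(translate_chained("22/tcp ALLOW OUT Anywhere"))
# 22/tcp ALLOW IN ENTRATA USCITA Anywhere
```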

Solution:
In the hope of doing something useful, I wrote a small patch.
It accounts for the overlapping spaces and keeps the already translated text separate from the text still to be translated.
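
The attached patch is not reproduced in this report. As an illustration of the same idea, here is one possible single-pass approach, which is a different technique and not necessarily what the patch does: a single regular expression matches whole words using lookarounds, so the separating spaces are not consumed and replaced text is never scanned again. The table, including "CONSENTI" for "ALLOW", is a hypothetical example.

```python
import re

# Hypothetical word-level translations; not taken from the real Italian catalog.
TRANSLATIONS = {
    "ALLOW": "CONSENTI",
    "OUT": "IN USCITA",
    "IN": "IN ENTRATA",
}

# Longest keys first, so a longer key is preferred over a shorter one.
# (?<!\S) and (?!\S) require whitespace or a string boundary on each side
# without consuming it, so adjacent words can share a single space.
_PATTERN = re.compile(
    r"(?<!\S)(?:"
    + "|".join(re.escape(key) for key in sorted(TRANSLATIONS, key=len, reverse=True))
    + r")(?!\S)"
)


def translate_once(rule):
    """Translate every known word in one pass over the original text."""
    return _PATTERN.sub(lambda match: TRANSLATIONS[match.group(0)], rule)


print(translate_once("22/tcp ALLOW OUT Anywhere"))
# 22/tcp CONSENTI IN USCITA Anywhere
```

Because re.sub() only scans the original string, the "IN" inside "IN USCITA" is never matched, and the adjacent words "ALLOW" and "OUT" are both translated even though they share one space.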

A little caveat:
This is my very first contribution to an open source project and I am not a Bazaar user, so I did my best to provide a solution along with the problem.

costales (costales) wrote :

Hi Guido,

Thanks for the patch, but I don't like to change code because one language has an issue.

Could you ask your translation team about fixing this by changing those strings? I think that would be the right solution.

Thanks in advance!

Changed in gui-ufw:
status: New → Opinion
Guido Longoni (guidolongoni) wrote :

Hello!
Mine was just an example. The code retranslates parts of strings that have already been translated, which makes it extremely brittle and prone to breakage with different translations.
I understand that most of the time everything works anyway, but only by accident: changing a translation can break the code unexpectedly in any language.
Please, if you are not convinced by my patch, don't use it, but seriously consider changing the translation procedure.
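
To illustrate why any language can be affected, the following sketch checks a translation table for the risky pattern: a later rule's search string occurring inside an earlier rule's replacement. The function name and the table are hypothetical and assume the rules are applied in list order.

```python
def find_retranslation_risks(translations):
    """Return (earlier_key, later_key) pairs where the later search string
    occurs inside the earlier rule's replacement text."""
    risks = []
    for index, (earlier_key, earlier_replacement) in enumerate(translations):
        for later_key, _later_replacement in translations[index + 1:]:
            if later_key in earlier_replacement:
                risks.append((earlier_key, later_key))
    return risks


# Hypothetical Italian table, in the order the rules would be applied.
italian = [(" OUT ", " IN USCITA "), (" IN ", " IN ENTRATA ")]
print(find_retranslation_risks(italian))
# [(' OUT ', ' IN ')]
```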

I'm not part of any translation team, so I wouldn't even know how to ask for changes, but as a native Italian speaker I assure you that "IN" -> "IN" and "OUT" -> "OUT" are already good translations.

Thank you
Guido Longoni (a random user)

Guido Longoni (guidolongoni) wrote :

"
I assure you that "IN" -> "IN" and "OUT" -> "OUT"
"

Sorry, I meant "IN" -> "IN ENTRATA" and "OUT" -> "IN USCITA"

Changed in gui-ufw:
status: Opinion → New
Bib (bybeu) wrote :

Hi Guido & Costales. Digging through /usr/share/locale/fr/LC_MESSAGES/gufw.mo, I found that this real bug was worked around in at least one French translation: "Home" is translated as "Dossier personnel" (note the blank space), but the GUI shows "Dossier_personnel", and I don't know where the underscore comes from. (BTW this is a bad translation, apparently made either automatically or by someone who didn't understand the context; it should be Maison/Domicile (Casa), and it is also too long for the embedded length check.) So for Italian it may be IN_ENTRATA and IN_USCITA.

However, for a project that targets unskilled users AND is mainly a GUI AND addresses security concerns, I feel costales' opinion about not changing code because of an issue in a single language (I'd add "ATM") reveals a poor understanding of what languages are: for the sake of the target audience, the specifics of natural languages should always be respected, without assuming or forcing one source word to match one target word. Boundaries should be whole phrases, or the definition of a word should include any printable character plus white space. Your point that a translation should NOT break the code is the definitive reason. The funny thing is that there are many Italian names in this bug's subscribers list, and that the package maintainer is another Italian name, Devid Antonio Filoni, according to Synaptic in Trusty.
I found contact addresses of the translators in the .mo file; you could try to PM them:
" Alessandro Ghione https://launchpad.net/~alex81\n"
" Alessandro Menti https://launchpad.net/~elgaton\n"
" Aliak https://launchpad.net/~aliak-93\n"
" Andrea Luciano Damico https://launchpad.net/~lehti\n"
" Claudio Arseni https://launchpad.net/~claudio.arseni\n"
" Devid Antonio Filoni https://launchpad.net/~d.filoni\n"
" Edoardo Vanin https://launchpad.net/~edoardo-vanin\n"
" Gianluca https://launchpad.net/~albatrosslive\n"
" Gualtiero https://launchpad.net/~gualtiero-testa\n"
" Guybrush88 https://launchpad.net/~guybrush\n"
" Luca Ferretti https://launchpad.net/~elle.uca\n"
" Lvcio https://launchpad.net/~lvcio\n"
" Mario Gatti https://launchpad.net/~parismarioinformatique\n"
" Wonderfulheart https://launchpad.net/~wonderfulheart\n"
" costales https://launchpad.net/~costales\n"
" flux https://launchpad.net/~luigimarco\n"
" giulianom89 https://launchpad.net/~giulianom89\n"
" lang-it https://launchpad.net/~lang-it\n"
" mattia.b89 https://launchpad.net/~mattia-b89\n"
" rudy79 https://launchpad.net/~rudy79"
