obsolete translations exported to the branch

Bug #669831 reported by Fabien Tassin
This bug affects 1 person
Affects           Status        Importance  Assigned to   Milestone
Launchpad itself  Fix Released  High        Данило Шеган

Bug Description

A week ago, I slightly changed the format of some of the Chromium
strings to make them more readable for translators. Here is an example:

chromium-translations.head/chromium_strings/chromium_strings.pot:

 #. IDS_BROWSER_WINDOW_TITLE_FORMAT
 #. - description: The format for titles displayed in tabs and popup windows
 #: id: 3848258323044014972
-msgid "<ph name=\"PAGE_TITLE\"/> - Chromium"
+msgid "%{PAGE_TITLE} - Chromium"
 msgstr ""

I thought 'msgid' was the key, so I expected the old version to
disappear after a few days (3?).
The branch LP imports from me has had no 'ph name=' anywhere
since Oct 24. Yet the branch LP exports to me still has thousands of
'ph name='.

For example, with id 3848258323044014972:

$ grep -A2 3848258323044014972 chromium_strings/*
chromium_strings/de.po:#: id: 3848258323044014972
chromium_strings/de.po-msgid "<ph name=\"PAGE_TITLE\"/> - Chromium"
chromium_strings/de.po-msgstr ""
--
chromium_strings/es.po:#: id: 3848258323044014972
chromium_strings/es.po-msgid "<ph name=\"PAGE_TITLE\"/> - Chromium"
chromium_strings/es.po-msgstr ""
--
chromium_strings/fi.po:#: id: 3848258323044014972
chromium_strings/fi.po-msgid "<ph name=\"PAGE_TITLE\"/> - Chromium"
chromium_strings/fi.po-msgstr "<ph name=\"PAGE_TITLE\"/> - Chromium"
--
chromium_strings/pt_BR.po:#: id: 3848258323044014972
chromium_strings/pt_BR.po-msgid "<ph name=\"PAGE_TITLE\"/> - Chromium"
chromium_strings/pt_BR.po-msgstr "<ph name=\"PAGE_TITLE\"/> - Chromium"

$ find * | xargs grep -c 'ph name' | grep -v :0
chromium_strings/pt_BR.po:38
chromium_strings/fi.po:21
chromium_strings/de.po:17
chromium_strings/es.po:17
devtools_strings/pt_BR.po:20
devtools_strings/fi.po:13
devtools_strings/de.po:11
inspector_strings/pt_BR.po:63
inspector_strings/fi.po:59

(All those PO files are new; upstream doesn't have translations for those
templates.)

The problem is that those strings, once turned back into their original
grit format, cause the package to fail to build from source (FTBFS).

So now I'm kind of stuck. I had to disable the translation patching in
my package to prevent the FTBFS, which is too bad after the time I've
spent on this.

After an initial investigation by henninge, it seems the tarball export is fine, so it's a bug in the branch export script.

tags: added: code-integration
Revision history for this message
Данило Шеган (danilo) wrote :

The problem seems to be in the optimization that decides when PO files need to be re-exported: an updated template doesn't trigger an export of all PO files. In general, this is not a big issue for applications that handle gettext properly, because messages missing from a PO file are considered untranslated anyway (and since these PO files had no updates and the POT messages are new, they are still untranslated).
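
As an illustration only, the behaviour described above can be modelled with a small sketch. The `POFile` class, the `files_to_export` function and the `reexport_on_template_change` flag are hypothetical names invented for this note; this is not Launchpad's export code, just a toy model of "an updated template doesn't trigger export of all PO files".

```python
from dataclasses import dataclass


@dataclass
class POFile:
    # Hypothetical record: a PO file and whether its translations
    # changed since the last export to the branch.
    path: str
    changed_since_export: bool


def files_to_export(po_files, template_changed, reexport_on_template_change):
    """Return the paths that a branch export would rewrite.

    With reexport_on_template_change=False this models the reported
    behaviour: only PO files with new translations are rewritten, so a
    PO file with no new translations keeps its old msgids even after
    the template was updated.
    """
    if template_changed and reexport_on_template_change:
        return [po.path for po in po_files]
    return [po.path for po in po_files if po.changed_since_export]


po_files = [
    POFile("chromium_strings/de.po", changed_since_export=False),
    POFile("chromium_strings/fi.po", changed_since_export=True),
]
stale_behaviour = files_to_export(po_files, True, False)
fixed_behaviour = files_to_export(po_files, True, True)
```

In the first call de.po is skipped, which is how old msgids can linger in the export branch.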

In general, I suggest you normalize PO files against the POT file using msgmerge during the build process if you depend on the PO files being complete ("msgmerge -U chromium_strings/fi.po chromium_strings/chromium_strings.pot"). That is recommended anyway: even once we fix this bug, it will make your build process much less fragile and dependent only on your POT file being correct.
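
A minimal sketch of how that msgmerge step might be scripted for a whole template directory, assuming `msgmerge` is on the PATH and that each directory holds exactly one .pot file next to its .po files (as in the chromium_strings example). The function names are made up for this note.

```python
import subprocess
from pathlib import Path


def msgmerge_command(po_path, pot_path):
    """Build the msgmerge call quoted above: update po_path against pot_path."""
    return ["msgmerge", "-U", str(po_path), str(pot_path)]


def normalize_template_dir(directory):
    """Run msgmerge -U on every PO file in a directory with one POT file."""
    directory = Path(directory)
    pot_files = sorted(directory.glob("*.pot"))
    if len(pot_files) != 1:
        raise ValueError(f"expected exactly one .pot file in {directory}")
    for po_path in sorted(directory.glob("*.po")):
        # check=True raises CalledProcessError if msgmerge exits non-zero.
        subprocess.run(msgmerge_command(po_path, pot_files[0]), check=True)
```

Calling `normalize_template_dir("chromium_strings")` before the gettext-to-grit conversion would rewrite each PO file in place. Note that msgmerge keeps messages that are no longer in the POT as obsolete entries (prefixed with `#~`) rather than deleting them.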

Changed in rosetta:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Fabien Tassin (fta) wrote :

It seems to impact only the templates for which there are no associated PO files in the import branch.

I'll think about msgmerge, but I already have a lot of code for this grit<->gettext conversion:
http://bazaar.launchpad.net/~chromium-team/chromium-browser/chromium-translations-tools.head/annotate/head:/chromium2pot.py

In a nutshell, I convert the Grit files (grd+xtb) from upstream/trunk into gettext, feed that to LP, and get a bunch of PO files in return.
I then select one of the Chromium branches (trunk for the dailies, or one of the dev/beta/stable channels),
take the corresponding template (grd file), and merge the strings extracted from LP's PO files with the upstream xtb file
to finally produce patches, which I land in the packaging branch and submit upstream to close the loop.

So the LP export branch is only a pool of translated strings for me. msgid is not the key; the "#:" id is, which is why I'm impacted by obsolete strings. msgmerge could help, but I can also drop any msgstr whose msgid is not the expected one for a given id.
But that means those ids will never get a proper translation.
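
The "drop msgstr when the msgid is not the expected one" idea could look roughly like the sketch below. It is not taken from chromium2pot.py; `parse_entries` and `usable_translations` are hypothetical helpers, and the parser only understands the simple single-line `#: id:` / `msgid` / `msgstr` layout shown in this report (no multi-line strings, plurals or `#~` entries).

```python
import re

ID_RE = re.compile(r'^#: id: (\d+)$')
STR_RE = re.compile(r'^(msgid|msgstr) "(.*)"$')


def parse_entries(text):
    """Return a list of (id, msgid, msgstr) for blank-line separated entries."""
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            line = line.strip()
            match = ID_RE.match(line)
            if match:
                fields["id"] = match.group(1)
                continue
            match = STR_RE.match(line)
            if match:
                fields[match.group(1)] = match.group(2)
        if {"id", "msgid", "msgstr"} <= fields.keys():
            entries.append((fields["id"], fields["msgid"], fields["msgstr"]))
    return entries


def usable_translations(pot_text, po_text):
    """Keep translations whose msgid equals the template msgid for that id."""
    expected = {string_id: msgid for string_id, msgid, _ in parse_entries(pot_text)}
    result = {}
    for string_id, msgid, msgstr in parse_entries(po_text):
        # Skip untranslated entries and entries keyed on an outdated msgid.
        if msgstr and expected.get(string_id) == msgid:
            result[string_id] = msgstr
    return result
```

With the template from this report, a PO entry still carrying the old `<ph name=...>` msgid for id 3848258323044014972 is skipped, while an entry with the `%{PAGE_TITLE}` msgid is kept.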

Revision history for this message
Данило Шеган (danilo) wrote :

I know it's a bit evil, but have you considered using msgctxt for the actual IDs instead? msgmerge (and all the other gettext tools) should then work correctly. Of course, I can imagine you don't want to change the set-up once again, so by Monday I'll think about what we can do to help you out that doesn't take too much time for us (ideally, we'd fix the underlying problem).

Revision history for this message
Fabien Tassin (fta) wrote :

I simply can't, as I need a fully bijective conversion. I explained how it works in a thread a few days ago. Pasting here for reference:

=cut

upstream uses grit. It's a combination of one grd and several xtb files.
All are XML files (some kind of xslt).

the 1st string of generated_resources.grd (the template):

====
<if expr="not pp_ifdef('use_titlecase')">
        <message name="IDS_CONTENT_CONTEXT_PRINTFRAME" desc="The name of the Print Frame command in the content area context menu">
          P&amp;rint frame...
        </message>
</if>
====

in generated_resources_en_GB.xtb (the en-GB translation file), it looks like this (and nothing else):

====
<translation id="1002064594444093641">P&amp;rint Frame...</translation>
====

there's no obvious link between IDS_CONTENT_CONTEXT_PRINTFRAME and
1002064594444093641. The reason is that in a grd file, a string could be used for
totally different IDS_xxx codes with different conditions in different
contexts. So what grit does is it applies a function to the string
creating the id in order for that string to be translated only once:
some_hash("P&amp;rint frame...") => 1002064594444093641
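
To make the idea concrete, here is a minimal sketch of a content-derived id. `message_id` uses MD5 truncated to 63 bits purely as a stand-in for `some_hash`; it is not claimed to be grit's actual function and is not expected to reproduce the ids quoted in this thread. The two IDS_ names are made up.

```python
import hashlib


def message_id(text):
    """Stand-in for some_hash: a stable 63-bit id derived only from the text."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & 0x7FFFFFFFFFFFFFFF


# Two hypothetical resource names sharing the same source string.
messages = {
    "IDS_EXAMPLE_MENU_ITEM": "P&amp;rint frame...",
    "IDS_EXAMPLE_TOOLBAR_ITEM": "P&amp;rint frame...",
}
ids = {name: message_id(text) for name, text in messages.items()}
```

Both names map to the same id because only the string is hashed, so the string needs to be translated once. Any change to the string, however small, yields a different id.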

So for launchpad, i turn all this into:

====
#. IDS_CONTENT_CONTEXT_PRINTFRAME
#. - description: In Title Case: The name of the Print Frame command in the content area context menu
#. - condition: pp_ifdef('use_titlecase')
#: id: 1002064594444093641
msgid "P&rint Frame..."
msgstr ""
====

of course, i have to transcode all the xml entities and tags so they are
easy to read, and also i need to convert the gettext back to the grit
format.

=end

This explains why my gettext keys changed: msgid is transcoded, and each time I improve or change the transcoding rules, I hit this bug.
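
For readers unfamiliar with the transcoding, a minimal sketch of one such rule, based only on the `<ph name="PAGE_TITLE"/>` to `%{PAGE_TITLE}` change shown in the bug description. The function names are made up, the real converter (chromium2pot.py) handles far more than this, and the sketch assumes placeholder names consist of uppercase letters, digits and underscores and only handles the self-closing form.

```python
import re

PH_RE = re.compile(r'<ph name="([A-Z0-9_]+)"/>')
VAR_RE = re.compile(r'%\{([A-Z0-9_]+)\}')


def grit_to_gettext(text):
    """Rewrite self-closing grit placeholders as %{NAME}."""
    return PH_RE.sub(r'%{\1}', text)


def gettext_to_grit(text):
    """Inverse of grit_to_gettext for strings produced by it."""
    return VAR_RE.sub(r'<ph name="\1"/>', text)


original = '<ph name="PAGE_TITLE"/> - Chromium'
readable = grit_to_gettext(original)
restored = gettext_to_grit(readable)
```

The round trip only holds if the source text never contains a literal `%{NAME}` sequence of its own, which is the kind of constraint a bijective conversion has to enforce. It also shows why changing a rule changes every affected msgid at once.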

Revision history for this message
Данило Шеган (danilo) wrote :

As per your explanation, raising the priority of the bug. We'll try to get to it asap.

Changed in rosetta:
importance: Medium → High
Revision history for this message
Fabien Tassin (fta) wrote :

Please have a look at https://code.launchpad.net/~chromium-team/chromium-browser/chromium-translations.head
It shows that even if my converter is stable, this is a very common situation upstream (context changes, description changes, string changes...), so this bug is a blocker for Chromium.

Changed in rosetta:
assignee: nobody → Данило Шеган (danilo)
milestone: none → 10.12
status: Triaged → In Progress
Revision history for this message
Данило Шеган (danilo) wrote :

Ok, I am putting a fix into review. We'll have to test it on our QA servers first before we roll it out, but I hope we'll get it out in the next few days.

Changed in rosetta:
status: In Progress → Fix Committed
Revision history for this message
Launchpad QA Bot (lpqabot) wrote : Bug fixed by a commit
tags: added: qa-needstesting
tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Данило Шеган (danilo) wrote :

I believe this has been rolled out. On your next template upload, all PO files should be exported as well.

Changed in rosetta:
status: Fix Committed → Fix Released
Revision history for this message
Fabien Tassin (fta) wrote :

How long will it take to get the obsolete strings out of the export branch?
Your fix landed 2 days ago, but the strings are still there:
http://people.ubuntu.com/~fta/chromium/translations/trunk/converter-output.html

Revision history for this message
Данило Шеган (danilo) wrote :

Obsolete strings don't get removed. That's how PO files work, and we can't change that (i.e. we can't stop exporting them). You should ignore them for your purposes.

Revision history for this message
Данило Шеган (danilo) wrote :

To clarify further: obsolete translations need to stay in exported PO files because people using PO files expect them to be there. Tools like msgmerge can make use of them, and since that's currently the only way for users to get translations for obsolete messages out of Launchpad, we can't take that away from them.
