db-seed.po files need cleanup to remove duplicate IDs from generated localized seed data

Bug #1681864 reported by Dan Scott
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Medium
Unassigned
2.12
Fix Released
Medium
Unassigned

Bug Description

This bug continues #1680312 - running through the translation process on master / rel_2_12, we find the localized 950.data.seed-values.sql files can contain duplicate rows:

cd build/i18n

# generate the new base POT
make newpot

# update the translation for fr-CA
make LOCALE=fr-CA updatepo

# create an updated SQL file to load fr-CA
make LOCALE=fr-CA project

# load the generated fr-CA SQL
psql -f project/locale/fr-CA/950.data.seed-values.sql evergreen

At this last step, I ran into some errors about duplicates in config.i18n_core. However, the load was successful, and the errors didn't seem to be harmful. To determine why we were getting duplicates, I examined the results of the "updatepo" step (which adds and merges the new strings with an existing translation) vs. a "newpo" step (which generates a fresh, untranslated PO file).

With the "updatepo" step, you get entries like:

# id::cmf.label__26 id::cza.label__1 id::cza.label__10
# id::cza.label__1 id::cza.label__10 id::cza.label__19
#: 950.data.seed-values.sql:173 950.data.seed-values.sql:410
#: 950.data.seed-values.sql:431
msgid "Title Control Number"
msgstr "Numéro de contrôle pour les titres"

With the "newpo" step, you get entries like:

# id::cmf.label__26 id::cza.label__1 id::cza.label__10
#: 950.data.seed-values.sql:173 950.data.seed-values.sql:410
#: 950.data.seed-values.sql:431
msgid "Title Control Number"
msgstr ""

So it looks like in the merge process, "pot2po" is holding onto prior ID rows, resulting in duplicates getting fed into the final SQL file. The problem could be cleaned up by running the output SQL through "sort | uniq", but that's fixing the symptom rather than the root of the problem. More investigation into what pot2po is doing may be required to fix the root of the problem.

Dan Scott (denials)
Changed in evergreen:
status: New → Triaged
milestone: none → 2.12.1
Revision history for this message
Ben Shum (bshum) wrote :

Looks like we may want to insert a step for "pocommentclean" which is part of translate toolkit and can be used to remove all existing comments in a given file. While the manpage says that you can only use pocommentclean against po directories and not individual files, my local testing with translate toolkit 1.13.0 on Ubuntu 16.04 yielded successfully cleaned files on a case by case basis.

So running "pocommentclean build/i18n/po/db.seed/fr-CA.po" resulted in a file fr-CA.po with no comments in it. Followed by running the individual command to update one PO file, like "pot2po --progress verbose -i po/db.seed/db.seed.pot -o po/db.seed/fr-CA.po -t po/db.seed/fr-CA.po" which resulted in an fr-CA.po file with the corrected IDs only (and all the updated IDs that were missing from the file too).

We'd have to figure out where best to implement the task to clean out the existing IDs and replace them all. And then with git version control tracking all our changes, the final update that gets committed would appear only to replace any mistakes in the file. Any existing IDs that are unchanged would remain unchanged with the git history.

Further testing and coding to come...

tags: added: i18n
Changed in evergreen:
status: Triaged → Confirmed
importance: Undecided → Medium
Revision history for this message
Ben Shum (bshum) wrote :

Working branch started up: user/bshum/lp161864-i18n-db-seed-fixing

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/bshum/lp161864-i18n-db-seed-fixing

All this does is add a pocommentclean step in the middle of updatepo make target that will selectively wipe out comments for a given db.seed locale file. Tested only on Ubuntu 16.04 presently with translate toolkit 1.13.0. There may be differences on other platforms in how this ends up behaving.

The test procedure would involve running "make newpot" in build/i18n first, to deploy the new template changes for IDs to the base db.seed.pot file. Then run "make LOCALE=ar-JO updatepo" or "make update_all_locales" to hit all of them at once, to apply the changes throughout between the templates and locale PO files.

As a potential hiccup, it looks like updatepo hasn't been called between POT and PO in a very long time (since it's not a step in the wiki instructions for release making predating 2.6). So it has been shown for me that running updatepo to apply this fix also nets a ton of other PO updates and changes. Testing this fix will require some finesse with managing the various i18n pieces that need updating and repair.

Not putting a pullrequest on this yet, while we investigate further and discuss any other potential ramifications.

Changed in evergreen:
milestone: 2.12.1 → 2.12.2
Changed in evergreen:
milestone: 2.12.2 → 2.12.3
Changed in evergreen:
milestone: 2.12.3 → 2.12.4
Revision history for this message
Ben Shum (bshum) wrote :

Setting a pullrequest on this now. The change should assist with fixing up the IDs during the next PO template sync and PO files.

tags: added: pullrequest
Changed in evergreen:
milestone: 2.12.4 → 2.12.5
Revision history for this message
Chris Sharp (chrissharp123) wrote :
tags: added: signedoff
Revision history for this message
Galen Charlton (gmc) wrote :

Pushed to master. Thanks, Ben and Chris!

Changed in evergreen:
status: Confirmed → Fix Committed
milestone: 2.12.5 → 3.0-alpha
Revision history for this message
Galen Charlton (gmc) wrote :

Advice on whether to backport to rel_2_12?

Revision history for this message
Ben Shum (bshum) wrote :

Probably wouldn't hurt to have it in rel_2_12. +1 to backport

Revision history for this message
Galen Charlton (gmc) wrote :

Pushed to rel_2_12

Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.