oils_i18n_gettext has bad IDs in 950 seed data

Bug #1680312 reported by Ben Shum
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Medium
Unassigned

Bug Description

Evergreen master (and older too)

It looks like we have a bunch of re-used IDs in the 950 seed data for oils_i18n_gettext translation purposes and this is leading to generating 950 seed data SQL inserts that can conflict with each other.

An example mismatch:

INSERT INTO config.coded_value_map (id, ctype, code, value, opac_visible) VALUES (1404, 'accm4', 's', oils_i18n_gettext('724', 'Music', 'ccvm', 'value'), FALSE);

The ID should be 1404, but is getting an oils_i18n_gettext ID of 724 (which is already used elsewhere)

Need to renumber all these INSERT ids for translation purposes.

Tags: i18n
Ben Shum (bshum)
tags: added: i18n
Changed in evergreen:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 2.12.1
Revision history for this message
Ben Shum (bshum) wrote :

Working branch to renumber the IDs accordingly: user/bshum/lp1680312-fix-ids-950-seed-data

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/bshum/lp1680312-fix-ids-950-seed-data

tags: added: pullrequest
Revision history for this message
Jason Boyer (jboyer) wrote :

All of this was intentional though it may be possible I misunderstood how oils_i18n_gettext works. The fixed fields with numbers like this are individual character positions within a multi-character field, so id 724 is also Music, but in the regular 'accm' rather than 'accm4'. Since there's no reason to have a different translation of Music in position one than positions 2-4, I reused the initial id for the field value rather than having a lot of duplicates.

Revision history for this message
Ben Shum (bshum) wrote :

According to Dan Scott, the oils_i18n_gettext needs us to have the IDs be unique so that it generates the output properly.

What we expect is that the PO itself gets generated as something like:

# id::ccvm.value__520 id::ccvm.value__897 id::ccvm.value__1565
# id::ccvm.value__1585
#: 950.data.seed-values.sql:7069 950.data.seed-values.sql:7875
#: 950.data.seed-values.sql:8580 950.data.seed-values.sql:8601
msgid "Poetry"
msgstr ""

Whereby the ccvm.value IDs get assigned as comments in the block. The message string itself is only translated once, but the comments with the IDs are used to uniquely generate and apply translation against the individual values.

Right now, that block's comments end up like:

# id::ccvm.value__520 id::ccvm.value__897 id::ccvm.value__897
# id::ccvm.value__897

And thus creates at least two duplicate 897 INSERT entries in the compiled 950 seed data file for a given locale.

Tested my patch and updated the POT (templates) to see the new IDs take effect. Not sure yet if we have to push those changes first, then do a new PO sync to get updated locale files, then try rebuilding the 950 seed data localized files to fully re-test.

Revision history for this message
Ben Shum (bshum) wrote :

Found a few more mismatched IDs. Added an additional commit and rebased to latest master and force pushed to same branch.

Revision history for this message
Ben Shum (bshum) wrote :

And found yet more duplicate errors in 950 localized seed data. These are not ID mismatches, but instead are label or key name duplications. More patches to come, removing pullrequest till fully ready...

psql:950.data.seed-values-ar-JO.sql:976: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.label, config.settings_group.system, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:1628: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(ppl.description, 1, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:1745: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.label, vandelay.item.barcode.auto, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:2197: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.label, vandelay.item.copy_location.default, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:2355: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.label, vandelay.item.call_number.prefix, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:2639: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.label, vandelay.item.barcode.prefix, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:3014: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.label, vandelay.item.call_number.auto, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:3354: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.label, vandelay.item.circ_modifier.default, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:3604: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(vie.description, import.duplicate.sysid, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:3624: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.label, ui.patron.edit.aua.county.require, ar-JO) already exists.

psql:950.data.seed-values-ar-JO.sql:3647: ERROR: duplicate key value violates unique constraint "i18n_identity"
DETAIL: Key (fq_field, identity_value, translation)=(coust.description, ui.patron.edit.aua.county.require, ar-JO) already exists.

tags: removed: pullrequest
Revision history for this message
Dan Scott (denials) wrote :

Allow me to offer a regression test that will hopefully prevent these from showing up in the future!

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dbs/lp1680312_test_sql_gettext

Revision history for this message
Ben Shum (bshum) wrote :

Dan's test worked for me in the t directory, but not when doing make check. Making a minor tweak to his test for that..

And putting everything into a new collab branch: collab/bshum/lp1680312-fix-ids-950-seed-data

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/bshum/lp1680312-fix-ids-950-seed-data

Revision history for this message
Ben Shum (bshum) wrote :

After this is merged, we need to run the translation POT updates in order to generate new templates for PO and assign the right IDs in the future.

Revision history for this message
Jason Boyer (jboyer) wrote :

A quick note re: my response above. I talked with Ben and Dan and I had indeed misunderstood the proper care and feeding of oils_i18n_gettext, my mistake! Thanks to you both for tracking down and fixing these and the other mistakes mentioned above.

Revision history for this message
Dan Scott (denials) wrote :

I've tested Ben's fixed data seed SQL on master:

First, I ran a new 'eg_db_config --create-schema --load-all-sample' to ensure that the new base SQL loaded correctly. There were no issues.

Then I ran through the translation process:

cd build/i18n

# generate the new base POT
make newpot

# update the translation for fr-CA
make LOCALE=fr-CA updatepo

# create an updated SQL file to load fr-CA
make LOCALE=fr-CA project

# load the generated fr-CA SQL
psql -f project/locale/fr-CA/950.data.seed-values.sql evergreen

At this last step, I ran into some errors about duplicates in config.i18n_core again. However, the load was successful, and the errors didn't seem to be harmful. To determine why we were getting duplicates, I examined the results of the "updatepo" step (which adds and merges the new strings with an existing translation) vs. a "newpo" step (which generates a fresh, untranslated PO file).

With the "updatepo" step, you get entries like:

# id::cmf.label__26 id::cza.label__1 id::cza.label__10
# id::cza.label__1 id::cza.label__10 id::cza.label__19
#: 950.data.seed-values.sql:173 950.data.seed-values.sql:410
#: 950.data.seed-values.sql:431
msgid "Title Control Number"
msgstr "Numéro de contrôle pour les titres"

With the "newpo" step, you get entries like:

# id::cmf.label__26 id::cza.label__1 id::cza.label__10
#: 950.data.seed-values.sql:173 950.data.seed-values.sql:410
#: 950.data.seed-values.sql:431
msgid "Title Control Number"
msgstr ""

So it looks like in the merge process, "pot2po" is holding onto prior ID rows, resulting in duplicates getting fed into the final SQL file. But that problem (requiring some level of cleanup, from simply running the output SQL through "sort | uniq") is separate from the one that we're currently fixing, so I'm going to merge the cleaned-up base 950 seed data and corresponding test.

Revision history for this message
Dan Scott (denials) wrote :

Pushed to master. Note that even though we're touching an SQL file, there is no need for an upgrade script; the oils_i18n_gettext() macro is simply a no-op for translation extraction purposes.

Revision history for this message
Dan Scott (denials) wrote :

Pushed to rel_2_12 as well.

Revision history for this message
Dan Scott (denials) wrote :

I've opened bug #1681864 to focus on the cleanup of the db.seed.po files.

Changed in evergreen:
status: Triaged → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.