Use xgettext's native support for scheme file string extraction

Bug #790574 reported by Geert Janssens on 2011-05-31
This bug affects 1 person
Affects: intltool
Importance: High
Assigned to: Данило Шеган

Bug Description

intltool-update currently relies on intltool-extract to find translatable strings in guile/scheme files (*.scm). I propose letting xgettext extract strings from these files directly instead, for the following reasons:

1. xgettext can handle scm files just fine, if not better than intltool-extract, so there is no need for an intermediate step.

2. xgettext is more flexible in handling whitespace. I have tested this on the gnucash code, which contains a lot of guile scripts with translatable strings.
Running
xgettext -k_ -kN_ *.scm
extracts all strings properly.
On the other hand
intltool-extract abc.scm
for each guile script returns nothing. This is because GnuCash marks the translatable strings with "N_ " (marker followed by whitespace) while intltool-extract expects "N_" (no whitespace). Removing all the whitespace makes intltool-extract extract all the strings. This could be fixed in intltool-extract, but it would be a redundant effort because xgettext already supports it.

3. intltool-extract creates an intermediate header file that xgettext then parses. As a result, the generated pot file has source references to this intermediate file instead of to the original scm source file, which makes it harder to look at a string in its original context. Running xgettext directly on the scm source produces a pot file whose source references point to the actual location of each string in the original file.
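The whitespace problem from point 2 can be illustrated with a small sketch. The two patterns below are hypothetical simplifications written for this example and are not intltool-extract's or xgettext's actual code; the sample Scheme strings are made up as well. They only show how a matcher that requires the quote directly after the marker misses the "N_ " form.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# STRICT requires the opening quote right after the marker: (N_"text")
STRICT = re.compile(r'\(\s*N?_"((?:[^"\\]|\\.)*)"')
# TOLERANT allows whitespace between marker and string: (N_ "text")
TOLERANT = re.compile(r'\(\s*N?_\s*"((?:[^"\\]|\\.)*)"')


def extract(pattern, source):
    """Return every string literal that follows a _ or N_ marker."""
    return pattern.findall(source)


# Made-up sample in the style GnuCash uses (marker, space, string).
sample = '(define title (N_ "Aging Report"))\n(display (_ "Total"))'

print(extract(STRICT, sample))    # finds nothing
print(extract(TOLERANT, sample))  # finds both strings
```

Removing the space after each marker in the sample makes the strict pattern match too, which mirrors the behaviour described in point 2.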

I have attached a patch for this change.

Related branches

Claude Paroz (paroz) wrote :

If I remember right, one blocking issue with intltool and gnucash was the problem of multiline scheme string extraction. Do you know if your patch also resolves this issue?

Geert Janssens (gjanssens) wrote :

Thanks for your feedback. I searched for an example multiline scheme string and found one in the aging report:
 (cons #f (sprintf
 (_ "Transactions relating to '%s' contain \
more than one currency. This report is not designed to cope with this possibility.") (gncOwnerGetName owner))))

This used to be extracted to the pot file as:
#. src/report/business-reports/aging.scm
#. src/report/business-reports/gnucash/report/aging.scm
#: ../intl-scm/guile-strings.c:624 ../intl-scm/guile-strings.c:1228
#, c-format
msgid "Transactions relating to '%s' contain more than one currency. This report is not designed to cope with this possibility."
msgstr "De boekingen die betrekking hebben op ‘%s’ bevatten meerdere valuta. Dit rapport is niet op die mogelijkheid toegerust."
(I have added an example translation)

After xgettext extraction and merge with an existing translation it becomes:
#: /kobaltnet/janssege/Development/EclipseGnuCash/GnuCash-git/po/../src/report/business-reports/aging.scm:212
msgid ""
"Transactions relating to '%s' contain more than one currency. This report "
"is not designed to cope with this possibility."
msgstr ""
"De boekingen die betrekking hebben op ‘%s’ bevatten meerdere valuta. Dit "
"rapport is niet op die mogelijkheid toegerust."

Note that I didn't edit the translation file manually. msgmerge seems to recognize a string as identical regardless of whether it is split over multiple lines, and it rewraps the existing translation by itself as well.
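The reason the two forms count as the same string is that adjacent quoted chunks in a PO entry are concatenated. A minimal sketch of that rule, using the msgid lines quoted above; the helper name po_string is made up for this example, and ast.literal_eval is only an approximation of PO string unescaping that happens to be sufficient for these lines:

```python
import ast


def po_string(lines):
    """Join the quoted chunks of a PO msgid/msgstr into one string."""
    chunks = []
    for line in lines:
        line = line.strip()
        # Skip a leading keyword such as msgid; keep only the quoted part.
        quoted = line[line.index('"'):]
        # Approximation: treat the PO literal like a Python string literal.
        chunks.append(ast.literal_eval(quoted))
    return "".join(chunks)


single_line = '''
msgid "Transactions relating to '%s' contain more than one currency. This report is not designed to cope with this possibility."
'''.strip().splitlines()

wrapped = '''
msgid ""
"Transactions relating to '%s' contain more than one currency. This report "
"is not designed to cope with this possibility."
'''.strip().splitlines()

print(po_string(single_line) == po_string(wrapped))
```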

So unless I am missing something, it seems to me that xgettext does support multiline extractions now.
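The two pot entries quoted above also illustrate point 3 of the bug description: the old entry references the intermediate guile-strings.c file, while the new one references aging.scm itself. A small sketch that pulls the file names out of the "#:" lines; the helper name reference_files is made up for this example, and it assumes references are whitespace-separated path:line pairs without spaces in the path:

```python
def reference_files(po_entry):
    """Return the file part of every '#:' source reference in a PO entry."""
    files = []
    for line in po_entry.splitlines():
        if line.startswith("#:"):
            for ref in line[2:].split():
                # Split on the last colon so only the line number is removed.
                path, _, _lineno = ref.rpartition(":")
                files.append(path)
    return files


# Reference lines copied from the two entries in this report.
old_entry = "#: ../intl-scm/guile-strings.c:624 ../intl-scm/guile-strings.c:1228"
new_entry = (
    "#: /kobaltnet/janssege/Development/EclipseGnuCash/GnuCash-git/po/"
    "../src/report/business-reports/aging.scm:212"
)

print(reference_files(old_entry))
print(reference_files(new_entry))
```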

Thanks for all the work in submitting this bug. One benefit of keeping scm handling directly in intltool-extract is that it can work on systems with older xgettext. Do you happen to know when scheme support was introduced in xgettext? If it's sufficiently old, we should just rely on it instead.

Changed in intltool:
status: New → Triaged
importance: Undecided → High

I've looked into this in more detail. It seems the one regression would be that xgettext doesn't extract comments for translators. There are other incompatibilities as well, but they seem to stem from intltool supporting very wildly marked-up strings (e.g. it will extract the string "that" from the code (func "this" _"that")). We'll have to check how this affects existing users of scm support in intltool.

The above branch applies your patch and adds detection of missing .scm files to "intltool-update -m". I've tested it with GnuCash svn, and it seems to work fine (I got identical files minus the source file references, naturally). I'd appreciate any further testing you could do, but I am likely to do a release including this change today.

Changed in intltool:
milestone: none → 0.42.0
assignee: nobody → Данило Шеган (danilo)
status: Triaged → In Progress
Changed in intltool:
status: In Progress → Fix Committed
Changed in intltool:
status: Fix Committed → Fix Released