poparser should use something else instead of chr for escaped chars

Bug #67138 reported by Carlos Perelló Marín
Affects          | Status       | Importance | Assigned to          | Milestone
Launchpad itself | Fix Released | Critical   | Carlos Perelló Marín |

Bug Description

While importing the silva.pot file at https://launchpad.net/products/silva/trunk/+pots/silva, we found this msgid:

#: /views/edit/VersionedContent/tab_status_get_status_link_title.py:37
msgid "view \302\253${version_title}\302\273"
msgstr ""

Those escaped characters are non-ASCII, and our current parser converts them into a string using chr(). The problem is that we use unicode strings internally, while chr() returns a plain (byte) str, which cannot be converted to unicode automatically when its value is above 127.

The exception we get is:

http://librarian.launchpad.net/4894936/qeGBFfv6r2gj9yDr6Asnk1GdNgf.txt
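
For reference, the escapes in that msgid are octal byte values, and each pair is exactly the UTF-8 encoding of « and » respectively. A quick Python 3 check (the variable names are illustrative only):

```python
# \302\253 and \302\273 from the silva.pot msgid, as octal byte values.
escapes = [0o302, 0o253, 0o302, 0o273]

raw = bytes(escapes)        # b'\xc2\xab\xc2\xbb'
text = raw.decode("utf-8")  # '«»', i.e. U+00AB followed by U+00BB
```

So the escapes only make sense as bytes in some encoding, not as individual characters.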

Changed in rosetta:
assignee: nobody → carlos
importance: Undecided → Medium
status: Unconfirmed → Confirmed
Christian Reis (kiko)
Changed in rosetta:
importance: Medium → High
Eric Casteleijn (thisfred) wrote :

Hi, any indication of when this will be fixed? I realize that you guys are busy, but I understood that it's a one-line fix, and it's holding up synchronizing our translations with Rosetta and, ultimately, releasing a new version of our software...

Carlos Perelló Marín (carlos) wrote :

Sorry, I planned to do this as a quick fix, but it dropped off my to-do list by mistake...

Changed in rosetta:
status: Confirmed → In Progress
Changed in rosetta:
status: In Progress → Fix Committed
Eric Casteleijn (thisfred) wrote :

Thanks, this is great! I'll test by reimporting the templates. (Or is this not yet on the live site, and should I wait for that?)

Carlos Perelló Marín (carlos) wrote :

No, it's not yet deployed in production.

I'm trying to get it deployed on Monday, but I cannot promise you anything. Once I change the status to 'Fix Released', you should be able to do the upload again.

Christian Reis (kiko)
Changed in rosetta:
importance: High → Critical
Christian Reis (kiko) wrote :

Just to elaborate: what actually happens is that we are building a unicode string incrementally, and appending the result of chr() to it will blow up if the character is in the latin-1 range. It's also interesting to note that:

  - This works for chr() for anything that is pure ascii because unicode + ascii does implicit conversion of the ascii to unicode.
  - It breaks for chr() in the latin-1 range 128-255: chr() happily converts those integers to one-byte strings, but the implicit conversion to unicode (which uses the ascii codec) then fails.
  - It breaks for chr() in the > 255 range because chr() itself raises a ValueError.
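
The three cases above can be reproduced in Python 3, which no longer coerces implicitly, by emulating Python 2: `bytes([code])` stands in for Python 2's chr() and an explicit `.decode("ascii")` stands in for the implicit unicode + str conversion. `py2_style_append` and `outcome` are hypothetical helpers written only for this illustration.

```python
def py2_style_append(text, code):
    """Emulate Python 2's  text + chr(code)  where text is a unicode string."""
    one_byte = bytes([code])                # like Py2 chr(): ValueError if code > 255
    return text + one_byte.decode("ascii")  # implicit coercion: fails if code > 127


def outcome(code):
    """Describe what happens when appending chr(code) Python 2 style."""
    try:
        return repr(py2_style_append("view ", code))
    except UnicodeDecodeError:  # checked first: it is a subclass of ValueError
        return "UnicodeDecodeError"
    except ValueError:
        return "ValueError"


for code in (0x41, 0xAB, 0x20AC):  # pure ascii, latin-1 range, beyond 255
    print(hex(code), outcome(code))
```

Only the first case succeeds; the other two fail in the two different ways described above.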

My only concern about using unichr() is that it basically assumes every escaped value in a msgstr is a unicode code point (which only coincides with bytes in the ascii and latin-1 subsets). I'm not sure what the POTemplate/POFile format dictates, i.e. whether a backslashed sequence of digits always denotes unicode or not.

Carlos Perelló Marín (carlos) wrote :

As pointed out by Bjorn, this is not properly fixed or tested. The escaped chars are not recoded like the rest of the file, which means using unichr() is not completely correct here.

I need to think a bit more about this.
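
The problem Bjorn raised is visible with the first escape pair alone. Treating each escape as a code point (what unichr() does; chr() in Python 3) yields two latin-1 characters, while treating the escapes as bytes and decoding them like the rest of the file yields the intended single character. This sketch assumes the file's charset is UTF-8:

```python
pair = [0o302, 0o253]  # \302\253 from the silva.pot msgid

# unichr()-style: every escape becomes its own code point.
per_escape = "".join(chr(v) for v in pair)  # 'Â«' (U+00C2 U+00AB), mojibake

# Recoded like the rest of the file: escapes are bytes in the file's charset.
recoded = bytes(pair).decode("utf-8")       # '«' (U+00AB)
```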

Changed in rosetta:
status: Fix Committed → In Progress
Carlos Perelló Marín (carlos) wrote :

I just fixed this with a better solution, with more tests and more corner cases fixed as well.
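
The committed fix itself isn't shown here. One way to get the behaviour described is sketched below, assuming that, as in C string literals, octal and hex escapes denote raw bytes in the encoding declared by the charset in the header's Content-Type line. The idea is to expand every escape to bytes first and decode the whole string only once. `unescape_po_string` is a hypothetical name, not Launchpad's API.

```python
import re

# Hypothetical sketch, not the code committed to Launchpad.
# One backslash escape: octal (\302), hex (\xAB) or a single character (\n).
_ESCAPE = re.compile(rb"\\(?:([0-7]{1,3})|x([0-9A-Fa-f]{1,2})|(.))", re.DOTALL)
_SIMPLE = {b"n": b"\n", b"t": b"\t", b"r": b"\r", b"\\": b"\\", b'"': b'"'}


def unescape_po_string(body, charset):
    """Expand escapes in the body of a PO string literal, then decode it.

    Each escape becomes raw bytes; the complete byte string is decoded
    once, with the charset assumed to come from the file's header.
    """
    def expand(match):
        octal, hexa, other = match.groups()
        if octal is not None:
            return bytes([int(octal, 8)])  # ValueError if above 0o377
        if hexa is not None:
            return bytes([int(hexa, 16)])
        # Unknown escapes are kept verbatim.
        return _SIMPLE.get(other, match.group(0))

    return _ESCAPE.sub(expand, body).decode(charset)


msgid_body = rb"view \302\253${version_title}\302\273"
result = unescape_po_string(msgid_body, "utf-8")  # 'view «${version_title}»'
```

Because decoding happens after all escapes are expanded, the same escape bytes come out right whether the header declares UTF-8 or a single-byte charset such as ISO-8859-1.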

Changed in rosetta:
status: In Progress → Fix Committed
Carlos Perelló Marín (carlos) wrote :

OK, you should now be able to import your .pot file.

Changed in rosetta:
status: Fix Committed → Fix Released
visibility: private → public
