please stop smashing valid UTF-8 to ASCII in .changes files
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Fix Released | Medium | Julian Edwards |
Bug Description
Launchpad smashes valid UTF-8 in .changes files to ASCII when sending them out to the announcement list. The result of this is often sheer gibberish. For example:
https:/
The string rendered there as "KAazeaka" was in fact "Қазақ"; the mangling made it incomprehensible.
An example from today:
https:/
Instead of the original:
- Use × rather than x in progress bar
... we got:
- Use x rather than x in progress bar
... which made the changelog entry completely meaningless and meant I had to go and fetch the .diff.gz to figure out what it was talking about.
There's no good reason to smash valid UTF-8 to ASCII in .changes files when sending them out to the announcement list; katie didn't do it and I don't think Launchpad should either. Aside from the semantic problems above, it also breaks GPG signatures, which is problematic for Ubuntu developers when we want to figure out from the announcement list who signed a given upload (who, in the case of sponsorship, might not be mentioned in the Maintainer or Changed-By fields).
There *is* a case for checking for invalid UTF-8 (most commonly, ISO-8859-1), although mangling that would also break GPG signatures. If the conversion is only required to make the Content-Type in outgoing mail true, then I honestly suggest just leaving the data alone; if it's required for Launchpad's database or something, then perhaps you could only mangle the data if it's not already valid UTF-8.
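For illustration, the "only mangle the data if it's not already valid UTF-8" idea could look roughly like the sketch below. This is not Launchpad's actual code: the function name `changes_bytes_for_mail` is hypothetical, and treating undecodable input as ISO-8859-1 is an assumption based on that being the most common invalid case. Valid UTF-8 is returned byte-for-byte, so a GPG signature over it would still verify; the fallback path rewrites bytes and would still break the signature.

```python
def changes_bytes_for_mail(raw: bytes) -> bytes:
    """Return .changes content suitable for a UTF-8 mail body.

    Hypothetical helper: valid UTF-8 passes through untouched, and only
    invalid input is converted.
    """
    try:
        # Strict decode is used purely as a validity check.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: invalid UTF-8 is most likely ISO-8859-1.
        # ISO-8859-1 maps every byte to a code point, so this cannot fail,
        # but it changes the bytes and invalidates any GPG signature.
        return raw.decode("iso-8859-1").encode("utf-8")
    # Already valid UTF-8: leave the data alone.
    return raw
```

With this approach, a changelog line such as "Use × rather than x in progress bar" would reach the announcement list unchanged, and only uploads that were not valid UTF-8 in the first place would be altered.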
Changed in soyuz:
assignee: cprov → julian-edwards
Changed in soyuz:
status: Fix Committed → Fix Released
Ok, soon we should get people involved with the lp standard mail dispatcher and sort out this issue. The changefile content is preserved in librarian.