Commit messages on Windows are interpreted as cp1252 even if they are UTF-8
Bug #610229 reported by Max Kanat-Alexander
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Confirmed | Medium | Unassigned |
Bug Description
A developer on the Bugzilla Project recently checked in two commits with garbled characters--UTF-8 that got interpreted as windows-1252 (cp1252) instead. You can see here that he attempted to insert U+2013 (an en-dash) and it got garbled:
http://
He says that he was using the same editor that he uses to edit his localization for Bugzilla, which is definitely in UTF-8.
My suspicion is that bzr decodes the commit message using the terminal encoding before storing it as UTF-8, but on Windows the terminal encoding will nearly always be cp1252 or something else that isn't UTF-8, even if people are writing their messages in UTF-8.
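The suspected failure is easy to reproduce in a few lines of Python. This is only an illustration of the mechanism, not Bazaar's actual code path; the variable names are made up for the example.

```python
# The editor saves an en-dash (U+2013) as UTF-8.
typed = "\u2013"
on_disk = typed.encode("utf-8")        # b'\xe2\x80\x93'

# The bytes are then decoded as if they were cp1252 (the assumed
# "user encoding" on a typical Western Windows install).
garbled = on_disk.decode("cp1252")     # U+00E2, U+20AC, U+201C

# The garbled text is what ends up stored as UTF-8 in the commit.
stored = garbled.encode("utf-8")

# As long as nothing else touched the text, the damage is reversible.
recovered = garbled.encode("cp1252").decode("utf-8")
```

One en-dash turns into three unrelated characters, which matches the kind of garbling described above.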
Bazaar is using the user encoding, which will generally be the right option, but does cause problems like this.
There are a couple of options. Bazaar could add yet another config option for the editor encoding. Alternatively, using something like the Notepad heuristic, which auto-detects Unicode encodings and falls back to the Windows codepage, would mostly work, but leaves the problem of which encoding to *write* to the temporary file. Both options still have the potential to mangle the encoding.
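A rough sketch of what such a detect-then-fall-back heuristic could look like. This is not Notepad's real algorithm and not anything that exists in bzr; `decode_message` and its `fallback` parameter are hypothetical names, and cp1252 is just an assumed default codepage.

```python
import codecs

# Byte order marks checked first, in the spirit of the Notepad heuristic.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

def decode_message(raw, fallback="cp1252"):
    """Return (text, encoding_used) for the raw bytes the editor saved."""
    # 1. An explicit BOM wins.
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(name), name
    # 2. Otherwise accept the bytes as UTF-8 if they are valid UTF-8.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # 3. Fall back to the codepage; undecodable bytes become U+FFFD.
        return raw.decode(fallback, errors="replace"), fallback
```

This still guesses wrong sometimes: a cp1252 message whose bytes happen to form valid UTF-8 is treated as UTF-8, so it can mangle text in the other direction.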
Finally, it might be worth looking at changing the default to UTF-8 (perhaps with a BOM), which most things really should support these days. I still have one editor installed that doesn't support Unicode, but everything I use regularly does.
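For the UTF-8-with-BOM idea, Python's standard `utf-8-sig` codec already does the bookkeeping: it writes a BOM on output and strips one, if present, on input. A minimal sketch, assuming the commit template is exchanged with the editor through a temporary file; `write_template` and `read_message` are hypothetical helpers, not bzr functions.

```python
import os
import tempfile

def write_template(path, text):
    # "utf-8-sig" prepends the UTF-8 BOM so editors can detect the encoding.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)

def read_message(path):
    # "utf-8-sig" drops a leading BOM if the editor kept it and also
    # reads plain UTF-8 if the editor removed it.
    with open(path, "r", encoding="utf-8-sig") as f:
        return f.read()

# Create a scratch file standing in for the editor's temporary file.
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
write_template(tmp_path, "Fix range 1\u20133")
with open(tmp_path, "rb") as f:
    raw = f.read()
message = read_message(tmp_path)
os.remove(tmp_path)
```

An editor that ignores the BOM and saves in the local codepage would still break this, which is the risk with the one non-Unicode editor mentioned above.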