In the case of gnome-panel, this fails during "find_extra_authors". It fails because it iterates over the changelog and tries to decode('utf-8') each line, looking for an author.
In the case of gnome-panel, it lists Translators as:
(Pdb) pp changes
['* New upstream version:',
...
' Docs Translators:',
' - Maxim Dziumanenko (uk)',
' Translators:',
' - Vital Khilko (be)',
" - J\xe9r\xe9my Le Floc'h (br)",
' - Pema Geyleg (dz)',
' - Ivar Smolin (et)',
' - Beno\xeet Dejean (fr)',
...
Note that I'm pretty certain this is iso-8859-1 encoding, as '\xe9' => é and '\xee' => î. That said, iso-8859-2 and iso-8859-15 decode these bytes to the same characters, so I guess it could be any of them...
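A quick way to check the encoding guess, as a rough sketch. It uses Python 3 bytes literals, whereas the code in this report deals with Python 2 byte strings, and the names (samples, candidates, is_valid_utf8) are made up for illustration:

```python
# Two of the raw names from the pdb dump, as byte strings.
samples = [b"J\xe9r\xe9my Le Floc'h", b"Beno\xeet Dejean"]
candidates = ("iso-8859-1", "iso-8859-2", "iso-8859-15")

# Decode every sample with every candidate codec.
decoded = {
    raw: {codec: raw.decode(codec) for codec in candidates}
    for raw in samples
}


def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8 (illustrative helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for raw in samples:
    print(raw, decoded[raw], "valid utf-8:", is_valid_utf8(raw))
```

All three single-byte codecs agree on these particular bytes, so the data alone cannot tell them apart; it only shows that the lines are not UTF-8.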
Anyway,
#1) These won't match the extra author information anyway, because they aren't in the form [Author Name]. So we could just wait to decode them until after the match is run. The current author regex is:
extra_author_re = re.compile(r"\s*\[([^\]]+)]\s*", re.UNICODE)
Which IIRC, says "leading-space [ anything-but-] ] trailing space".
However, if this sort of data is then brought into the commit log, etc, it is going to fail anyway, when we try to create a Unicode commit message.
#2) Allow the decode to fail, and just assume there isn't an author there.
#3) Fall back to iso-8859-1 as the decoder.
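Options #2 and #3 could look roughly like the sketch below. This is not the actual find_extra_authors code. It is written in Python 3 syntax, the function names (find_authors_skip_undecodable, find_authors_with_fallback, decode_with_fallback) and the sample lines are hypothetical, and stripping whitespace from the matched name is an assumption about what the caller wants:

```python
import re

# Same pattern as quoted in this report, applied to decoded text.
extra_author_re = re.compile(r"\s*\[([^\]]+)]\s*", re.UNICODE)


def find_authors_skip_undecodable(lines):
    """Option #2: a line that is not valid UTF-8 is assumed to hold no author."""
    authors = []
    for raw in lines:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # give up on this line only, not the whole changelog
        match = extra_author_re.match(text)
        if match:
            authors.append(match.group(1).strip())
    return authors


def decode_with_fallback(raw, fallback="iso-8859-1"):
    """Option #3: try UTF-8 first, then fall back to a single-byte codec."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # iso-8859-1 maps every byte to a code point, so this cannot raise.
        return raw.decode(fallback)


def find_authors_with_fallback(lines):
    """Option #3 applied to the author search."""
    authors = []
    for raw in lines:
        match = extra_author_re.match(decode_with_fallback(raw))
        if match:
            authors.append(match.group(1).strip())
    return authors


# Made-up changelog lines; the last one has a non-UTF-8 author in brackets.
changes = [
    b"  [ Jane Doe ]",
    b"  * New upstream version:",
    b"     - J\xe9r\xe9my Le Floc'h (br)",
    b"  [ Beno\xeet Dejean ]",
]

print(find_authors_skip_undecodable(changes))
print(find_authors_with_fallback(changes))
```

The difference shows up on the last sample line: option #2 silently drops a bracketed author whose name is not valid UTF-8, while option #3 keeps it but may guess the wrong characters if the real encoding was not iso-8859-1.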