I think I have found a fix.
The bug is not limited to invalid UTF-8 filenames; all you need is UTF-8 filenames and a UTF-8 locale.
For example, in collections.py:810, there is:
    log.Debug(_("File %s is not part of a known set; creating new set") % (filename,))
On my system, when this fails (see the traceback below), the _() string is a str object encoded in UTF-8, while filename is a unicode object. The error is raised when Python tries to encode the non-ASCII filename into an ASCII str object. If the _() string is a unicode object too, no encoding into a str object happens at this stage, and everything works.
This can be achieved by setting gettext up differently in __init__.py, by passing unicode=True to gettext.install(). This is the solution recommended by the author of gettext for Python: http://www.wefearchange.org/2012/06/the-right-way-to-internationalize-your.html
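To illustrate the gettext change, here is a minimal sketch rather than the attached diff. The domain name "duplicity", the helper name install_text_gettext, and the sample filename are assumptions made for the example. The unicode=True flag only exists on Python 2, so the sketch branches on the interpreter version to stay runnable on Python 3, where _() always returns text.

```python
import gettext
import sys


def install_text_gettext(domain="duplicity"):
    """Install _() so that it returns text (unicode), not UTF-8 bytes.

    Hypothetical helper; the domain name is an assumption.
    """
    if sys.version_info[0] == 2:
        # Python 2: unicode=True binds _() to ugettext(), which returns unicode.
        gettext.install(domain, unicode=True)
    else:
        # Python 3 dropped the flag; _() already returns text there.
        gettext.install(domain)


install_text_gettext()

# Hypothetical non-ASCII filename, held as text like in the failing case.
filename = u"caf\xe9.difftar.gz"

# Both operands are text now, so no implicit ASCII conversion is involved.
message = _(u"File %s is not part of a known set; creating new set") % (filename,)
```

With no translation catalog installed, gettext falls back to returning the message id unchanged, so the sketch works without any .mo files.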
This change requires a few modifications in other places so that only unicode strings are passed to the logger. I'm attaching a diff of quick and dirty changes I applied to demonstrate the idea.
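One way to guarantee that only unicode strings reach the logger is to normalize every value at the call site. The helper below is a hypothetical sketch of that idea, not what the attached diff does; the name to_text and the "replace" error policy are my own assumptions.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings if needed.

    Hypothetical helper: undecodable bytes are replaced instead of raising,
    so that logging can never abort the backup.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding, "replace")
    return value


# A UTF-8 byte string and the equivalent text normalize to the same value.
as_bytes = to_text(b"caf\xc3\xa9")
as_text = to_text(u"caf\xe9")
```

Values that are already text pass through untouched, so it is safe to apply the helper to everything that gets interpolated into a log message.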
Is there any chance of getting some attention for this bug? It has made duplicity completely unusable on my system for more than a year.
This is with duplicity 0.6.21 on Fedora 19.
Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1411, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1404, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1257, in main
    action = commandline.ProcessCommandLine(sys.argv[1:])
  File "/usr/lib64/python2.7/site-packages/duplicity/commandline.py", line 981, in ProcessCommandLine
    args = parse_cmdline_options(cmdline_list)
  File "/usr/lib64/python2.7/site-packages/duplicity/commandline.py", line 644, in parse_cmdline_options
    log.Info(_("Using archive dir: %s") % (globals.archive_dir.name,))
  File "/usr/lib64/python2.7/site-packages/duplicity/log.py", line 106, in Info
    Log(s, INFO, code, extra)
  File "/usr/lib64/python2.7/site-packages/duplicity/log.py", line 74, in Log
    _logger.log(DupToLoggerLevel(verb_level), s.decode("utf8", "ignore"))
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)
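The last frames may look odd, since a call to decode() ends in an encode error. On Python 2, calling .decode() on a unicode object first encodes it implicitly with the ASCII codec. The sketch below spells that hidden step out so that it also runs on Python 3; the message text is a made-up example, not the string from my system.

```python
# Hypothetical log message that is already text and contains a non-ASCII character.
message = u"Using archive dir: caf\xe9"

try:
    # Python 2 does the .encode("ascii") part implicitly when
    # s.decode("utf8", "ignore") is called on a unicode object.
    message.encode("ascii").decode("utf8", "ignore")
    error = None
except UnicodeEncodeError as exc:
    # Same exception type and codec as in the traceback above.
    error = exc

print(error)
```

Note that the "ignore" argument does not help here, because it only applies to the UTF-8 decode step, which is never reached.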