One Hundred Papercuts

Bug #989496
Comment #47

Milan Bouchet-Valat (nalimilan) wrote on 2013-08-08:

duplicity.patch (8.5 KiB, text/plain)

I think I have found a fix.

The bug is not limited to invalid UTF-8 filenames: valid UTF-8 filenames and a UTF-8 locale are enough to trigger it.

For example, in collections.py:810, there is:
log.Debug(_("File %s is not part of a known set; creating new set") % (filename,))

On my system, when this fails (see the traceback below), the string returned by _() is a str object encoded in UTF-8, while filename is a unicode object. The error is raised while Python tries to encode filename into an ASCII str object. If _() returns a unicode object too, no conversion to str happens at this stage, and everything works. This can be achieved by setting gettext up differently in __init__.py, namely by passing unicode=True to gettext.install(). This is the solution recommended by the author of gettext for Python:
http://www.wefearchange.org/2012/06/the-right-way-to-internationalize-your.html
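
The gettext change described above could look roughly like the following. This is only a sketch, not the attached patch: the domain name "duplicity" and the sample filename are assumptions, and the version check is there only so the snippet also runs on Python 3, where gettext.install() no longer accepts the unicode argument and _() always returns text.

```python
import gettext
import sys

# Assumption: "duplicity" is the gettext domain. No catalog has to be
# installed for this sketch: gettext.install() falls back to returning
# the untranslated message when none is found.
if sys.version_info[0] == 2:
    # Python 2: install a _() that returns unicode objects instead of str.
    gettext.install("duplicity", unicode=True)
else:
    # Python 3: _() always returns text, and the unicode flag is gone.
    gettext.install("duplicity")

filename = u"caf\xe9.txt"  # hypothetical non-ASCII filename
# Both operands are unicode, so no implicit ASCII conversion takes place.
message = _("File %s is not part of a known set; creating new set") % (filename,)
```

With both sides of the % operator being unicode, the formatted message stays unicode all the way to the logger.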

This change requires a few modifications in other places so that only unicode strings are passed to the logger. I'm attaching a diff of quick and dirty changes I applied to demonstrate the idea.
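
One way to make sure the logger only ever sees unicode strings is to normalize messages at a single point. The helper below is hypothetical (to_text is not a name from duplicity or from the attached diff) and only illustrates the idea: decode byte strings, and pass unicode objects through untouched instead of calling .decode() on them.

```python
def to_text(s, encoding="utf-8"):
    """Return s as a unicode string, decoding only if it is a byte string."""
    if isinstance(s, bytes):
        # Byte strings (str on Python 2) are decoded; undecodable bytes are dropped.
        return s.decode(encoding, "ignore")
    # Already unicode: return unchanged, with no decode step at all.
    return s


from_bytes = to_text(b"caf\xc3\xa9")  # UTF-8 encoded byte string
from_text = to_text(u"caf\xe9")       # already a unicode string
```

Both calls yield the same unicode string, so a logging wrapper could apply such a helper to every message before handing it on.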

Any chance of getting some attention for this bug? It has made duplicity completely unusable on my system for more than a year.

This is with duplicity 0.6.21 on Fedora 19.

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1411, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1404, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1257, in main
    action = commandline.ProcessCommandLine(sys.argv[1:])
  File "/usr/lib64/python2.7/site-packages/duplicity/commandline.py", line 981, in ProcessCommandLine
    args = parse_cmdline_options(cmdline_list)
  File "/usr/lib64/python2.7/site-packages/duplicity/commandline.py", line 644, in parse_cmdline_options
    log.Info(_("Using archive dir: %s") % (globals.archive_dir.name,))
  File "/usr/lib64/python2.7/site-packages/duplicity/log.py", line 106, in Info
    Log(s, INFO, code, extra)
  File "/usr/lib64/python2.7/site-packages/duplicity/log.py", line 74, in Log
    _logger.log(DupToLoggerLevel(verb_level), s.decode("utf8", "ignore"))
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)
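
For reference, the last frames of this traceback can be reproduced in isolation. On Python 2, calling .decode() on an object that is already unicode first encodes it to str with the default ASCII codec, and that implicit step is what raises. The sketch below performs the same encode explicitly so that it behaves the same on Python 2 and Python 3; the sample text is made up.

```python
text = u"r\xe9pertoire"  # hypothetical message text with a non-ASCII character

try:
    # Python 2 runs this encode implicitly when .decode() is called on unicode.
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc
```

The exception carries the codec name and the index of the offending character, which is the information shown in the final line of the traceback.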
