Duplicity doesn't handle non-utf8 filenames well

Bug #1050509 reported by Michael Terry
This bug affects 15 people
Affects: duplicity (Ubuntu)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

This is a break-out bug from bug 989496.

If the user is using a filename encoding that is non-utf8, duplicity doesn't have special support for that. It mixes use of filenames for both printing/logging and for opening. All print/log uses should use a utf8 version of the filename. All actual file operations should use the native encoding.

This will likely involve a new field like pretty_name or something on ROPath.
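
A minimal sketch of the split described above, assuming a hypothetical PathSketch class standing in for ROPath (the real class is not shown in this report, and pretty_name is only the name floated above). It is written for Python 3 so that it runs today; duplicity at the time of this report ran on Python 2. The raw bytes are kept for file operations, and a lossy utf8 rendering is used only for printing and logging:

```python
class PathSketch:
    """Hypothetical stand-in for ROPath; not duplicity's real class."""

    def __init__(self, raw_name):
        # Exact bytes as the filesystem reported them; use these for open(), stat(), etc.
        self.name = raw_name

    @property
    def pretty_name(self):
        # Display-only text. Bytes that are not valid utf8 become U+FFFD
        # instead of raising UnicodeDecodeError.
        return self.name.decode("utf-8", "replace")


utf8_path = PathSketch("caf\u00e9.txt".encode("utf-8"))
latin1_path = PathSketch("caf\u00e9.txt".encode("latin-1"))

print("Writing %s" % utf8_path.pretty_name)
print("Writing %s" % latin1_path.pretty_name)
```

The second path prints with a replacement character in place of the accented letter, while latin1_path.name still holds the original bytes needed to open the file.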

I suspect the number of users with non-utf8 systems is low. So I'm setting as low priority.

Tags: patch
Milan Bouchet-Valat (nalimilan) wrote :

Many thanks for taking care of this regression here and in the other bug! I still think this bug deserves a fix, or at least better logging of the problematic file. If for some reason a user ends up with a file with an invalid name, there's no way of finding out from the GUI that this is the problem, let alone identifying the file and fixing its name. So backup is impossible.

Roman Yepishev (rye) wrote :

Coming from bug 989496:

Using the Ubuntu One backend, the remote filenames delivered by backend.list() are unicode objects (the json module decodes the utf8 strings into unicode objects). Therefore, when copy_to_local(fn) tries to log the data using log.Notice(_("Copying %s to local cache.") % fn), that log call crashes duplicity. So not only the local filesystem encoding should be considered but also the backend output.
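
One way to picture the point about backend output is a small normalizing step in front of every log call. The sketch below is an illustration only: as_text is a hypothetical helper, the JSON payload and filenames are made up, and it is written for Python 3 (where json returns str, the counterpart of Python 2's unicode):

```python
import json


def as_text(name, encoding="utf-8"):
    """Hypothetical helper: return a filename as text, whatever type the backend delivered."""
    if isinstance(name, bytes):
        return name.decode(encoding, "replace")
    return name


# A web backend that parses JSON hands back already-decoded text (made-up payload).
remote_listing = json.loads('["duplicity-full.vol1.difftar", "r\\u00e9sum\\u00e9.sigtar"]')

# A local backend might hand back raw bytes instead.
local_listing = [b"duplicity-full.vol1.difftar", "r\u00e9sum\u00e9.sigtar".encode("utf-8")]

# Both kinds of name go through the same conversion before they reach a message.
messages = ["Copying %s to local cache." % as_text(fn)
            for fn in remote_listing + local_listing]
```

With the conversion in one place, the message template only ever meets text, regardless of which backend produced the name.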

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in duplicity (Ubuntu):
status: New → Confirmed
François Marier (fmarier) wrote :

This is the patch I currently apply every time there's a duplicity upgrade to work around the broken debug statements and carry on with the rest of my backup.

It's certainly not ideal and I'm not suggesting it be accepted upstream, but it may be useful to other users until that bug is fixed.

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Hack to work-around the broken debug statements" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-reviewers team please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Vv (vivien-perez) wrote :

Hello François,

I would like to use your patch to be able to resume my backups.

Could you tell me how to do that without breaking my system? Is it enough to replace the lines with a minus sign at their beginning with the corresponding ones with a plus sign in the /usr/lib/python2.7/dist-packages/duplicity/collections.py file?

Thanks for your help,

Vv

François Marier (fmarier) wrote :

Here's a version of my patch without the unnecessary print statements.

Again, I'm not pretending to solve the problem, but it may help others who are waiting for the official fix.

François Marier (fmarier) wrote :

Vv: You are correct. To apply the patch, you can simply remove the lines with minuses and replace them with the lines that start with a plus.

You can also use the "patch -p1 < filename.patch" command, but given that there are only 3 lines to touch, it might be easier to do it by hand.

Vv (vivien-perez) wrote :

Hello François,

thanks for your help. I did remove/add the specified lines.

The previous error is gone, but there is a new one:

Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1404, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1397, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1273, in main
    sync_archive(decrypt)
  File "/usr/bin/duplicity", line 1077, in sync_archive
    + "\n" + "\n".join(local_missing))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)

It seems related to the previous one (which was "UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23: ordinal not in range(128)").

Does anyone have an idea about the origin of the problem?

Thanks in advance,

Vv

otto06217 (otto-kesselgulasch) wrote :

Hi, folks,

what a bug!

If I use some other backup space like a Dropbox folder or gdrive (insync), I get no such error message. It seems to me to be a bug in Ubuntu One.

Thanks for help.

BTW: Some of my files were created on Windows.

Pilot6 (hanipouspilot) wrote :

Priority must not be low. I am unable to upgrade just because of this bug. It affects not only non-utf8 systems, but all systems where some files were created in Windows.

Alexandr Makovksy (mailboxmak) wrote :

Hello, I have this one.
Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1404, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1397, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1248, in main
    action = commandline.ProcessCommandLine(sys.argv[1:])
  File "/usr/lib/python2.7/dist-packages/duplicity/commandline.py", line 994, in ProcessCommandLine
    globals.backend = backend.get_backend(args[0])
  File "/usr/lib/python2.7/dist-packages/duplicity/backend.py", line 161, in get_backend
    return _backends[pu.scheme](pu)
  File "/usr/lib/python2.7/dist-packages/duplicity/backends/u1backend.py", line 74, in __init__
    self.create_volume()
  File "/usr/lib/python2.7/dist-packages/duplicity/backend.py", line 328, in iterate
    return fn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/duplicity/backends/u1backend.py", line 161, in create_volume
    answer = auth.request(self.volume_uri, http_method="PUT")
  File "/usr/lib/python2.7/dist-packages/ubuntuone-couch/ubuntuone/couch/auth.py", line 152, in request
    url, method=http_method, headers=headers, body=request_body)
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1543, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1293, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1229, in _conn_request
    conn.connect()
  File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 980, in connect
    sock.connect((self.host, self.port))
  File "/usr/lib/python2.7/dist-packages/httplib2/socks.py", line 424, in connect
    self.__negotiatehttp(destpair[0], destpair[1])
  File "/usr/lib/python2.7/dist-packages/httplib2/socks.py", line 374, in __negotiatehttp
    resp = self.recv(1)
timeout: timed out

Vv (vivien-perez) wrote :

Hi guys,

following my unresolved problem (reported on bug 989496, which has recently been closed), I have tried to remove unicode characters from the filenames of the photos that I am trying to back up.

I have checked with convmv (running "convmv -r -f utf8 -t ascii ./*" in the backed-up directory) and got confirmation that no non-ascii characters remained.

I still got the same error message from duplicity :
Traceback (most recent call last):
  File "/usr/bin/duplicity", line 1403, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 1396, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 1272, in main
    sync_archive(decrypt)
  File "/usr/bin/duplicity", line 1076, in sync_archive
    + "\n" + "\n".join(local_missing))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)

Any help would be greatly appreciated, I have not been able to back up my files for quite some time now...

Is there at least a way to know which file causes the problem?

Thanks,

Vv
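
On the question of finding out which file is involved: a rough sketch of a scan that lists names under a directory that are either not valid utf8 or valid utf8 but not plain ascii. The function name find_suspect_names is made up for this illustration, the code is Python 3, and it only helps when a filename really is the culprit (later comments in this report point at localized messages instead):

```python
import os


def find_suspect_names(root):
    """Return (path, reason) pairs for names under root that may trip up decoding."""
    suspects = []
    # Walking a bytes path makes os.walk return raw bytes names, so nothing
    # is decoded behind our back.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                suspects.append((path, "not valid UTF-8"))
                continue
            if any(byte > 0x7F for byte in name):
                suspects.append((path, "valid UTF-8 but not ASCII"))
    return suspects
```

Calling find_suspect_names on the backed-up directory and printing each pair gives a list of candidates to rename or inspect; an empty list suggests the filenames are not the cause.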

Pilot6 (hanipouspilot) wrote :

On a test system I created a folder containing two files:
1. Libreoffice file, created in Ubuntu
2. MS Word file created in Windows
Both files have Russian names.

I tried to back up this folder using Deja Dup to UbuntuOne and got the same error.

Then I removed the file from Windows and tried again, but it still failed.
Now I can't backup even an empty folder.

Pilot6 (hanipouspilot) wrote :

I need to add that there is no such bug in precise.

Ibanez (ibanez) wrote :

I have a workaround.

You can change the session language before calling duplicity:

declare -x LANG="en_US.UTF-8"

It works for me: my default LANG is "es_ES.UTF-8", and duplicity fails. With "en_US.UTF-8" it works.

François Marier (fmarier) wrote :

I can confirm that the work-around in comment 16 does work, although I had to add this to my backup script:

  export LANG=en_US.utf8
  export LANGUAGE=
  export LC_CTYPE="en_US.utf8"
  export LC_NUMERIC="en_US.utf8"
  export LC_TIME="en_US.utf8"
  export LC_COLLATE="en_US.utf8"
  export LC_MONETARY="en_US.utf8"
  export LC_MESSAGES="en_US.utf8"
  export LC_PAPER=en_US.UTF-8
  export LC_NAME="en_US.utf8"
  export LC_ADDRESS="en_US.utf8"
  export LC_TELEPHONE="en_US.utf8"
  export LC_MEASUREMENT="en_US.utf8"
  export LC_IDENTIFICATION="en_US.utf8"
  export LC_ALL=

Another locale that can be used to reproduce the problem is fr_CA.utf8.

So this bug has in fact nothing to do with filenames and everything to do with the localized error messages breaking duplicity.
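
A sketch of why a translated message can matter, under stated assumptions. In Python 2, combining a byte string with a unicode string makes the interpreter decode the byte string as ascii first, and that implicit step raises UnicodeDecodeError as soon as the bytes contain an accented character. Python 3 refuses to mix the two types at all, so the helper below (implicit_mix, a made-up name) writes the implicit step out by hand. The French text is invented for the example, and which operand was the byte string inside duplicity is not established by this report:

```python
def implicit_mix(byte_part, text_part):
    """Mimic Python 2 combining str and unicode: the bytes are decoded as ascii first."""
    return byte_part.decode("ascii") + text_part


english = b"Copying to local cache: "
# Invented translation, as utf8 bytes such as a message catalog might deliver.
french = "Copie termin\u00e9e vers le cache local : ".encode("utf-8")

ok = implicit_mix(english, "backup.tar")

try:
    implicit_mix(french, "backup.tar")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

print(ok)
print(error)
```

The English message passes because it is pure ascii; the French one fails on the first byte of the accented letter, with the same kind of "'ascii' codec can't decode byte 0xc3" message seen in the tracebacks above, even though the filename is plain ascii.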

Pierre (pierre-fr34) wrote :

Adding only:

export LANG=en_US.utf8

in my backup script works for me. Thanks for the trick.

NB:
I am on precise: duplicity 0.6.18, python 2.7, LANG=fr_FR.UTF-8
This bug does not show when using duplicity on lucid (duplicity 0.6.08b python 2.6, LANG=fr_FR.UTF-8)
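
For a wrapper written in Python rather than shell, the same workaround could be sketched as follows. The function names are hypothetical, the command to run is left to the caller, and dropping LC_ALL, LC_MESSAGES and LANGUAGE (which take precedence over LANG when messages are selected) is an assumption about what is needed, modelled on the export list in the earlier comment:

```python
import os
import subprocess


def english_env(base=None, locale_name="en_US.UTF-8"):
    """Return a copy of the environment in which messages should come out in English."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = locale_name
    # These variables override LANG for message selection, so remove them
    # and let LANG decide.
    for var in ("LC_ALL", "LC_MESSAGES", "LANGUAGE"):
        env.pop(var, None)
    return env


def run_with_english_messages(command):
    """Run command (a list of arguments) under the adjusted environment."""
    return subprocess.run(command, env=english_env(), check=False).returncode
```

Only the child process sees the changed variables, so the rest of the session keeps its usual language.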

Vv (vivien-perez) wrote :

Hello,

what exactly do you call your backup script? I launch duplicity with deja-dup from the unity menu, and I don't see how to specify the locale this way.

Thanks for your help,

Vv

François Marier (fmarier) wrote :

Vv: my backup script is a shell script that wraps around duplicity. It's roughly what can be found in /usr/share/doc/duplicity/examples/system-backup.gz

Vv (vivien-perez) wrote :

Hello François,

thanks for your help. I don't have a system-backup.gz or an "examples" folder in /usr/share/doc/duplicity/.

So I changed the language of my session (from fr_FR.UTF-8 to fr_FR.UTF-8) in the system preferences, and now backups are working again.

Cheers,

Vv

Coeur Noir (coeur-noir) wrote :

Hello,

Not sure if this is the same problem :

https://bugs.launchpad.net/ubuntu/+source/duplicity/+bug/1286845/comments/14

line 130, in copy_file
    log.Info(_("Writing %s") % target.get_parse_name())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

Running Ubuntu 14.04 recently upgraded from 13.10 where duplicity/déjà-dup worked smoothly.

Michael Terry (mterry) wrote :

To be clear, this bug is about filenames that are NOT valid utf8. Most user errors in bugs and comments here are about filenames that are utf8 -- but not ascii -- and duplicity having problems with that. But this bug is for those filenames that are truly bizarre.

That said, the fix for both is similar. Ever since adding gettext support, we've used utility functions in util.py to convert between byte and unicode strings. Those functions pass the 'replace' option to decode/encode while they're at it, which gracefully handles non-utf8 characters. As we fix normal utf8 conversion errors, by using those utility functions we also make the non-utf8 cases better.
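
To illustrate what the 'replace' option does, here is a small Python 3 sketch using only the built-in decode and encode methods. It does not reproduce the actual functions in util.py, and the filename is made up:

```python
# Latin-1 bytes for an accented filename; these are not valid utf8.
raw = b"rapport-\xe9t\xe9.odt"

# Strict decoding (the default) raises on the first bad byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With 'replace', each undecodable byte becomes U+FFFD and decoding carries on.
shown = raw.decode("utf-8", "replace")

# The same option works in the other direction: characters the target
# encoding cannot represent become '?'.
ascii_safe = "rapport-\u00e9t\u00e9.odt".encode("ascii", "replace")
```

The result is lossy, which is fine for messages but is the reason the raw bytes still need to be used for the actual file operations.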

So where are we today? We've fixed a bunch of UnicodeDecodeErrors throughout duplicity [1]. I don't think we've fixed 100% of them, but I do think we've hit the majority of the use cases by now.

This generic bug might not be super useful anymore. It might be better to close this? And keep using separate bugs for each specific instance of a decode error.

[1] https://bugs.launchpad.net/duplicity/+bugs?field.searchtext=ordinal+not+in+range&orderby=-status&search=Search&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=FIXRELEASED&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE
