Wrong encoding when using --restore-missing functionality

Bug #1406505 reported by Laurent Bigonville on 2014-12-30
This bug affects 7 people
Affects              Status    Importance   Assigned to   Milestone
Duplicity                      Undecided    Unassigned
Déjà Dup                       Undecided    Unassigned
deja-dup (Fedora)    Unknown   Unknown

Bug Description

deja-dup doesn't display accented characters properly when using the --restore-missing functionality.

The console shows several lines like: (deja-dup:4666): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

duplicity seems to return the file names properly when using "duplicity list-current-files".

I'm using deja-dup 32.0, duplicity 0.6.24

$ locale
LANG=fr_BE.utf8
LANGUAGE=
LC_CTYPE=fr_BE.UTF-8
LC_NUMERIC=fr_BE.utf8
LC_TIME=fr_BE.utf8
LC_COLLATE=fr_BE.UTF-8
LC_MONETARY=fr_BE.utf8
LC_MESSAGES=fr_BE.UTF-8
LC_PAPER=fr_BE.utf8
LC_NAME="fr_BE.utf8"
LC_ADDRESS="fr_BE.utf8"
LC_TELEPHONE="fr_BE.utf8"
LC_MEASUREMENT=fr_BE.utf8
LC_IDENTIFICATION="fr_BE.utf8"
LC_ALL=

Laurent Bigonville (bigon) wrote :

When enabling the debug logs I can see the following:

[...]

DUPLICITY: INFO 10 20141222T224743Z 'home/bigon/Vid\xe9os' dir
DUPLICITY: . Mon Dec 22 23:47:43 2014 home/bigon/Vidéos

(deja-dup:5584): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

(deja-dup:5584): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

(deja-dup:5584): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

(deja-dup:5584): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

(deja-dup:5584): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
DUPLICITY: INFO 10 20101007T103134Z 'home/bigon/Vid\xe9os/Webcam' dir
DUPLICITY: . Thu Oct 7 12:31:34 2010 home/bigon/Vidéos/Webcam

DUPLICITY: INFO 10 20101007T103515Z 'home/bigon/Vid\xe9os/Webcam/2010-10-07-123134.ogv' reg
DUPLICITY: . Thu Oct 7 12:35:15 2010 home/bigon/Vidéos/Webcam/2010-10-07-123134.ogv
[...]
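pango_layout_set_text() requires valid UTF-8, so the warnings above are what you would expect if the names reach Pango with "é" as the single byte 0xE9. The following minimal Python 3 sketch (not deja-dup code; is_valid_utf8 is a hypothetical helper) shows that the Latin-1 form of the name fails strict UTF-8 decoding while the UTF-8 form of the same name passes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True

latin1_name = b"home/bigon/Vid\xe9os"            # "é" as one ISO-8859-1 byte
utf8_name = "home/bigon/Vidéos".encode("utf-8")  # "é" as the two bytes C3 A9

print(is_valid_utf8(latin1_name))  # False
print(is_valid_utf8(utf8_name))    # True
print(utf8_name)                   # b'home/bigon/Vid\xc3\xa9os'
```

0xE9 is a UTF-8 lead byte that must be followed by two continuation bytes; "o" (0x6F) is not one, so the sequence is rejected.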

Laurent Bigonville (bigon) wrote :

Please note that this is not only a cosmetic issue.

The dialog shows files that are actually still present on disk, but it seems that deja-dup fails to recognize them because of the encoding issue.

And the restoration fails with:

DUPLICITY: ERROR 30 UnicodeDecodeError
DUPLICITY: . Traceback (most recent call last):
DUPLICITY: . File "/usr/bin/duplicity", line 1509, in <module>
DUPLICITY: . with_tempdir(main)
DUPLICITY: . File "/usr/bin/duplicity", line 1503, in with_tempdir
DUPLICITY: . fn()
DUPLICITY: . File "/usr/bin/duplicity", line 1352, in main
DUPLICITY: . do_backup(action)
DUPLICITY: . File "/usr/bin/duplicity", line 1437, in do_backup
DUPLICITY: . restore(col_stats)
DUPLICITY: . File "/usr/bin/duplicity", line 703, in restore
DUPLICITY: . % (globals.restore_dir,),
DUPLICITY: . UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 20: ordinal not in range(128)
DUPLICITY: .
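The traceback is the usual symptom of decoding non-ASCII bytes with the ASCII codec (in Python 2, this typically happens implicitly when a byte string is mixed into a unicode format string). The following Python 3 sketch reproduces the same class of error; first_bad_byte is a hypothetical helper, and it is not claimed to be the exact code path at line 703 of duplicity.

```python
def first_bad_byte(data: bytes, encoding: str = "ascii"):
    """Return (offset, byte value) of the first byte `encoding` rejects, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start indexes the first undecodable byte in err.object
        return err.start, err.object[err.start]
    return None

path = b"home/bigon/incapacit\xe9_0614.jpg"
print(first_bad_byte(path))  # (20, 233), i.e. byte 0xe9 at offset 20
```

For this relative path the offending byte sits at offset 20, and 233 is 0xE9.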

deja-dup calls:

DUPLICITY: INFO 1
DUPLICITY: . Args: /usr/bin/duplicity restore --gio --file-to-restore=home/bigon/incapacit�_0614.jpg --time=1419499522 --force smb://admin@diskstation/old/backup/Fornost /home/bigon/incapacit�_0614.jpg --verbosity=9 --gpg-options=--no-use-agent --archive-dir=/home/bigon/.cache/deja-dup --tempdir=/home/bigon/.cache/deja-dup/tmp --log-fd=13

Fixing the call to duplicity (replacing the wrong characters with an actual "é") allows duplicity to restore the file properly.

So this looks similar to bug #1319713.

Laurent Bigonville (bigon) wrote :

After some investigation, I think the issue is in the compress_string() function.

In the following example, the first line is the string (s_in) passed as a parameter, and the second is the one returned by the function:

'home/bigon/incapacit\xe9_0614.jpg'
'home/bigon/incapacit�_0614.jpg'

Basically, the function outputs ISO-8859 strings instead of UTF-8 ones.
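The implementation of compress_string() isn't shown in this thread, so the following Python 3 sketch does not reproduce it. It only illustrates, under the assumption that the name is decoded somewhere as UTF-8 with replacement, why a raw 0xE9 byte ends up displayed as "�", and why that substitution cannot be undone.

```python
raw = b"home/bigon/incapacit\xe9_0614.jpg"

# errors="replace" swaps the invalid byte for U+FFFD (displayed as "�")
# instead of raising UnicodeDecodeError.
shown = raw.decode("utf-8", errors="replace")
print(shown)        # home/bigon/incapacit�_0614.jpg

# Re-encoding yields U+FFFD's own three UTF-8 bytes (EF BF BD); the
# original 0xE9 is gone, so the path no longer matches the stored name.
back = shown.encode("utf-8")
print(back)         # b'home/bigon/incapacit\xef\xbf\xbd_0614.jpg'
print(back == raw)  # False
```

A path that has passed through such a lossy step cannot match the original file name, which is consistent with the failed restore call reported above.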

Laurent Bigonville (bigon) wrote :

Mhh, 'home/bigon/incapacit\xe9_0614.jpg' comes from duplicity and is obviously Latin-1.

When I change the ufn() function from

def ufn(filename):
    "Convert a (bytes) filename to unicode for printing"
    assert not isinstance(filename, unicode)
    return filename.decode(sys.getfilesystemencoding(), 'replace')

to

def ufn(filename):
    "Convert a (bytes) filename to unicode for printing"
    assert not isinstance(filename, unicode)
    return filename.decode('latin1', 'replace')

the string is then encoded as 'home/bigon/incapacit\xc3\xa9_0614.jpg', which is valid UTF-8.
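To compare the two ufn() variants side by side, here is a Python 3 port with the codec passed in explicitly. The encoding parameter is an assumption added for illustration; the quoted originals hard-code sys.getfilesystemencoding() or 'latin1'. It also assumes sys.getfilesystemencoding() reports UTF-8 under the fr_BE.UTF-8 locale shown earlier.

```python
def ufn(filename: bytes, encoding: str) -> str:
    """Python 3 port of the quoted ufn(), with the codec made explicit."""
    # Mirrors the original assertion: input must be bytes, not text.
    assert isinstance(filename, bytes)
    return filename.decode(encoding, "replace")

name = b"home/bigon/incapacit\xe9_0614.jpg"

utf8_view = ufn(name, "utf-8")     # roughly what a UTF-8 locale gives
latin1_view = ufn(name, "latin1")  # the modified version

print(utf8_view)                    # home/bigon/incapacit�_0614.jpg
print(latin1_view.encode("utf-8"))  # b'home/bigon/incapacit\xc3\xa9_0614.jpg'
```

Decoding as Latin-1 never fails, because every byte maps to a code point, but it misreads names that are already UTF-8: ufn("é".encode("utf-8"), "latin1") returns "Ã©". So hard-coding 'latin1' only suits backups whose names were written as Latin-1.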

Mmm, confused here. The current ufn() in duplicity looks like:

def ufn(filename):
    "Convert a (bytes) filename to unicode for printing"
    assert not isinstance(filename, unicode)
    return filename.decode(sys.getfilesystemencoding(), 'replace')

How does this apply to duplicity?

Laurent Bigonville (bigon) wrote :

When running the following command:
/usr/bin/duplicity -v9 list-current-files --gio smb://URL --no-encryption --verbosity=9 --gpg-options=--no-use-agent --archive-dir=/home/bigon/.cache/deja-dup --tempdir=/home/bigon/.cache/deja-dup/tmp --log-fd=1 |grep incapaci

I get the following output:
Wed Jul 16 11:42:36 2014 home/bigon/incapacité_0614.jpg
INFO 10 20140716T094236Z 'home/bigon/incapacit\xe9_0614.jpg' reg
. Wed Jul 16 11:42:36 2014 home/bigon/incapacité_0614.jpg

On the 2nd line, the "é" is encoded as ISO-8859-1, not UTF-8. deja-dup then reads this line, which causes my initial issue.

Shouldn't duplicity output only UTF-8 messages?
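One way a producer could guarantee UTF-8-only log output is to transcode each stored name before writing it. The following Python 3 sketch uses a hypothetical helper, log_name, which is not duplicity's actual fix. Its Latin-1 fallback is a heuristic that assumes undecodable names came from a Latin-1 system.

```python
def log_name(raw: bytes) -> bytes:
    """Hypothetical helper: render a stored file name as UTF-8 for a log line.

    Names that already decode as UTF-8 pass through unchanged; anything else
    is assumed (heuristically) to be Latin-1 and transcoded, so the emitted
    bytes are always valid UTF-8.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # never fails: every byte maps to a code point
    return text.encode("utf-8")

print(log_name(b"home/bigon/incapacit\xe9_0614.jpg"))
print(log_name("home/bigon/Vidéos".encode("utf-8")))
```

Both calls produce the UTF-8 spelling of "é" (C3 A9). The trade-off is that the logged name may no longer be byte-identical to the name stored in the backup, so a consumer that feeds it back as --file-to-restore would need the same mapping in reverse.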

Yes, it should.

On Thu, Jan 1, 2015 at 11:31 AM, Laurent Bigonville <email address hidden>
wrote:


Silvara (mistresssilvara) wrote :

I'm not sure it is the same bug, but Deja-dup cannot restore missing files with non-ASCII names: it generates dummy file/folder names in which non-ASCII characters are replaced with uXXXX substrings, and then fails to restore them. When a restore is attempted, the files are not found, and empty folders with those junk names are created instead.

Vej (vej) wrote :

Marked as duplicate of #1377873, as suggested by Karl Maier.
