Non-UTF-8 filenames cause infinite loop

Bug #317604 reported by mb
Affects: Déjà Dup
Status: Fix Released
Importance: High
Assigned to: Michael Terry

Bug Description

Backing up a filesystem that contains non-UTF-8 filenames causes the warning "Invalid byte sequence in conversion input" to appear on the console over and over again. Deja Dup appears to hang in that state.

Original description
====================
Hello,

I am getting the following warnings when trying to back up my home directory:

** (deja-dup:6632): WARNING **: Duplicity.vala:235: Invalid byte sequence in conversion input

** (deja-dup:6632): WARNING **: Duplicity.vala:235: Invalid byte sequence in conversion input

** (deja-dup:6632): WARNING **: Duplicity.vala:235: Invalid byte sequence in conversion input

and so on...

regards

** (deja-dup:6716): DEBUG: Duplicity.vala:154: Running the following duplicity command: ionice -c3 duplicity --no-encryption --exclude=/home/maximilian/.cache --exclude=/home/maximilian/.thumbnails --exclude=/home/maximilian/.gvfs --exclude=/tmp --exclude=/proc --exclude=/sys --include=/media/Daten/Dokumente --include=/home/maximilian --exclude=** / ssh://sbackup@192.168.2.100:22//media/sdb1/fileserver/backups/deja-dup/max-notebook --ssh-askpass --ssh-options=-oStrictHostKeyChecking=no --dry-run --verbosity=9 --volsize=1 --log-fd=17

Michael Terry (mterry) wrote :

What locale are you using? i.e. what is the output of "echo $LANG"?

Changed in deja-dup:
status: New → Incomplete
mb (maxbloemer) wrote :

Normally I use "de_DE.UTF-8", which produces the following German error message:
** (deja-dup:9649): WARNING **: Duplicity.vala:235: Ungültige Bytefolge in Konvertierungseingabe

After switching to "en_GB.UTF-8":
** (deja-dup:7224): WARNING **: Duplicity.vala:235: Invalid byte sequence in conversion input

Michael Terry (mterry) wrote :

OK, hmm. It appears to be an encoding issue. But you're using UTF-8, which should be fine.

Maybe it's a filename. Would you have non-UTF-8 filenames hanging around? You could see the exact text that Deja Dup is failing on by running it with the environment variable DEJA_DUP_DEBUG=1.

This will generate *lots* of spew. But we're only interested in the lines leading up to this message. Is there anything interesting there?

mb (maxbloemer) wrote :

Can I send you this log via email?

Michael Terry (mterry) wrote :

Sure, to <email address hidden>

Or, if size is your concern, you could just cut out the 'interesting' parts, which I would say is the 50 or so lines above the first time this warning appears.
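For anyone trimming such a log by hand, here is a rough Python sketch of pulling out the lines just before the first warning. The function name `context_before_first` and the default of 50 lines are made up for illustration; nothing here is part of Deja Dup.

```python
from collections import deque
from typing import Iterable, List


def context_before_first(lines: Iterable[str], marker: str, context: int = 50) -> List[str]:
    """Return up to `context` lines preceding the first line that contains
    `marker`, followed by that line itself. Returns [] if the marker never
    appears. (Hypothetical helper, not part of Deja Dup.)"""
    window = deque(maxlen=context)  # keeps only the most recent lines
    for line in lines:
        if marker in line:
            return list(window) + [line]
        window.append(line)
    return []


if __name__ == "__main__":
    import sys

    # The log may itself contain invalid UTF-8, so decode leniently.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for line in context_before_first(log, "Invalid byte sequence"):
            print(line, end="")
```

Run it with the path of a saved debug log as the only argument.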

Michael Terry (mterry)
description: updated
Changed in deja-dup:
importance: Undecided → Low
status: Incomplete → Confirmed
Michael Terry (mterry)
description: updated
Changed in deja-dup:
importance: Low → Medium
Michael Terry (mterry) wrote :

OK, after following up with Maximilian via email, I figured out how to reproduce this [1] and how to fix it.

To reproduce, type:
touch $'P\x93'
in a terminal. This creates a file whose name is not valid UTF-8. Then try to back up that file.
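The same reproduction can be sketched in Python, which also shows why the name is a problem: the byte 0x93 cannot start a UTF-8 sequence, so decoding fails. The helper names below are invented for this illustration, and creating the file assumes a filesystem that accepts arbitrary bytes in names (typical on Linux).

```python
import os
import tempfile


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def create_bad_file(directory: str) -> bytes:
    """Create an empty file named b'P\\x93' in `directory` and return its path.
    Assumes the filesystem allows non-UTF-8 bytes in filenames."""
    path = os.path.join(os.fsencode(directory), b"P\x93")
    with open(path, "wb"):
        pass
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        create_bad_file(tmp)
        # Listing with a bytes path returns the raw, undecoded names.
        for name in os.listdir(os.fsencode(tmp)):
            print(name, "valid UTF-8:", is_valid_utf8(name))
```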

The solution is to not assume all duplicity output is UTF-8 (obviously). This means that for anything that we know isn't a filename, we should replace every illegal byte sequence with the Unicode 'missing character' character. For filenames, we want to leave them as illegal UTF-8, since that is in fact the filename. This involves cleverer unescaping of Python's string-escape encoding than we had before.
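The two cases can be illustrated with a short Python sketch. This is not the Vala code that went into trunk. The function names are hypothetical, it assumes the 'missing character' is U+FFFD (which is what Python's `errors="replace"` emits), and the unescaper handles only a small subset of escapes (`\xNN` and a few single-character ones).

```python
import re

# A backslash followed by either xNN (two hex digits) or any single byte.
_ESCAPE = re.compile(rb"\\(x[0-9a-fA-F]{2}|.)", re.DOTALL)
_SIMPLE = {b"n": b"\n", b"t": b"\t", b"r": b"\r",
           b"\\": b"\\", b"'": b"'", b'"': b'"'}


def clean_message(raw: bytes) -> str:
    """Non-filename output: decode as UTF-8 and substitute U+FFFD for any
    byte sequence that is not valid, so the text is always displayable."""
    return raw.decode("utf-8", errors="replace")


def unescape_filename(escaped: bytes) -> bytes:
    """Filename output: undo backslash escapes but keep the result as raw
    bytes, because the invalid bytes really are part of the name."""
    def repl(match):
        body = match.group(1)
        if len(body) == 3:                     # x followed by two hex digits
            return bytes([int(body[1:], 16)])
        # Unknown escapes are left untouched rather than guessed at.
        return _SIMPLE.get(body, match.group(0))
    return _ESCAPE.sub(repl, escaped)
```

With these, `clean_message(b"P\x93")` gives `"P\ufffd"`, while `unescape_filename(rb"P\x93")` gives back the two raw bytes `b"P\x93"` that name the file on disk.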

Fix committed to bzr trunk.

Changed in deja-dup:
assignee: nobody → mterry
importance: Medium → High
milestone: none → 6.0
status: Confirmed → Fix Committed
Michael Terry (mterry)
Changed in deja-dup:
status: Fix Committed → Fix Released