UnicodeEncodeError in _comparison_data on unrepresentable filename

Bug #77533 reported by Ramon Diaz-Uriarte
This bug affects 1 person
Affects: Bazaar
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

To reproduce the bug:

- open Emacs and start an Emacs shell;
- touch a file with a "weird" character in its name, for instance: touch f1.ç
- run bzr status, which then crashes.

My locale settings:

LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

Under an xterm the filename displays correctly; under the Emacs shell it does not (it shows \347 instead of ç).
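
The \347 shown by the Emacs shell is consistent with the single byte 0xE7, which is how ISO-8859-1 (Latin-1) encodes ç. That the file was created with a Latin-1 name is an inference and is not stated in the report. The sketch below uses a hypothetical helper, octal_escape, to show how the same name would look in backslash-octal form under Latin-1 and under UTF-8:

```python
# -*- coding: utf-8 -*-
# Illustrative only: octal_escape is a hypothetical helper, not part of bzr.

name = u"f1.\u00e7"  # "f1.ç"

latin1_bytes = name.encode("latin-1")  # one byte for ç: 0xE7
utf8_bytes = name.encode("utf-8")      # two bytes for ç: 0xC3 0xA7


def octal_escape(data):
    """Render non-ASCII bytes as backslash-octal escapes."""
    return "".join(
        chr(b) if b < 0x80 else "\\%03o" % b
        for b in bytearray(data)
    )


latin1_view = octal_escape(latin1_bytes)
utf8_view = octal_escape(utf8_bytes)
```

A single \347 points at Latin-1; a UTF-8 name would show up as two escapes instead.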

Tags: unicode
John A Meinel (jameinel) wrote :

I can confirm that there is something that needs to be fixed. Specifically:
$ bzr init; echo foo > "å.txt"; bzr add; bzr commit -m "added"
$ bzr status
$ LANG=C bzr status
bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 27: ordinal not in range(128)

Traceback (most recent call last):
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/commands.py", line 650, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/commands.py", line 612, in run_bzr
    ret = run(*run_argv)
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/commands.py", line 304, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/commands.py", line 622, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/builtins.py", line 171, in run
    short=short)
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/status.py", line 139, in show_tree_status
    specific_files=specific_files)
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/tree.py", line 87, in changes_from
    include_root=include_root
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/decorators.py", line 38, in read_locked
    return unbound(self, *args, **kwargs)
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/tree.py", line 459, in compare
    specific_file_ids, include_root)
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/delta.py", line 184, in _compare_trees
    specific_file_ids):
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/tree.py", line 513, in _iter_changes
    to_kind, to_executable, to_stat = \
  File "/home/jameinel/dev/bzr/bzr.dev/bzrlib/workingtree.py", line 1310, in _comparison_data
    stat_value = os.lstat(abspath)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 27: ordinal not in range(128)

bzr 0.14.0dev0 on python 2.4.4.candidate.1 (linux2)
arguments: ['/home/jameinel/bin/bzr', 'st']

** please send this report to <email address hidden>

The error is raised inside os.lstat(). My guess is that the unicode abspath is being encoded to an ASCII filename there, which fails for the non-ASCII character.
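
The encoding step suspected above can be reproduced in isolation. This is a minimal sketch, not bzr code: encode_for_filesystem is a hypothetical stand-in for the implicit conversion that happens when a unicode path is handed to the operating system, and the path is made up. It assumes that the filesystem encoding is ASCII when LANG=C.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the implicit unicode -> bytes conversion;
# the "ascii" default models the assumed encoding under LANG=C.


def encode_for_filesystem(path, fs_encoding="ascii"):
    """Encode a text path to bytes, strictly, in the given encoding."""
    return path.encode(fs_encoding)


path = u"/tmp/repo/\xe5.txt"  # made-up path containing "å"

try:
    encode_for_filesystem(path)
    error = None
except UnicodeEncodeError as exc:
    # Same exception type as in the traceback above.
    error = exc

# With a UTF-8 filesystem encoding the same path encodes without error.
utf8_path = encode_for_filesystem(path, "utf-8")
```

The exception carries the offending position (error.start), which corresponds to the "position 27" reported in the traceback for the real path.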

Changed in bzr:
importance: Undecided → Low
status: Unconfirmed → Confirmed
John A Meinel (jameinel) wrote :

Just to clarify, there is a real bug here: we should give a better error, or possibly ignore the file. A file can certainly be created whose name we cannot handle, but in that case we should either continue without that file or give a simple error naming the file that is causing the problem.
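
One way the "continue without that file" option could look is sketched below. This is an assumption-laden illustration, not the patch attached to this bug and not bzrlib API: lstat_or_skip, its fs_encoding parameter and its warn callback are all hypothetical names.

```python
import os
import sys


def lstat_or_skip(abspath, fs_encoding=None, warn=sys.stderr.write):
    """Return os.lstat(abspath), or None if the name cannot be encoded.

    Hypothetical helper: checks up front whether the path can be
    represented in the filesystem encoding and warns instead of crashing.
    """
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    try:
        abspath.encode(fs_encoding)
    except UnicodeEncodeError:
        warn("warning: skipping %r: filename cannot be represented in %s\n"
             % (abspath, fs_encoding))
        return None
    return os.lstat(abspath)


# Usage: collect warnings in a list instead of writing to stderr.
messages = []
skipped = lstat_or_skip(u"/nonexistent/\xe5.txt", fs_encoding="ascii",
                        warn=messages.append)
present = lstat_or_skip(u".", fs_encoding="ascii", warn=messages.append)
```

The alternative mentioned above, a simple error, would raise a dedicated exception naming the file at the same point instead of returning None.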

Fábio (machado2) wrote :

This is a patch for status to warn that there is an invalid filename, and ignore this file.

Fábio (machado2) wrote :

The patch above won't prevent a crash if the invalid file is added; it only prevents status from crashing on an unversioned invalid file.

Dan Watkins (oddbloke) wrote :

Fábio, attaching this to an email sent to the bzr mailing-list with a subject line beginning with '[MERGE]' will add it to Bundle Buggy[0] which is where most code review for bzr takes place.

[Footnote 0: http://bundlebuggy.aaronbentley.com]

Martin Pool (mbp)
summary: - bzr crashes with invalid filenames
+ UnicodeEncodeError in _comparison_data on unrepresentable filename
tags: added: unicode
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy