"bzr add" crashed: UnicodeDecodeError in smart_add with ascii codec

Bug #715547 reported by Malek Ghantous
This bug affects 10 people
Affects   Status         Importance   Assigned to       Milestone
Bazaar    Fix Released   Medium       Unassigned
Breezy    Fix Released   Medium       Jelmer Vernooij

Bug Description

I started with a directory of code (and data), did "bzr init", then "bzr add" in the top-level directory. The error occurred in the "Data" directory, and seems to be caused by some strange file names:
---
maltron@ocelotl:~/chalikov/cs/Data$ ls
int phasvelK?ph01 results spectrumph03 surface??ph00
integra phasvelo`ph00 results05 summary surface??ph00
integra05 phasvel??ph00 resultsK?ph01 summaryph00 surface??ph00
integraK?ph01 phasvel??ph00 resultso`ph00 summaryph01 timeseries
integrao`ph00 phasvel??ph00 results??ph00 summaryph02 timeseriesph00
integra??ph00 res results??ph00 summaryph03 timeseriesph01
integra??ph00 RESTART results??ph00 sur timeseriesph02
---
Removing those files solved the problem (the file names are admittedly the result of some buggy code I wrote). By way of comparison, I tried using git for these files and it worked. Traceback below:

---
maltron@ocelotl:~/chalikov/cs$ bzr add
adding CS_model_unmodified_from_alina
adding Data
bzr: failed to report crash using apport:
     OSError(13, 'Permission denied')
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xfd in position 7: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 912, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 1112, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 690, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 705, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/cleanup.py", line 135, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/builtins.py", line 690, in run
    no_recurse, action=action, save=not dry_run)
  File "/usr/lib/python2.6/dist-packages/bzrlib/mutabletree.py", line 50, in tree_write_locked
    return unbound(self, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/mutabletree.py", line 549, in smart_add
    for subf in sorted(os.listdir(abspath)):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfd in position 7: ordinal not in range(128)

bzr 2.2.1 on python 2.6.6 (Linux-2.6.35-25-generic-x86_64-with-Ubuntu-10.10-maverick)
arguments: ['/usr/bin/bzr', 'add']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_AU.utf8'
plugins:
  bash_completion /usr/lib/python2.6/dist-packages/bzrlib/plugins/bash_completion [2.2.1]
  bzrtools /usr/lib/python2.6/dist-packages/bzrlib/plugins/bzrtools [2.2.0]
  launchpad /usr/lib/python2.6/dist-packages/bzrlib/plugins/launchpad [2.2.1]
  netrc_credential_store /usr/lib/python2.6/dist-packages/bzrlib/plugins/netrc_credential_store [2.2.1]
  news_merge /usr/lib/python2.6/dist-packages/bzrlib/plugins/news_merge [2.2.1]

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    including this traceback and a description of the problem.

Revision history for this message
Martin Pool (mbp) wrote :

To judge from ls showing question marks, these file names aren't correct utf-8? Is that correct?

bzr only versions files with names that are supported by the filesystem encoding, so that we know how to decode them later. bzr 2.2.3 and later give a better message.

Revision history for this message
Malek Ghantous (malektronic) wrote : Re: [Bug 715547] Re: "bzr add" crashed ERROR: exceptions.UnicodeDecodeError

You're probably right:

maltron@ocelotl:~/chalikov/cs/Data$ ls results\?\?ph00
ls: cannot access results??ph00: No such file or directory

despite:

maltron@ocelotl:~/chalikov/cs/Data$ ls results*
results results05 resultsK?ph01 resultso`ph00 results??ph00
results??ph00 results??ph00

So fair enough. I was curious as to why git would accept them; here's
the commit message prepared by git when I did git commit:

...
# new file: results
# new file: results05
# new file: "resultsK\310ph01"
# new file: resultso`ph00
# new file: "results\235\345ph00"
# new file: "results\347\027ph00"
# new file: "results\375\367ph00"

I don't know much about character encoding, so I don't know what
characters those numbers refer to. Clearly these aren't file names
I'd want to live with, but I thought the contrast in behaviour was
curious. From what you've told me, the bazaar behaviour is more
sensible and "safer" - if it had merely said "don't do that" rather
than crashed and said "bug" I wouldn't have reported the bug!
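
For readers wondering what those numbers mean: git prints bytes outside printable ASCII as three-digit octal escapes inside a quoted path. Below is a minimal Python 3 sketch that turns such a quoted name back into raw bytes and checks whether they are valid UTF-8. The helper names `unquote_git_octal` and `is_valid_utf8` are hypothetical, and the sketch handles only octal escapes, not git's other escapes such as a quoted backslash or newline.

```python
import re


def unquote_git_octal(quoted):
    """Convert a git-quoted path with \\NNN octal escapes into raw bytes."""
    body = quoted.strip('"').encode("ascii")
    # Each \NNN escape stands for one byte, written in octal.
    return re.sub(rb"\\([0-7]{3})", lambda m: bytes([int(m.group(1), 8)]), body)


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Two of the names from the git commit message above.
for quoted in (r'"resultsK\310ph01"', r'"results\375\367ph00"'):
    raw = unquote_git_octal(quoted)
    print(quoted, "->", raw, "valid UTF-8:", is_valid_utf8(raw))
```

Decoded this way, `\375` is byte 0xfd, sitting at index 7 right after the seven letters of `results`. That is consistent with the byte and position named in the traceback, although other names in the directory have seven-letter prefixes too. The byte sequences are not valid UTF-8, so there is no character to show for them.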

2011/2/9 Martin Pool <email address hidden>:
> *** This bug is a duplicate of bug 686611 ***
>    https://bugs.launchpad.net/bugs/686611
> ** This bug has been marked a duplicate of bug 686611
>   `bzr add file1 file2` in non-ascii folder fails, but `bzr add file1` works

Revision history for this message
Martin Packman (gz) wrote : Re: "bzr add" crashed ERROR: exceptions.UnicodeDecodeError

This is not the same issue as bug 686611. Rather, it's the same as bug 77657, which appears either never to have actually been fixed, or to have regressed since.

Revision history for this message
Martin Packman (gz) wrote :

mkdir badenc
cd badenc
python -c "file('\xe9','w').close()"
bzr init
bzr add

Revision history for this message
Martin Packman (gz) wrote :

See also bug 589008 which interestingly suggests the bad filename was previously getting past this point and failing later, so the problem has been moving around across bzr versions.

Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Saša Janiška (gour) wrote :

I was pointed at this bug, although I originally reported it at https://bugs.launchpad.net/bzr-diffstat/+bug/56680.

The exception was raised while adding files to the newly init-ed repo. Here is the trace:

bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xbe in position 6: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/bzrlib/commands.py", line 926, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/bzrlib/commands.py", line 1126, in run_bzr
    ret = run(*run_argv)
  File "/usr/local/lib/python2.7/site-packages/bzrlib/commands.py", line 691, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/local/lib/python2.7/site-packages/bzrlib/commands.py", line 713, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/bzrlib/cleanup.py", line 135, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/bzrlib/builtins.py", line 650, in run
    no_recurse, action=action, save=not dry_run)
  File "/usr/local/lib/python2.7/site-packages/bzrlib/mutabletree.py", line 50, in tree_write_locked
    return unbound(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/bzrlib/mutabletree.py", line 558, in smart_add
    for subf in sorted(os.listdir(abspath)):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbe in position 6: ordinal not in range(128)

bzr 2.3.3 on python 2.7.1 (FreeBSD-9.0-CURRENT-amd64-64bit-ELF)
arguments: ['/usr/local/bin/bzr', 'add', '.']
plugins: bash_completion[2.3.3], bzrtools[2.3.1], colo[0.2.1],
    explorer[1.1.2], fastimport[0.11.0dev], git[0.6.0], launchpad[2.3.3],
    netrc_credential_store[2.3.3], news_merge[2.3.3], qbzr[0.20.0]
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    including this traceback and a description of the problem.

Revision history for this message
Martin Packman (gz) wrote :

For people encountering this, you need to check two things to resolve the issue.

1) Your locale must be set correctly. If the bzr crash report says your fsenc is 'ANSI_X3.4-1968' rather than 'UTF-8' double check the LANG and LC_* variables are set correctly.

2) All the filenames you are trying to add must be in the encoding specified. With a UTF-8 filesystem, à for instance needs to be the byte sequence '\xc3\xa0' rather than '\xe0'.
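
The two checks above can be sketched in Python 3 as follows. This is an illustration only, not part of bzr: the helper name `undecodable_names` is hypothetical, and it works on file names given as byte strings.

```python
import sys


def undecodable_names(names, encoding=None):
    """Return the byte-string names that are not valid in `encoding`."""
    encoding = encoding or sys.getfilesystemencoding()
    bad = []
    for name in names:
        try:
            name.decode(encoding)
        except UnicodeDecodeError:
            bad.append(name)
    return bad


# Check 1: the filesystem encoding Python derived from the locale settings.
fs_encoding = sys.getfilesystemencoding()
print("filesystem encoding:", fs_encoding)

# Check 2: the same character (U+00E0, a with grave accent) in two encodings.
utf8_name = "\u00e0".encode("utf-8")      # the bytes a UTF-8 filesystem expects
latin1_name = "\u00e0".encode("latin-1")  # a single byte, not valid UTF-8 alone

bad = undecodable_names([utf8_name, latin1_name, b"summary"], "utf-8")
print("names that are not valid UTF-8:", bad)
```

To run the second check against a real directory, the names can be obtained as bytes with `os.listdir(os.fsencode(path))`, since `os.listdir` returns byte strings when given a byte-string path.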

Revision history for this message
Martin Pool (mbp) wrote :

These are interesting crashes because in both cases the fsenc is utf-8, and yet we seem to be trying to decode the listdir result as ascii. So I think this is a more specific bug than just having the encoding set wrong (bug 794353) or invalid names (bug 63324).

summary: - "bzr add" crashed ERROR: exceptions.UnicodeDecodeError
+ "bzr add" crashed: UnicodeDecodeError in smart_add with ascii codec
Revision history for this message
John C Barstow (jbowtie) wrote :

This is most likely due to an implicit string conversion somewhere, as the Python 2.x default is to assume an ASCII encoding if none is explicitly set.
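
A rough illustration of that kind of conversion, written for Python 3, where the decode has to be spelled out because the implicit coercion no longer exists. The function name `implicit_py2_coercion` is hypothetical, and the byte values are an assumption taken from the escapes git printed for one of the files. Whether the coercion in bzr is triggered by sorting a listing that mixes byte strings and unicode strings is a plausible guess, not something this thread confirms.

```python
def implicit_py2_coercion(raw_name):
    """Mimic Python 2 coercing a byte string with its default 'ascii' codec."""
    return raw_name.decode("ascii")


# Assumed raw bytes of one of the problem file names.
raw_name = b"results\xfd\xf7ph00"

message = None
try:
    implicit_py2_coercion(raw_name)
except UnicodeDecodeError as exc:
    # The exception records the codec, the offending position and the reason.
    message = str(exc)
    print(message)
```

The resulting message names the 'ascii' codec, byte 0xfd and position 7, the same details as the original report, even though the filesystem encoding there was UTF-8.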

Revision history for this message
Gustaf (g-rantila) wrote :

This is insanely stupid and has been pissing me off for some time now after having regressed back and forth for *years*.

If you don't feel like waiting for someone to actually permanently fix this:
* Edit (as root) /usr/lib/python2.7/site.py (or whatever python version you're using, default in Oneiric is likely this)
* Search for the line 'encoding = "ascii"'
* Change "ascii" to "utf-8"
Now, I have absolutely no idea what side effects this will have, but it works for me. FYI, I'm not using python for anything but bzr (and whatever desktop applications use it); if you're actively using it for important work, this change might be harmful.
Since this hasn't been fixed for quite some time now, I just had to do the above while waiting for a real fix. Please beware.

Revision history for this message
Martin Packman (gz) wrote :

Yeah, don't do that. You're moving a bug from a safe place before the repository is edited to... potentially corrupting things.

Also, editing site.py is like the old sys.setdefaultencoding trick, but it affects all the Python packages on your machine rather than just one script. To understand what's actually happening here, read the following and the related posts:
<http://tarekziade.wordpress.com/2008/01/08/syssetdefaultencoding-is-evil/>

Revision history for this message
Philipp Noack (philipp-noack-b) wrote :

Still having that bug in bzr!

Revision history for this message
Chucky (lechuck) wrote :

And so am I!

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
Changed in brz:
status: New → Triaged
importance: Undecided → Medium
tags: removed: check-for-breezy
Jelmer Vernooij (jelmer)
Changed in brz:
status: Triaged → Fix Committed
assignee: nobody → Jelmer Vernooij (jelmer)
milestone: none → 3.0.0
Jelmer Vernooij (jelmer)
Changed in brz:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Jelmer Vernooij (jelmer)
Changed in brz:
status: Fix Committed → Fix Released
status: Fix Released → Confirmed
status: Confirmed → Fix Committed
Changed in bzr:
status: Confirmed → Opinion
status: Opinion → Invalid
status: Invalid → Fix Released
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Hi Jaroslavas,

When did this get fixed in bzr?

Changed in brz:
status: Fix Committed → Fix Released