bzr init with files in invalid encodings in the dir crashes without useful diagnostics

Bug #371597 reported by to be removed on 2009-05-04
This bug affects 1 person

Bug Description

See traceback below.

What I did: I unpacked the jaunty source packages (cf. #165315; I used a variant of the script attached to that bug), then attempted to initialize a bzr branch at the root of all the unpacked source packages. This failed.

liw@gytha$ ../ init --development-rich-root
Unable to load plugin 'gtk'. It requested API version (1, 13, 0) of module <module 'bzrlib' from '/home/liw/UNBACKED/bzr-torture/'> but the minimum exported version is (1, 15, 0), and the maximum is (1, 15, 0)
bzr: ERROR: exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 1: ordinal not in range(128)

Traceback (most recent call last):
  File "/home/liw/UNBACKED/bzr-torture/", line 727, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/home/liw/UNBACKED/bzr-torture/", line 922, in run_bzr
    ret = run(*run_argv)
  File "/home/liw/UNBACKED/bzr-torture/", line 559, in run_argv_aliases
  File "/home/liw/UNBACKED/bzr-torture/", line 1597, in run
  File "/home/liw/UNBACKED/bzr-torture/", line 525, in create_branch_convenience
  File "/home/liw/UNBACKED/bzr-torture/", line 1599, in create_workingtree
    accelerator_tree=accelerator_tree, hardlink=hardlink)
  File "/home/liw/UNBACKED/bzr-torture/", line 1441, in initialize
    hardlink=hardlink, delta_from_tree=True)
  File "/home/liw/UNBACKED/bzr-torture/", line 2035, in build_tree
  File "/home/liw/UNBACKED/bzr-torture/", line 2051, in _build_tree
    for dir, files in wt.walkdirs():
  File "/home/liw/UNBACKED/bzr-torture/", line 2394, in walkdirs
    current_disk =
  File "/home/liw/UNBACKED/bzr-torture/", line 1326, in walkdirs
    names = sorted(_listdir(top))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf8 in position 1: ordinal not in range(128)

bzr 1.15dev on python 2.6.2 (linux2)
arguments: ['../', 'init', '--development-rich-root']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'fi_FI.UTF-8'
  bzrtools /usr/lib/python2.6/dist-packages/bzrlib/plugins/bzrtools [1.13]
  dbus /usr/lib/python2.6/dist-packages/bzrlib/plugins/dbus [unknown]
  launchpad /home/liw/UNBACKED/bzr-torture/ [1.15dev]
  netrc_credential_store /home/liw/UNBACKED/bzr-torture/ [1.15dev]
*** Bazaar has encountered an internal error.
    Please report a bug at
    including this traceback, and a description of what you
    were doing when the error occurred.
[status 4]

Robert Collins (lifeless) wrote :

I'm not sure why init is walking the tree at all; this may not be specific to development-rich-root. If it's not, we should probably drop the priority. For now, though, I'm assigning this to jam to get his attention :)

Changed in bzr:
assignee: nobody → John A Meinel (jameinel)
importance: Undecided → High
status: New → Confirmed
John A Meinel (jameinel) wrote :

This is the generic dirstate initialize() code.


Which then calls down to:

TreeTransform.build_tree(basis_tree, wt, accelerator_tree, ...)

Which then does this:
    existing_files = set()
    for dir, files in wt.walkdirs():
        existing_files.update(f[0] for f in files)

I believe the idea is that if you ran "bzr co" in that directory, it would need to reconcile the files that already exist with the files it is trying to create. It *does* seem silly to do this for 'bzr init', since init shouldn't be creating any files.

If this is considered critical, we could special-case 'revision_id == NULL_REVISION' in DSRT.initialize().
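Below is a minimal sketch of the special case suggested above. It is a standalone Python function, not code written against bzrlib's real classes. The function name existing_files_for_build, the walkdirs callable parameter, and the NULL_REVISION value are all assumptions made for illustration. The loop body mirrors the build_tree snippet quoted above. When the basis is the null revision, no files will be created, so the function returns before touching the disk. An undecodable filename therefore never gets listed.

```python
NULL_REVISION = "null:"  # hypothetical stand-in for bzrlib's own constant


def existing_files_for_build(walkdirs, basis_revision_id):
    """Collect paths already on disk before building a working tree.

    walkdirs is a zero-argument callable yielding (dir_info, files) pairs,
    where the first element of each entry in files is its relative path
    (matching f[0] in the build_tree loop).
    """
    existing_files = set()
    if basis_revision_id == NULL_REVISION:
        # An empty basis creates no files, so nothing can clash with what
        # is on disk; skip the directory walk entirely.
        return existing_files
    for _dir_info, files in walkdirs():
        existing_files.update(f[0] for f in files)
    return existing_files


def exploding_walk():
    # Simulates the walk hitting an undecodable name, as in the traceback.
    raise UnicodeDecodeError("ascii", b"r\xf8d", 1, 2, "ordinal not in range(128)")


def fake_walk():
    # One directory containing two entries; only f[0] is used above.
    yield ("", "root-id"), [("a.txt", "a.txt", "file"), ("b.txt", "b.txt", "file")]


print(existing_files_for_build(exploding_walk, NULL_REVISION))  # set(): walk never runs
print(sorted(existing_files_for_build(fake_walk, "some-rev-id")))  # ['a.txt', 'b.txt']
```

This only avoids the crash for init. As noted below, a later "bzr add" would still have to walk the same directory.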

Note, however, that while initialize failed here, 'bzr add' would fail later for exactly the same reason: you have a non-ASCII filename that isn't valid in whatever your filesystem encoding claims to be. (My guess is you have a latin-1 filename and a UTF-8 encoding.)
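To make that guess concrete: in latin-1, "ø" is the single byte 0xf8, which is the byte in the error message. UTF-8 encodes the same character as two bytes, and a lone 0xf8 can never appear in valid UTF-8. The filename "rød" below is hypothetical; the report does not say which file triggered the error.

```python
# A hypothetical filename containing "ø", as a latin-1 system would store it.
name = "rød"
latin1_bytes = name.encode("latin-1")  # b'r\xf8d': 0xf8 is "ø" in latin-1
utf8_bytes = name.encode("utf-8")      # b'r\xc3\xb8d': "ø" needs two bytes in UTF-8


def decodes_as(raw, encoding):
    """Return True if the raw bytes are valid in the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as(utf8_bytes, "utf-8"))    # True
print(decodes_as(latin1_bytes, "utf-8"))  # False: a lone 0xf8 is never valid UTF-8
print(decodes_as(latin1_bytes, "ascii"))  # False: 0xf8 is outside range(128)
```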

The error you see occurs because:

1) os.listdir(u'unicode-string') is supposed to return unicode strings, with each filename decoded.
2) When it encounters a filename that cannot be decoded, it returns that entry as a plain byte string instead.
3) When sorting the result with 'sorted(list_of_mixed_unicode_and_str)', Python implicitly upcasts the plain strings to unicode. That fails because the default in-memory string encoding is ascii, which cannot handle any non-ASCII path.
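Python 3 no longer behaves this way. There, os.listdir() on a str path decodes undecodable bytes with the surrogateescape error handler (PEP 383). What follows is therefore a simplified Python 3 simulation of the three Python 2 steps above, not bzrlib code. The helper names are hypothetical. One simplification to note: py2_style_sort coerces every byte string up front, whereas Python 2 coerces only when a particular comparison mixes the two types.

```python
def py2_style_listdir(raw_names, fs_encoding="utf-8"):
    """Steps 1 and 2: decode each name, but return undecodable names as bytes."""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(fs_encoding))
        except UnicodeDecodeError:
            result.append(raw)  # left as a plain byte string
    return result


def py2_style_sort(names):
    """Step 3: mixing unicode and str upcasts str with the default 'ascii' codec."""
    if any(isinstance(n, str) for n in names):
        names = [n.decode("ascii") if isinstance(n, bytes) else n for n in names]
    return sorted(names)


raw_entries = [b"README", b"r\xf8d"]  # hypothetical directory contents
names = py2_style_listdir(raw_entries)  # ['README', b'r\xf8d']: mixed types

error = None
try:
    py2_style_sort(names)
except UnicodeDecodeError as exc:
    error = exc

print(error)  # 'ascii' codec can't decode byte 0xf8 in position 1: ordinal not in range(128)
```

The resulting message has the same form as the one in the traceback. The single undecodable entry is enough to break the sort of the whole directory listing.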

Anyway, 'bzr add' has had this bug for a long time (see, for example, bug #187267). At the moment, we refuse to version files whose names can't be represented as Unicode strings, and we expect people to have their filesystem encoding set correctly.

Fixing this more completely would require a lot of reworking of the internals, as well as a policy decision on how we want to handle such files.

Changed in bzr:
assignee: John A Meinel (jameinel) → nobody
importance: High → Medium
John A Meinel (jameinel) wrote :

By the way, Larz, can you confirm that a plain "bzr init" or "bzr init --1.9" triggers the same bug? (And also "bzr init --1.9-rich-root" if you want to be extra thorough...)

to be removed (liw) wrote :


Using revision 4325, "bzr init", "bzr init --1.9", and "bzr init --1.9-rich-root" all fail in the same way.

to be removed (liw) wrote :

I confirm that the source tree contains filenames whose encodings are not valid in my locale. I can rename or delete those before running "bzr add", but having "bzr init" fail was really surprising.

In any case, bzr gives me no help at all in finding the problematic names. If the problem is the names, I'd suggest printing them out in some form, skipping them, and not crashing.
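Until bzr reports such names itself, a small standalone script along these lines could list the offending names before running "bzr init" or "bzr add". This is a sketch, not a bzr feature; split_names and find_undecodable_paths are hypothetical names. The script walks the tree with a bytes path, so os.walk returns raw byte names and nothing is silently decoded or dropped. It then reports each name that does not decode in the filesystem encoding, instead of crashing on it.

```python
import os
import sys


def split_names(raw_names, encoding):
    """Separate raw (bytes) directory entries into decodable and undecodable ones."""
    good, bad = [], []
    for raw in raw_names:
        try:
            good.append(raw.decode(encoding))
        except UnicodeDecodeError:
            bad.append(raw)
    return good, bad


def find_undecodable_paths(top, encoding=None):
    """Return the raw byte paths under top whose final component won't decode."""
    encoding = encoding or sys.getfilesystemencoding()
    problems = []
    # A bytes top makes os.walk yield bytes names, exactly as stored on disk.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top)):
        _good, bad = split_names(dirnames + filenames, encoding)
        problems.extend(os.path.join(dirpath, raw) for raw in bad)
    return problems
```

For example, looping over find_undecodable_paths(".") and printing repr() of each result shows every offending path as an escaped byte string such as b'./r\xf8d'. That form displays safely whatever the terminal encoding is, and it tells the user exactly which files to rename.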

to be removed (liw) wrote :

After I removed all names with non-ASCII characters, "bzr init --development-rich-root --no-plugins" succeeded.

summary: - bzr init --development-rich-root crashed
+ bzr init with files in invalid encodings in the dir crashes without
+ useful diagnostics