bzr-git can't handle \f in names

Bug #882396 reported by Eli Zaretskii
Affects: Bazaar Git Plugin
Status: Fix Released
Importance: Medium
Assigned to: Jelmer Vernooij
Milestone: 0.6.4

Bug Description

"bzr branch git://git.savannah.gnu.org/emacs.git" crashes half-way through the "fetching revisions" stage, with this error:

  bzr: ERROR: The key 'admin/a^Lb' is not a valid key.

The "^L" is a literal Control-L (form-feed) character.

This is on Windows XP with the latest bzr-git 0.6.2 and dulwich 0.8.0.

Here's the traceback from .bzr.log:

20824.676 Traceback (most recent call last):
  File "bzrlib\commands.pyo", line 946, in exception_to_return_code
  File "bzrlib\commands.pyo", line 1150, in run_bzr
  File "bzrlib\commands.pyo", line 699, in run_argv_aliases
  File "bzrlib\commands.pyo", line 721, in run
  File "bzrlib\cleanup.pyo", line 135, in run_simple
  File "bzrlib\cleanup.pyo", line 165, in _do_with_cleanups
  File "bzrlib\builtins.pyo", line 1307, in run
  File "C:/Program Files/Bazaar/plugins\git\dir.py", line 169, in sprout
  File "C:/Program Files/Bazaar/plugins\git\fetch.py", line 639, in fetch_objects
  File "C:/Program Files/Bazaar/plugins\git\fetch.py", line 476, in import_git_objects
  File "C:/Program Files/Bazaar/plugins\git\fetch.py", line 375, in import_git_commit
  File "C:/Program Files/Bazaar/plugins\git\fetch.py", line 283, in import_git_tree
  File "C:/Program Files/Bazaar/plugins\git\fetch.py", line 295, in import_git_tree
  File "C:/Program Files/Bazaar/plugins\git\fetch.py", line 161, in import_git_blob
  File "bzrlib\groupcompress.pyo", line 1661, in insert_record_stream
  File "bzrlib\groupcompress.pyo", line 1856, in _insert_record_stream
  File "bzrlib\groupcompress.pyo", line 1738, in flush
  File "bzrlib\groupcompress.pyo", line 2064, in add_records
  File "bzrlib\btree_index.pyo", line 239, in add_nodes
  File "bzrlib\btree_index.pyo", line 170, in add_node
  File "bzrlib\index.pyo", line 196, in _check_key_ref_value
  File "bzrlib\index.pyo", line 114, in _check_key
BadIndexKey: The key 'admin/a^Lb' is not a valid key.

20824.770 return code 3

Jelmer Vernooij (jelmer) wrote :

bzr doesn't support non-utf8 filenames

affects: bzr-git → bzr

Eli Zaretskii (eliz) wrote :

Sorry, I don't understand the response: all the files in the admin/ directory of the Emacs repo have pure ASCII names. Here they are:

$ ls -xF admin/
CPP-DEFINES ChangeLog FOR-RELEASE
MAINTAINERS README admin.el
alloc-colors.c build-configs* bzrmerge.el
charsets/ check-doc-strings cus-test.el
diff-tar-files* emacs-pretesters grammars/
make-announcement* make-changelog-diff* make-emacs*
make-tarball.txt notes/ nt/
quick-install-emacs* unidata/

Jelmer Vernooij (jelmer) wrote :

It doesn't have to be in the current tree, it can be at any point in history that such a file existed.

Eli Zaretskii (eliz) wrote :

But then how does bzr live with that file in the bzr repo that I have on my disk?

Anyway, I very much doubt that we ever had a non-ASCII file in that directory (or elsewhere in Emacs). Is there any bzr command that I could run on the bzr repo I have to find out what this file is? Like listing all the files that were ever in the repo? Or maybe there's a git command to do that which I could run on the git clone of the repo, which I have on a GNU/Linux system?

Martin Pool (mbp) wrote : Re: [Bug 882396] Re: bzr-git crashes when cloning Emacs git repo

I don't think this is specifically an issue about non-utf8 filenames.
U+000C, the ASCII form feed, is a valid UTF-8 character too.
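
A quick way to confirm this point, as a minimal sketch in Python 3 (the byte string is copied from the error message; nothing here calls into bzr or bzr-git):

```python
# The offending name from the error message, as raw bytes.
name = b"admin/a\x0cb"

# Form feed is plain ASCII, so decoding as UTF-8 succeeds.
decoded = name.decode("utf-8")

# It round-trips to the same bytes, so the encoding is not the problem.
round_trips = decoded.encode("utf-8") == name

# Python also classifies form feed as whitespace, which matters for
# any check that rejects whitespace in names.
is_whitespace = decoded[7].isspace()

print(repr(decoded), round_trips, is_whitespace)
```

So the name is well-formed UTF-8; what makes it unusual is that the character is whitespace.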

Running with BZR_PDB=1 and then typing 'print locals()' and 'up' in
the debugger should give you some idea of the context of the problem.

emacs idioms tend to encourage use of ^L as a page break or other kind
of marker so it's quite plausible to me that someone would have either
accidentally or intentionally created a file or other object with that
name.

The specific place this is trapping is a check that bzr internal index
elements don't have whitespace characters in them.
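
To illustrate the kind of check described here, a rough sketch follows. It is not bzrlib's code: the function name check_key, the local BadIndexKey class and the exact character class are assumptions made for the example.

```python
import re

# Assumed whitespace class: tab, newline, vertical tab, form feed,
# carriage return, NUL and space.
_WHITESPACE_RE = re.compile(r"[\t\n\x0b\x0c\r\x00 ]")


class BadIndexKey(ValueError):
    """Hypothetical stand-in for the error seen in the traceback."""


def check_key(key):
    """Raise BadIndexKey if any element of the key tuple is empty
    or contains a whitespace character."""
    for element in key:
        if not element or _WHITESPACE_RE.search(element):
            raise BadIndexKey("The key %r is not a valid key." % (element,))


# An ordinary path passes; the path with a form feed is rejected.
check_key(("admin/ab", "git-v1:24d3a2befc699b1388d2cf2d1a75ac8dcf510bff"))
try:
    check_key(("admin/a\x0cb", "git-v1:24d3a2befc699b1388d2cf2d1a75ac8dcf510bff"))
    rejected = False
except BadIndexKey:
    rejected = True
```

Under these assumptions, a form feed trips the check the same way a space or a tab would.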

My theory is that when bzr-git maps git names to bzr ids it escapes
whitespace but not ^L.
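
That theory can be sketched as follows. Both functions are hypothetical illustrations, not the actual bzr-git mapping code or its eventual fix; the "_s" and "_c" escape sequences are assumptions chosen for the example.

```python
def escape_spaces_only(path):
    """Hypothetical escaping that handles spaces but forgets form feed."""
    return path.replace("_", "__").replace(" ", "_s")


def escape_with_form_feed(path):
    """Same idea, extended so that form feed is escaped as well."""
    return (path.replace("_", "__")
                .replace(" ", "_s")
                .replace("\x0c", "_c"))


path = "admin/a\x0cb"

# The form feed survives the first escaping, so a later whitespace
# check on the resulting id would still fail.
leaks = "\x0c" in escape_spaces_only(path)

# With the extra rule, no form feed is left in the result.
fixed = escape_with_form_feed(path)
print(leaks, repr(fixed))
```

Doubling the underscore first keeps the scheme unambiguous, since every "_" in the output is then either an escaped underscore or the start of an escape sequence.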

Martin

On 28 October 2011 04:52, Eli Zaretskii <email address hidden> wrote:
> But then how does bzr live with that file in the bzr repo that I have on
> my disk?
>
> Anyway, I very much doubt that we ever had a non-ASCII file in that
> directory (or elsewhere in Emacs).  Is there any bzr command that I
> could run on the bzr repo I have to find out what this file is?  Like
> listing all the files that were ever in the repo?  Or maybe there's a git
> command to do that which I could run on the git clone of the repo, which
> I have on a GNU/Linux system?

Eli Zaretskii (eliz) wrote : Re: bzr-git crashes when cloning Emacs git repo

Here's what I got:

C:\usr\eliz\bzr>bzr branch git://git.savannah.gnu.org/emacs.git
bzr: ERROR: The key 'admin/a♀b' is not a valid key.
**** entering debugger
> c:\usr\eliz\bzr\bzrlib\index.pyo(114)_check_key()
(Pdb) print locals()
{'self': <bzrlib.btree_index.BTreeBuilder object at 0x0344F690>, 'key': StaticTuple('admin/a\x0cb', 'git-v1:24d3a2befc699b1388d2cf2d1a75ac8dcf510bff'), 'element': 'admin/a\x0cb'}
(Pdb) up
> c:\usr\eliz\bzr\bzrlib\index.pyo(196)_check_key_ref_value()
(Pdb) print locals()
{'self': <bzrlib.btree_index.BTreeBuilder object at 0x0344F690>, 'as_st': <built-in function from_sequence>, 'references': StaticTuple(StaticTuple(),), 'key': StaticTuple('admin/a\x0cb', 'git-v1:24d3a2befc699b1388d2cf2d1a75ac8dcf510bff'), 'value': '34362894 23 0 0'}

I have this session in the debugger, feel free to ask for more data.

Jelmer Vernooij (jelmer) wrote :

Ah, fair enough. I was testing with \xc rather than \x0c, the latter of which actually decodes happily in utf8.

Martin Pool (mbp) wrote : Re: [Bug 882396] Re: bzr-git crashes when cloning Emacs git repo

On 30 October 2011 18:38, Jelmer Vernooij <email address hidden> wrote:
> Ah, fair enough. I was testing with \xc rather than \x0c, the latter of
> which actually decodes happily in utf8.

Well, '\xc' could mean either of two things:

1- the same as \x0c, a form feed
2- probably more likely, it combines with the following character in
the string to form a byte in 0xc0..0xcf, which is not a complete utf8
character on its own and could fail depending on what comes after
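
The difference between the two readings can be checked with a small sketch in Python 3. The helper name decodes_as_utf8 is made up for this example.

```python
def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Reading 1: a form feed, which is plain ASCII and decodes fine.
form_feed_ok = decodes_as_utf8(b"a\x0cb")

# Reading 2: a byte in 0xc0..0xcf is never a complete character alone.
lone_bytes_fail = all(
    not decodes_as_utf8(bytes([b])) for b in range(0xC0, 0xD0)
)

# Whether such a byte decodes depends on what follows it.
followed_by_ascii = decodes_as_utf8(b"a\xc3b")            # 'b' is not a continuation byte
followed_by_continuation = decodes_as_utf8(b"a\xc3\xa9")  # encodes U+00E9
```

Here form_feed_ok and followed_by_continuation are true while followed_by_ascii is false, which matches the "depends on what comes after" behaviour described in the second reading.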

Martin Pool (mbp)
affects: bzr → bzr-git
Changed in bzr-git:
status: New → Confirmed
importance: Undecided → Medium
summary: - bzr-git crashes when cloning Emacs git repo
+ bzr-git can't handle \f in names
Jelmer Vernooij (jelmer)
Changed in bzr-git:
status: Confirmed → Triaged
status: Triaged → Fix Released
milestone: none → 0.6.4
assignee: nobody → Jelmer Vernooij (jelmer)