bzr can be caused to error with filenames containing newlines

Bug #3918 reported by Matthieu Moy on 2005-11-05
32
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Bazaar
High
Robert Collins
Launchpad itself
Low
Unassigned

Bug Description

newlines in filenames are uncommon, but possible at least on unix.

"bzr status" shows the filename replacing the newline with a space. That's strange. I'd prefer some kind of escaping ("\n", and "\\" for a real '\'). Furthermore, the filename still appears in the "unknown" section.

bzr diff does not work at all

bzr commit ignores the file

Below is a testcase.

$ mkdir test
$ cd test
$ bzr init
$ touch 'file\
? with newline'

CORRECT>wish newline' (y|n|e|a)? no
$ bzr add file*
added "file
with newline"
$ bzr status
added:
  file with newline
unknown:
  file
with newline
$ bzr diff
=== added file 'file with newline'
bzr: ERROR: [Errno 2] No such file or directory: '/tmp/test/file with newline'
  command: '/home/moy/bin/bzr' 'diff'
      pwd: u'/tmp/test'
    error: exceptions.OSError
  at /net/ecrins/local/moy/usr/src/bzr.dev/bzrlib/workingtree.py line 234, in is_executable()
  see ~/.bzr.log for debug information
$ bzr commit -m "added a file with newline"
bzr: ERROR: no changes to commit

Related branches

lp:~lifeless/bzr/bug-3918
Andrew Bennetts: Approve on 2009-09-28
Martin Pool (mbp) wrote :

I know they're possible, but do they actually occur in any trees people might want to version?

I agree they should handled gracefully somehow. Should they just be forbidden?

Robert Collins (lifeless) wrote :

I think we should support them as much as we support files with mixed-case-only-differences: you cant checkout a tree with live filenames that have only differences in case on MacOSX or Windows, but if the files are not present in the current tip, then it all just works.

Matthieu Moy (matthieu-moy) wrote :

I think someone used to unix command line would be crazy to put newlines in filenames. However, it might make sense for someone using a graphical file manager (I'm not part of this category, so I can't tell).

James Blackwell (jblack) wrote :

Confirmed on 2006-03-17 with the following:

~/a$ touch 'this
that'
~/a$ bzr add 'this
that'
added "this
that"
jblack@pluto:~/a$ bzr st
unknown:
  this
that

Changed in bzr:
status: Unconfirmed → Confirmed
Martin Pool (mbp) wrote :

Still confirmed, but wishlist because it's quite an edge case.

I don't see any very strong reason to disallow such filenames - it should be treated as just another unicode character for storage, and represented in some sensible way for display.

So, fix this in stauts, diff, etc. And add tests for all cases.

(It's also possible on unix to create files whose name is not decodable as unicode, e.g. by inserting control characters, and those really can be disallowed.)

David Allouche (ddaa) wrote :

Can you increase the priority of this bug? It is currently blocking one launchpad code import: https://launchpad.net/coccinella/trunk

  File "/home/importd/dists/launchpad/lib/bzrlib/commit.py", line 317, in commit
    self.rev_id = self.builder.commit(self.message)
  File "/home/importd/dists/launchpad/sourcecode/bzr/bzrlib/repository.py", line 1973, in commit
    self.new_inventory, self._config)
  File "<string>", line 4, in add_revision_write_locked
  File "/home/importd/dists/launchpad/sourcecode/bzr/bzrlib/repository.py", line 134, in add_revision
    plaintext = Testament(rev, inv).as_short_text()
  File "/home/importd/dists/launchpad/lib/bzrlib/testament.py", line 178, in as_short_text
    return (self.short_header +
  File "/home/importd/dists/launchpad/lib/bzrlib/testament.py", line 198, in as_sha1
    map(s.update, self.as_text_lines())
  File "/home/importd/dists/launchpad/lib/bzrlib/testament.py", line 135, in as_text_lines
    a(self._entry_to_line(path, ie))
  File "/home/importd/dists/launchpad/lib/bzrlib/testament.py", line 168, in _entry_to_line
    l = u' %s %s %s%s%s\n' % (ie.kind, self._escape_path(path),
  File "/home/importd/dists/launchpad/lib/bzrlib/testament.py", line 149, in _escape_path
    assert not contains_linebreaks(path)
AssertionError

Michael Hudson-Doyle (mwhudson) wrote :

This bug -- probably with dirstate? -- is now much worse, because a commit that adds a file containing a new line will fail.

mwh@mithril-inside:foo$ ls
a?b
mwh@mithril-inside:foo$ bzr st
added:
  a
b
mwh@mithril-inside:foo$ bzr ci -m blah
added a
b
bzr: ERROR: exceptions.AssertionError:

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 718, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 679, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 375, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 2226, in run
    reporter=reporter, revprops=properties)
  File "/usr/lib/python2.5/site-packages/bzrlib/decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree_4.py", line 245, in commit
    result = WorkingTree3.commit(self, message, revprops, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/mutabletree.py", line 195, in commit
    revprops=revprops, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/commit.py", line 311, in commit
    self.rev_id = self.builder.commit(self.message)
  File "/usr/lib/python2.5/site-packages/bzrlib/repository.py", line 1973, in commit
    self.new_inventory, self._config)
  File "/usr/lib/python2.5/site-packages/bzrlib/decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/repository.py", line 134, in add_revision
    plaintext = Testament(rev, inv).as_short_text()
  File "/usr/lib/python2.5/site-packages/bzrlib/testament.py", line 181, in as_short_text
    % (self.revision_id, self.as_sha1()))
  File "/usr/lib/python2.5/site-packages/bzrlib/testament.py", line 198, in as_sha1
    map(s.update, self.as_text_lines())
  File "/usr/lib/python2.5/site-packages/bzrlib/testament.py", line 135, in as_text_lines
    a(self._entry_to_line(path, ie))
  File "/usr/lib/python2.5/site-packages/bzrlib/testament.py", line 168, in _entry_to_line
    l = u' %s %s %s%s%s\n' % (ie.kind, self._escape_path(path),
  File "/usr/lib/python2.5/site-packages/bzrlib/testament.py", line 149, in _escape_path
    assert not contains_linebreaks(path)
AssertionError

bzr 0.17.0 on python 2.5.1.final.0 (linux2)
arguments: ['/usr/bin/bzr', 'ci', '-m', 'blah']

** please send this report to <email address hidden>

David Allouche (ddaa) wrote :

Reported on launchpad-bazaar (and confirmed) as it causes at least one code import to fail.

Changed in launchpad-bazaar:
status: New → Confirmed
Andrew Bennetts (spiv) wrote :
Download full text (3.5 KiB)

Michael Hudson points out that commit now fails if a file with a newline in the name is added, but that's when running with asserts.

If you are using python -O (or a bzr installed to use .pyo files), then instead of the assert, you get:
KnitCorrupt: Knit None corrupt: incorrect number of lines 4 != 3 for version {<email address hidden>}

I used this script:

---
set -e

BZR=bzr

mkdir test
cd test/
python -O $BZR init
touch foo
python -O $BZR add foo
python -O $BZR ci -m "Initial"
cd ..
python -O $BZR co test co-test
cd co-test/
echo "newline" > "file
name"
python -O $BZR add "file
name"
python -O $BZR ci -m "Commit"
---

And got this traceback with bzr.dev:

Traceback (most recent call last):
  File "/home/andrew/code/bzr/bzrlib/commands.py", line 836, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/home/andrew/code/bzr/bzrlib/commands.py", line 791, in run_bzr
    ret = run(*run_argv)
  File "/home/andrew/code/bzr/bzrlib/commands.py", line 492, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/andrew/code/bzr/bzrlib/builtins.py", line 2329, in run
    author=author)
  File "/home/andrew/code/bzr/bzrlib/decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "/home/andrew/code/bzr/bzrlib/workingtree_4.py", line 246, in commit
    result = WorkingTree3.commit(self, message, revprops, *args, **kwargs)
  File "/home/andrew/code/bzr/bzrlib/decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "/home/andrew/code/bzr/bzrlib/mutabletree.py", line 187, in commit
    revprops=revprops, *args, **kwargs)
  File "/home/andrew/code/bzr/bzrlib/commit.py", line 389, in commit
    revision_id=self.rev_id)
  File "/home/andrew/code/bzr/bzrlib/repository.py", line 949, in fetch
    return inter.fetch(revision_id=revision_id, pb=pb, find_ghosts=find_ghosts)
  File "/home/andrew/code/bzr/bzrlib/decorators.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "/home/andrew/code/bzr/bzrlib/repository.py", line 2737, in fetch
    revision_ids).pack()
  File "/home/andrew/code/bzr/bzrlib/repofmt/pack_repo.py", line 589, in pack
    return self._create_pack_from_packs()
  File "/home/andrew/code/bzr/bzrlib/repofmt/pack_repo.py", line 719, in _create_pack_from_packs
    self._copy_inventory_texts()
  File "/home/andrew/code/bzr/bzrlib/repofmt/pack_repo.py", line 649, in _copy_inventory_texts
    self._process_inventory_lines(inv_lines)
  File "/home/andrew/code/bzr/bzrlib/repofmt/pack_repo.py", line 904, in _process_inventory_lines
    inv_lines, self.revision_ids)
  File "/home/andrew/code/bzr/bzrlib/repository.py", line 1276, in _find_file_ids_from_xml_inventory_lines
    line_iterator).iterkeys():
  File "/home/andrew/code/bzr/bzrlib/repository.py", line 1216, in _find_text_key_references_from_xml_inventory_lines
    for line, version_id in line_iterator:
  File "/home/andrew/code/bzr/bzrlib/repofmt/pack_repo.py", line 805, in _copy_nodes_graph
    write_index, output_lines, pb, readv_group_iter, total_items):
  File "/home/andrew/...

Read more...

Changed in bzr:
importance: Wishlist → High
Mary Gardiner (puzzlement) wrote :

Regarding graceful failure, it would be appreciated. The reason Andrew was messing with this at all was that I accidentally created and added a file with a newline in the name. To be told that my repository was now corrupt (which is how I interpreted KnitCorrupt) was quite alarming.

What is the state of this bug?

There doesn't seem to be a way to fix a corrupt repository as an end user. I corrupted my repository, too. Fortunately, branching from a revision before the disatrous commit works and commiting everything correctly again (without new lines) solves the problem. I versioned my bachelor thesis with bzr and was also shocked to see this. Please fix this problem as soon as possible, even just by telling that there are newlines in filenames, which don't work at the moment.

What I am wondering now: Is the rest correctly escaped? How can you be sure that adding files with malformed names can not "destroy" my repository. Apparently they can now *!

* destroy meant from a users perspective.

Andrew Bennetts (spiv) wrote :

Hi Martin,

The only thing corrupt is that particular revision. So you can effectively fix a damaged repo by doing "bzr uncommit", changing the offending filenames, and recommitting. This leaves the broken revision in the repository, but it's unreferenced so totally harmless.

An easy improvement would be to fix bzr so that it always prevents a commit if there's a newline in a filename, even if running with "python -O"/.pyo files. This would at least tell the user about this limitation at a more appropriate time, and avoid ever writing invalid data into a repository.

Martin Pool (mbp) wrote :

If Andrew's comment means that things are only guarded by an assert then that should be easy to fix, and a high priority.

codeslinger (codeslinger) wrote :

people working on this bug should also consider Bug #135320 these bugs have much in common and are probably related.

special characters ought to be allowed in file names, however misguided they may be they do exist in common user scenarios.

consider for instance a Japanese freeware icon editor, which although published in English, it has a file with Japanese characters in the name... but this is on a windows system which sees those as being single byte special characters, and being accessed on Linux via FUSE. Suddenly you have a mess. It is not okay to arbitrarily disregard characters however strange they may appear to be. Bottom line is that if the file system allows it to exist then bzr ought to support it.

While this bug affects a few imports launchpad-bazaar run, it's really not very many: low importance.

Changed in launchpad-bazaar:
importance: Undecided → Low
Doug Lee (dgl-dlee) wrote :

Additional case: Space at the end of a filename works in Unix but causes problems in Windows. I'm using Bazaar 1.5 to version a set of files from someone else, some of which come out of Cab files. I keep my main repo and sandbox on Unix so filename case changes can track without errors, but I actually check out into Windows. A file called "TimeTrack.xml " (note the trailing space) is created fine in Unix, but in Windows, the space is quietly dropped by the OS, with the following consequences:

1. `bzr diff' in Windows says "removed file 'TimeTrack.xml '" but does not then list the file as diff content as it usually would. (If I actually delete TrackTime.xml in Windows, the "removed file" notice is followed by the expected file contents.)

2. Trying `bzr commit' yields a complaint that there's nothing to commit.

3. `bzr merge' refuses to run because there's an uncommitted change. The upshot of this and the above 2 items is that merges seem impossible.

Mateusz Korniak (matkor) wrote :

Pls at least block commits of files containing newline char so we do not end up with:
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

when using bzr from pyo/pyc .

Regards,

Vincent Ladeuil (vila) wrote :

So the current (as-in soon-to-landed-in-bzr.dev as of this writing) status of this bug
is that a warning will be issued and the file will not be added.

Yet, the long term plan should be to *accept* these paths ?

I'm mostly thinking about conversions from foreign VCS here.
We already encouter the problem in such contexts.

So may be we need a way to address the issue by accepting the
path and *renaming* it on the fly in some canonical way that
allows tracking further changes on it ?

Andrew Bennetts (spiv) wrote :

I agree, it would be good to be able to accept these paths. I was a bit surprised to realise that \n (at least) in a filename still does not work in a 2a format repository. Obviously 2a will be the default format for quite some time now, but it would be interesting to see how hard it is to allow all bytes in filenames in the next development format. My guess is that it's not actually very hard now that we no longer use XML for inventories.

On Tue, 2009-09-29 at 00:13 +0000, Andrew Bennetts wrote:
> I agree, it would be good to be able to accept these paths. I was a bit
> surprised to realise that \n (at least) in a filename still does not
> work in a 2a format repository. Obviously 2a will be the default format
> for quite some time now, but it would be interesting to see how hard it
> is to allow all bytes in filenames in the next development format. My
> guess is that it's not actually very hard now that we no longer use XML
> for inventories.

\n will break CHKMap unless its escaped. And dirstate.

Supporting these paths is no easier or harder than it was before:
 - we need a encode/decode pair around the things that our low level
code wants to use as control characters.

One way would be a content filter; another would be to specify it as
always being present and bump various formats to match.

Also we need to make sure we don't corrupt old formats that think they
can pull from 2a etc.

-Rob

John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> I agree, it would be good to be able to accept these paths. I was a bit
> surprised to realise that \n (at least) in a filename still does not
> work in a 2a format repository. Obviously 2a will be the default format
> for quite some time now, but it would be interesting to see how hard it
> is to allow all bytes in filenames in the next development format. My
> guess is that it's not actually very hard now that we no longer use XML
> for inventories.
>

CHK pages are still split on '\n', IIRC. So in the intermediate term it
is quite difficult.

I think an individual row is split on '\0' and some of the file-ids, etc
are split on some random chars like '\r'.

If we changed the byte storage to not be delimited, but instead be
length-prefixed, then all of this would go away. Or require strictly
'\0' delimited, though that gets a bit hairy with nesting and arbitrary
width stuff.

If I think hard about it, we probably could unambiguously parse a CHK
page that had '\n' in the filenames. But it would require rewriting the
parser to not start with:

 lines = bytes.split('\n')

:)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrCHFYACgkQJdeBCYSNAAPWmgCgr4YlPx52dDxZtcbwcv7PzDSp
aUIAoI6DHk63m7/3pbJG9Mwu90PNJkyN
=symT
-----END PGP SIGNATURE-----

John A Meinel (jameinel) on 2009-10-14
Changed in bzr:
milestone: none → 2.0.1
status: Confirmed → Fix Released
assignee: nobody → Robert Collins (lifeless)
Tim Penhey (thumper) wrote :

Since we have 2.1.x deployed.

Changed in launchpad-code:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Related questions