"bzr visualise/gannotate" asserts on certain per-file commit comments

Bug #295161 reported by GuilhemBichot on 2008-11-07
6
Affects Status Importance Assigned to Milestone
Bazaar
High
John A Meinel
Bazaar GTK+ Frontends
Undecided
John A Meinel

Bug Description

bzr branch -r 'revid:<email address hidden>' https://code.launchpad.net/~mysql/mysql-server/mysql-5.1-bugteam

bzr pull bzr-gcommit-err.cset # see attached file

bzr viz --limit=1
gives me:
Traceback (most recent call last):
  File "/home/guilhem/.bazaar/plugins/gtk/revisionview.py", line 371, in do_set_property
    self._set_revision(value)
  File "/home/guilhem/.bazaar/plugins/gtk/revisionview.py", line 428, in _set_revision
    file_info = bdecode(file_info.encode('UTF-8'))
  File "/home/mysql_src/logiciels/bzr_versions/dev/bzrlib/util/bencode.py", line 85, in bdecode
    r, l = decode_func[x[0]](x, 0)
  File "/home/mysql_src/logiciels/bzr_versions/dev/bzrlib/util/bencode.py", line 53, in decode_list
    v, f = decode_func[x[f]](x, f)
  File "/home/mysql_src/logiciels/bzr_versions/dev/bzrlib/util/bencode.py", line 61, in decode_dict
    k, f = decode_string(x, f)
  File "/home/mysql_src/logiciels/bzr_versions/dev/bzrlib/util/bencode.py", line 44, in decode_string
    n = long(x[f:colon])
ValueError: invalid literal for long() with base 10: ''
and the Per-file tab is empty though there was a per-file commit message at gcommit time.
The context is:
this revision was created by a colleague when using OS X (with X11), doing copy-paste from Thunderbird into gcommit, and pressing the "commit" button. It might be an issue of \r, \r\n...
It is for sure a bug in bzr or bzr-gtk and not a user error: as "bzr gcommit" managed to commit, "bzr viz" should be able to display the committed revision.
gannotate has same problem (see one post below).

GuilhemBichot (guilhem-bichot) wrote :
GuilhemBichot (guilhem-bichot) wrote :

latest bzr.dev, latest bzr-gtk

GuilhemBichot (guilhem-bichot) wrote :

"bzr gannotate sql/ha_partition.cc" also affected: do it and click on line 48 of the file, it prints to stdout/stderr:
Traceback (most recent call last):
  File "/home/guilhem/.bazaar/plugins/gtk/revisionview.py", line 371, in do_set_property
    self._set_revision(value)
  File "/home/guilhem/.bazaar/plugins/gtk/revisionview.py", line 428, in _set_revision
    file_info = bdecode(file_info.encode('UTF-8'))
  File "/home/mysql_src/logiciels/bzr_versions/dev/bzrlib/util/bencode.py", line 85, in bdecode
    r, l = decode_func[x[0]](x, 0)
  File "/home/mysql_src/logiciels/bzr_versions/dev/bzrlib/util/bencode.py", line 53, in decode_list
    v, f = decode_func[x[f]](x, f)
  File "/home/mysql_src/logiciels/bzr_versions/dev/bzrlib/util/bencode.py", line 61, in decode_dict
    k, f = decode_string(x, f)
  File "/home/mysql_src/logiciels/bzr_versions/dev/bzrlib/util/bencode.py", line 44, in decode_string
    n = long(x[f:colon])
ValueError: invalid literal for long() with base 10: ''

description: updated
Martin Pool (mbp) wrote :

I think we have seen similar problems before and had thought they were caused by a one-off error such as a memory bit error rather than a bug in bzr. If they have created a new revision with this problem and especially if this can be done reproducibly that would indicate a problem writing the data.

If it's an unreproducible glitch then it may be reasonable to have bzrlib be a bit defensive if it's unable to read them, and give a warning rather than an exception.

Martin Pool (mbp) wrote :

The first thing is to determine whether you can reproducible record incorrect bencoded data using gcommit (or otherwise).

Martin Pool (mbp) on 2008-11-11
Changed in bzr:
assignee: nobody → mbp
importance: Undecided → High
status: New → Confirmed
Download full text (3.6 KiB)

Hi, here is a repeatable scenario (at least on my setup):

Preparation:
Use a Mac OS X machine with bzr, bzr-gtk and bzrtools installed by macport
So that bzr gcommit is running under X11 on the Mac

send a mail consisting of these rows:
Aper-file

Bper-file

Cglobal

Dglobal

and open that in Thunderbird.

Create a change and run 'bzr gcommit'
(bzr init-repo test, cd test, mkdir testdir, cd testdir, bzr init, echo "simple file" > testfile, bzr add testfile, bzr gcommit)
Then mark and copy (cmd-c) the three rows in the mail open in Thunderbird:
Cglobal

Dglobal

and click in the Global Commit Message text area and paste with (ctrl-v).
Move the cursor to the empty row, and delete that empty row (by deleting the character to the right of the cursor, i.e. NOT using backspace)

Do the same with the "Aper-file.." rows to the Commit message for testfile.

press commit.

Explanation:
When copying from thunderbird, the string will be "Cglobal\r\rDglobal", and by deleting the empty row with 'forward delete' and add a new empty line in bzr gcommit/X11 it will become "Cglobal\r\nDglobal".

All seems well in the repository, when it displays as unicode, but when trying bzr visualize it will print out:
testdir$ bzr visualize
Traceback (most recent call last):
  File "/Users/mattiasj/.bazaar/plugins/gtk/revisionview.py", line 371, in do_set_property
    self._set_revision(value)
  File "/Users/mattiasj/.bazaar/plugins/gtk/revisionview.py", line 428, in _set_revision
    file_info = bdecode(file_info.encode('UTF-8'))
  File "/opt/local/lib/python2.5/site-packages/bzrlib/util/bencode.py", line 85, in bdecode
    r, l = decode_func[x[0]](x, 0)
  File "/opt/local/lib/python2.5/site-packages/bzrlib/util/bencode.py", line 53, in decode_list
    v, f = decode_func[x[f]](x, f)
  File "/opt/local/lib/python2.5/site-packages/bzrlib/util/bencode.py", line 61, in decode_dict
    k, f = decode_string(x, f)
  File "/opt/local/lib/python2.5/site-packages/bzrlib/util/bencode.py", line 44, in decode_string
    n = long(x[f:colon])
ValueError: invalid literal for long() with base 10: ''

when debuggin bzrlib/repository.py:
> /opt/local/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/bzrlib/repository.py(1148)_get_revisions()
-> revs[record.key[0]] = rev
(Pdb) p text
'<revision committer="Mattias Jonsson &lt;<hidden>@sun.com&gt;" format="5" inventory_sha1="3f7f86a25f3b78ec301ecb1a44c100bd617a1eeb" <email address hidden>" timestamp="1226403127.806" timezone="3600">\n<message>Cglobal\r\nDglobal</message>\n<properties><property name="branch-nick">testdir</property>\n<property name="file-info">ld7:file_id42:testfile-20081111112927-ubq1w7b7xonhsrzj-17:message20:Aper-file\r\nBper-file4:path8:testfileee</property>\n</properties>\n</revision>\n'
(Pdb) p rev
<Revision id <hidden>@sun.com-20081111113207-76v7wbhcppftz05x>

Here is the string as entered "Cglobal\r\nDglobal" and "Aper-file\r\nBper-file" with length 20.

(Pdb) p rev.properties
{'file-info': 'ld7:file_id42:testfile-20081111112927-ubq1w7b7xonhsrzj-17:message20:Aper-file\nBper-file4:path8:testfileee', 'branch-nick': 'testdir'}

Here has the file mess...

Read more...

John A Meinel (jameinel) wrote :

So there is a subtle issue in how we handle revisions properties that we need a better answer to.

Specifically, we write the per-file message as a blob to to a cElementTree.Element object (as the '.text' member).

This seems to serialize down to a text string while preserving everything. However the parsing functions for cElementTree end up normalizing end-of-line characters as though you had used a "text" mode file rather than a "binary" mode file.

Here is a simple example using python >2.5

from xml.etree.cElementTree import Element, tostring, fromstring
e = Element('element-name')
e.text = 'text\rwith\nvarious\r\nline endings\n'
e.tail = '\n'
s = tostring(e)
print repr(s) # '<element-name>text\rwith\nvarious\r\nline endings\n</element-name>\n'
f = fromstring(s)
f.text == e.text # False
print repr(f.text) # 'text\nwith\nvarious\nline endings\n'

So it would seem that the data is written out with all '\r' characters onto disk, but the code that reads that back into memory is combining '\r\n' into '\n', and transforming '\r' into '\n'.

This isn't detected by the 'sha1' check, because that check is happening on the raw bytes which still have all the necessary information.

Changed in bzr:
status: Confirmed → Triaged
John A Meinel (jameinel) on 2008-11-13
Changed in bzr:
assignee: mbp → jameinel
Changed in bzr-gtk:
assignee: nobody → jameinel
status: New → In Progress
John A Meinel (jameinel) wrote :

The associated branch fixes bzr-gtk to normalize line endings at commit time, and to handle when the per-file commit message is "broken".

Changed in bzr-gtk:
status: In Progress → Fix Committed
John A Meinel (jameinel) wrote :

I have a fix committed which forces commit to fail if we can't properly round-trip the revision property or commit message data.

Changed in bzr:
status: Triaged → Fix Committed

The fix works, but then I think that the string in text area for the commit messages should only allow one type of line breaks (either '\n', '\r' or '\r\n'). As it is now, it shows 'a\n\nb', 'a\r\rb' AND 'a\r\nb' (if edited) as:
a

b
But after the commit 'a\r\nb' becomes 'a\nb' (so it looks like the formatting has changed, when looking in 'bzr vis'), which is a little bit confusing:
a
b

So I would suggest to also add the conversion to the commit message text areas, since it should be in the correct format when editing takes place.

Jelmer Vernooij (jelmer) on 2009-01-28
Changed in bzr:
status: Fix Committed → Fix Released
Jelmer Vernooij (jelmer) on 2009-01-30
Changed in bzr-gtk:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments