"bzr annotate" attributes changes to wrong revision

Bug #387952 reported by GuilhemBichot
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
High
John A Meinel

Bug Description

I am looking at this branch https://code.launchpad.net/~mysql/mysql-server/mysql-next , file sql/table.cc.
I have revision <email address hidden> .
I want to know who last changed those two lines which are near the file's beginning:
/* INFORMATION_SCHEMA name */
LEX_STRING INFORMATION_SCHEMA_NAME= {C_STRING_WITH_LEN("information_schema")};
I run
 bzr annotate sql/table.cc
it says that it's

revision-id: sp1r-kostja@bodhi.(none)-20070731194738-47444
parent: <email address hidden>
parent: sp1r-kostja@bodhi.(none)-20070731165243-35632

It's a merge revision. Mmmm usually merge revisions wouldn't introduce such code, but ok, maybe it was a conflict resolution. So let's dig further. First parent <email address hidden> , according to
 bzr cat -r 'revid:<email address hidden>' sql/table.cc
does not have those two lines.
But second parent has, according to:
 bzr cat -r 'revid:sp1r-kostja@bodhi.(none)-20070731165243-35632' sql/table.cc
I verify that sql/table.cc and the output of the last "bzr cat" above have no differences at the two interesting lines, so logically those lines should not be attributed to sp1r-kostja@bodhi.(none)-20070731194738-47444 (the merge) but rather to sp1r-kostja@bodhi.(none)-20070731165243-35632 (the second parent).
Let's dig more: if we go back in time when the second parent was the tip:
 bzr annotate -r 'revid:sp1r-kostja@bodhi.(none)-20070731165243-35632' sql/table.cc
attributes the two lines to

revision-id: sp1r-malff/marcsql@weblab.(none)-20070727182200-43755
parent: <email address hidden>
parent: sp1r-malff/marcsql@weblab.(none)-20070727181936-28265
committer: malff/marcsql@weblab.(none)
timestamp: Fri 2007-07-27 12:22:00 -0600
message:
  Merge <email address hidden>:/home/bk/mysql-5.1-runtime
  into weblab.(none):/home/marcsql/TREE/mysql-5.1-25422-d

Again, a merge revision. The first parent, according to
 bzr cat -r 'revid:<email address hidden>' sql/table.cc
does not have the two lines. The second parent has them, according to
 bzr cat -r 'revid:sp1r-malff/marcsql@weblab.(none)-20070727181936-28265' sql/table.cc
I verify that sql/table.cc as it was in the merge revision (sp1r-malff/marcsql@weblab.(none)-20070727182200-43755) and as it was in the second parent have no differences at all, so logically those lines should not be attributed to sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 (the merge) but rather to sp1r-malff/marcsql@weblab.(none)-20070727181936-28265 (the second parent).
This second parent is:

revision-id: sp1r-malff/marcsql@weblab.(none)-20070727181936-28265
parent: sp1r-malff/marcsql@weblab.(none)-20070727063106-28256
committer: malff/marcsql@weblab.(none)
timestamp: Fri 2007-07-27 12:19:36 -0600
message:
  Code review changes

If I go back in time to when this revision was the tip:
 bzr annotate -r 'revid:sp1r-malff/marcsql@weblab.(none)-20070727181936-28265' sql/table.cc
it says the two lines are from

revision-id: sp1r-malff/marcsql@weblab.(none)-20070727063106-28256
parent: sp1r-kostja@bodhi.(none)-20070721135732-20589
committer: malff/marcsql@weblab.(none)
timestamp: Fri 2007-07-27 00:31:06 -0600
message:
  WL#3984 (Revise locking of mysql.general_log and mysql.slow_log) (I cut the long message)

And yes this last revision really seems to have introduced those two lines; that's what
 bzr diff -c 'revid:sp1r-malff/marcsql@weblab.(none)-20070727063106-28256'
says, and there are facts which give me confidence that this is true:
- this revision isn't a merge
- the commit comments are related to the two lines in question: the per-file comment of sql_show.cc says "Move INFORMATION_SCHEMA_NAME to table.cc"
- http://lists.mysql.com/commits/31673 is the mail which was generated by committing the revision above (comparison of comments, commit date and patch between mail and revision shows this), and the patch there adds those two lines; this really shows that this revision is the author.

So, the bug is that "bzr annotate" is wrong by a significant distance (several revisions) from what it should tell.
Or it may be that I missed something :-)

Tags: mysql

Related branches

Revision history for this message
GuilhemBichot (guilhem-bichot) wrote :

I have bzr.dev:
Bazaar (bzr) 1.17dev
  from bzr checkout /home/mysql_src/logiciels/bzr_versions/dev
    revision: 4439
    revid: <email address hidden>
This repository received the fix for https://bugs.launchpad.net/bugs/277537 already.

Revision history for this message
John A Meinel (jameinel) wrote :

Just as a quick comment, if I branch into my mysql branch and do annotate I get something different:
sp1r-thek@adventure.(none)-20070729110821-43408

However, this repo may not have the fix for bug #277537 so I'll check and see if that is it.

Revision history for this message
Vincent Ladeuil (vila) wrote : Re: [Bug 387952] Re: "bzr annotate" attributes changes to wrong revision

>>>>> "jam" == John A Meinel <email address hidden> writes:

    jam> Just as a quick comment, if I branch into my mysql
    jam> branch and do annotate I get something different:
    jam> sp1r-thek@adventure.(none)-20070729110821-43408

    jam> However, this repo may not have the fix for bug #277537
    jam> so I'll check and see if that is it.

That should be it as I can reproduce it with a repo where the fix
has been applied (and I Guilhem told me that his repo has been
fixed too).

 status confirmed
 importance high

Changed in bzr:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
John A Meinel (jameinel) wrote :
Download full text (4.6 KiB)

So I can confirm that it was just an old repo that was not fixed. Doing a new branch gives the same results.

From what I can tell, it is a 'newline' that is screwing things up. I'll try to explain the steps, and maybe I'll find an answer.

When annotating a merge revision, we use roughly this algorithm. We have 3 sets of lines "new_lines", "annotated_left_parent" and "annotated_right_parent":

1) Match "new_lines" with "left_parent" (no annotations), for lines which match, copy the annotations from "annotated_left_parent". For lines which do not, annotate them as "new". This generates "annotated_new_lines"

2) Now compare "annotated_new_lines" with "annotated_right_parent". Note that
this is an *annotated* comparison.
   a) Lines which match exactly are just copied across (as generally expected)
   b) For unmatched regions, compare the unannotated "new_lines" with the unannotated "right_parent".
   c) For lines which match
      i) If "annotated_new_lines" claims "new", then simply copy the annotation from "right_parent"
      ii) For lines which do not claim "new", resolve them by computing the DAG heads() between the two annotations.
      iii) If heads() does not resolve to a simple revision, break the tie using _break_annotation_tie.

Now stepping through the annotation of the file during revision
  sp1r-kostja@bodhi.(none)-20070731194738-47444

What I'm seeing is:

(Pdb) pp matching_left_and_right
[(0, 0, 21),
 (34, 22, 1),
 (35, 35, 10),
 (46, 46, 161),
 (250, 250, 105),
 (357, 357, 249),
 (608, 608, 783),
 (1391, 1392, 1),
 (1392, 1397, 3163),
 (4556, 4561, 3),
 (4560, 4565, 127),
 (4687, 4692, 0)]

Now the curious bit is "(34, 22, 1)" that says that "annotate_right_parent" matched 1 line in the new "annotated_lines". When you look closer it is:
(('sp1f-table.cc-19700101030959-nsxtem2adyqzwe6nz4cgrpcmts3o54v7',
  '<email address hidden>'),
 '\n')

Which says that there is a single blank line that claims to be from
  <email address hidden>

If you look at the surrounding text on both sides it is:

30 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 LEX_STRING GENERAL_LOG_NAME= {C_STRING_WITH_LEN("general_log")};
31 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755
32 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 /* SLOW_LOG name */
33 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 LEX_STRING SLOW_LOG_NAME= {C_STRING_WITH_LEN("slow_log")};
** <email address hidden>
35 <email address hidden> /* Functions defined in this file */
36 <email address hidden>
37 <email address hidden> void open_table_error(TABLE_SHARE *share, int error, int db_errno,

versus

18 <email address hidden> #include "mysql_priv.h"
19 <email address hidden> #include "sql_trigger.h"
20 <email address hidden> #include <m_ctype.h>
21 <email address hidden> #include "my_md5.h"
** <email address hidden>
23 sp1r-kostja@bodhi.(none)-20070731194738-47444 ...

Read more...

Revision history for this message
John A Meinel (jameinel) wrote :

The branch I just registered seems to give correct results.

The main downside is that there is a significant slowdown in "time bzr annotate". Specifically for "bzrlib/builtins.py" it goes from 5s down to 8+ seconds. For NEWS it is about 21s down to 30s.

However, I have some ideas about how to speed up the inner loop, which I'll be working on.

But I figured I'd give a heads up that we may have a fix soon.

Revision history for this message
John A Meinel (jameinel) wrote :

Initial fix is committed, still need performance improvements

Changed in bzr:
assignee: nobody → John A Meinel (jameinel)
status: Confirmed → Fix Committed
Revision history for this message
Robert Collins (lifeless) wrote :

Perhaps we can do radically better with code moves by using a
move-capable differ? [like gc uses]

-Rob

Revision history for this message
John A Meinel (jameinel) wrote :

So we could... but there are still questions about what that implies for annotations.

I would probably be the most interested in the 'edge based' matching ideas, though you still have questions about how you put that into annotations...

Revision history for this message
GuilhemBichot (guilhem-bichot) wrote :

Hi John. I don't understand one point. In 1) you say that:
  For lines which do not, annotate them as "new". This generates "annotated_new_lines"
so to me, annotated_new_lines have "new" as annotation mark, they don't have a real revid (this is where I'm probably understanding wrongly).
Then in 2) you say that
 2) Now compare "annotated_new_lines" with "annotated_right_parent". Note that
 this is an *annotated* comparison.
   a) Lines which match exactly are just copied across (as generally expected)
but how can there be an exact match if annotation for "annotated_new_lines" is "new" and annotations for "annotated_right_parent" are real revids?

Revision history for this message
John A Meinel (jameinel) wrote :

"annotated_new_lines" is the complete list of lines, both ones that are "new" and ones that are marked as being present in the first parent. "new_lines" relates to the most recent text and sometimes they are marked with "new" as the revision which introduced them.

Also "new" is the current revision_id.

Anyway, both the associated branches change how this is done (in slightly different ways). By doing a comparison on the original lines, and *then* resolving annotation agreements/disagreements.

On the good side, annotation is actually even faster in the second branch, while preserving the results we want.

Revision history for this message
John A Meinel (jameinel) wrote :

landed as bzr.dev 4522

Changed in bzr:
milestone: none → 1.17
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.