So I can confirm that it was just an old repo that was not fixed. Doing a new branch gives the same results.
From what I can tell, it is a 'newline' that is screwing things up. I'll try to explain the steps, and maybe I'll find an answer.
When annotating a merge revision, we use roughly this algorithm. We have 3 sets of lines "new_lines", "annotated_left_parent" and "annotated_right_parent":
1) Match "new_lines" with "left_parent" (no annotations), for lines which match, copy the annotations from "annotated_left_parent". For lines which do not, annotate them as "new". This generates "annotated_new_lines"
2) Now compare "annotated_new_lines" with "annotated_right_parent". Note that
this is an *annotated* comparison.
a) Lines which match exactly are just copied across (as generally expected)
b) For unmatched regions, compare the unannotated "new_lines" with the unannotated "right_parent".
c) For lines which match
i) If "annotated_new_lines" claims "new", then simply copy the annotation from "right_parent"
ii) For lines which do not claim "new", resolve them by computing the DAG heads() between the two annotations.
iii) If heads() does not resolve to a simple revision, break the tie using _break_annotation_tie.
Now stepping through the annotation of the file during revision
sp1r-kostja@bodhi.(none)-20070731194738-47444
Now the curious bit is "(34, 22, 1)" that says that "annotate_right_parent" matched 1 line in the new "annotated_lines". When you look closer it is:
(('sp1f-table.cc-19700101030959-nsxtem2adyqzwe6nz4cgrpcmts3o54v7',
'<email address hidden>'),
'\n')
Which says that there is a single blank line that claims to be from
<email address hidden>
If you look at the surrounding text on both sides it is:
30 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 LEX_STRING GENERAL_LOG_NAME= {C_STRING_WITH_LEN("general_log")};
31 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755
32 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 /* SLOW_LOG name */
33 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 LEX_STRING SLOW_LOG_NAME= {C_STRING_WITH_LEN("slow_log")};
** <email address hidden>
35 <email address hidden> /* Functions defined in this file */
36 <email address hidden>
37 <email address hidden> void open_table_error(TABLE_SHARE *share, int error, int db_errno,
From what I can tell, this is happening because the insert of the LEX_STRING INFORMATION... text is adding a newline at the end of the section. (Or just as easily at the beginning of the section.)
What it means, is that we think the line following "#include "mymd5.h" was moved to after the LEX_STRING section. (Note that we correctly believe that "my_md5.h" was an insertion, etc.
Anyway, because a section of code "swapped places" with another section, only one of them can be considered a match.
Unfortunately, that match ends up being the "\n" because it gets first dibs being an exact match against the left-hand parent.
It is rather silly to be doing this, but at least I have a feeling as to *why* it is happening.
There will always be cases of code moving and annotation having difficulty at that point. This is sort of the minimal way it can happen, and it seems that we should be able to recover.
The reason PatienceDiff's handling of non-unique lines isn't working here, is because we are matching against the *annotated* lines. Which means that (<revid>, '\n') is artificially more unique than '\n', so we think that the line should be a match.
The best answer I can come up with is to work around step (2)'s failure, and avoid matching the annotated lines. I'll see if I can put something together.
So I can confirm that it was just an old repo that was not fixed. Doing a new branch gives the same results.
From what I can tell, it is a 'newline' that is screwing things up. I'll try to explain the steps, and maybe I'll find an answer.
When annotating a merge revision, we use roughly this algorithm. We have 3 sets of lines "new_lines", "annotated_ left_parent" and "annotated_ right_parent" :
1) Match "new_lines" with "left_parent" (no annotations), for lines which match, copy the annotations from "annotated_ left_parent" . For lines which do not, annotate them as "new". This generates "annotated_ new_lines"
2) Now compare "annotated_ new_lines" with "annotated_ right_parent" . Note that new_lines" claims "new", then simply copy the annotation from "right_parent" annotation_ tie.
this is an *annotated* comparison.
a) Lines which match exactly are just copied across (as generally expected)
b) For unmatched regions, compare the unannotated "new_lines" with the unannotated "right_parent".
c) For lines which match
i) If "annotated_
ii) For lines which do not claim "new", resolve them by computing the DAG heads() between the two annotations.
iii) If heads() does not resolve to a simple revision, break the tie using _break_
Now stepping through the annotation of the file during revision kostja@ bodhi.( none)-200707311 94738-47444
sp1r-
What I'm seeing is:
(Pdb) pp matching_ left_and_ right
[(0, 0, 21),
(34, 22, 1),
(35, 35, 10),
(46, 46, 161),
(250, 250, 105),
(357, 357, 249),
(608, 608, 783),
(1391, 1392, 1),
(1392, 1397, 3163),
(4556, 4561, 3),
(4560, 4565, 127),
(4687, 4692, 0)]
Now the curious bit is "(34, 22, 1)" that says that "annotate_ right_parent" matched 1 line in the new "annotated_lines". When you look closer it is: table.cc- 19700101030959- nsxtem2adyqzwe6 nz4cgrpcmts3o54 v7',
(('sp1f-
'<email address hidden>'),
'\n')
Which says that there is a single blank line that claims to be from
<email address hidden>
If you look at the surrounding text on both sides it is:
30 sp1r-malff/ marcsql@ weblab. (none)- 20070727182200- 43755 LEX_STRING GENERAL_LOG_NAME= {C_STRING_ WITH_LEN( "general_ log")}; marcsql@ weblab. (none)- 20070727182200- 43755 marcsql@ weblab. (none)- 20070727182200- 43755 /* SLOW_LOG name */ marcsql@ weblab. (none)- 20070727182200- 43755 LEX_STRING SLOW_LOG_NAME= {C_STRING_ WITH_LEN( "slow_log" )}; error(TABLE_ SHARE *share, int error, int db_errno,
31 sp1r-malff/
32 sp1r-malff/
33 sp1r-malff/
** <email address hidden>
35 <email address hidden> /* Functions defined in this file */
36 <email address hidden>
37 <email address hidden> void open_table_
versus
18 <email address hidden> #include "mysql_priv.h" bodhi.( none)-200707311 94738-47444 /* INFORMATION_SCHEMA name */ bodhi.( none)-200707311 94738-47444 LEX_STRING INFORMATION_ SCHEMA_ NAME= {C_STRING_ WITH_LEN( "information_ schema" )}; bodhi.( none)-200707311 94738-47444
19 <email address hidden> #include "sql_trigger.h"
20 <email address hidden> #include <m_ctype.h>
21 <email address hidden> #include "my_md5.h"
** <email address hidden>
23 sp1r-kostja@
24 sp1r-kostja@
25 sp1r-kostja@
From what I can tell, this is happening because the insert of the LEX_STRING INFORMATION... text is adding a newline at the end of the section. (Or just as easily at the beginning of the section.)
What it means, is that we think the line following "#include "mymd5.h" was moved to after the LEX_STRING section. (Note that we correctly believe that "my_md5.h" was an insertion, etc.
Anyway, because a section of code "swapped places" with another section, only one of them can be considered a match.
Unfortunately, that match ends up being the "\n" because it gets first dibs being an exact match against the left-hand parent.
It is rather silly to be doing this, but at least I have a feeling as to *why* it is happening.
There will always be cases of code moving and annotation having difficulty at that point. This is sort of the minimal way it can happen, and it seems that we should be able to recover.
The reason PatienceDiff's handling of non-unique lines isn't working here, is because we are matching against the *annotated* lines. Which means that (<revid>, '\n') is artificially more unique than '\n', so we think that the line should be a match.
The best answer I can come up with is to work around step (2)'s failure, and avoid matching the annotated lines. I'll see if I can put something together.