Comment 4 for bug 387952

Revision history for this message
John A Meinel (jameinel) wrote :

So I can confirm that it was just an old repo that was not fixed. Doing a new branch gives the same results.

From what I can tell, it is a 'newline' that is screwing things up. I'll try to explain the steps, and maybe I'll find an answer.

When annotating a merge revision, we use roughly this algorithm. We have 3 sets of lines "new_lines", "annotated_left_parent" and "annotated_right_parent":

1) Match "new_lines" with "left_parent" (no annotations), for lines which match, copy the annotations from "annotated_left_parent". For lines which do not, annotate them as "new". This generates "annotated_new_lines"

2) Now compare "annotated_new_lines" with "annotated_right_parent". Note that
this is an *annotated* comparison.
   a) Lines which match exactly are just copied across (as generally expected)
   b) For unmatched regions, compare the unannotated "new_lines" with the unannotated "right_parent".
   c) For lines which match
      i) If "annotated_new_lines" claims "new", then simply copy the annotation from "right_parent"
      ii) For lines which do not claim "new", resolve them by computing the DAG heads() between the two annotations.
      iii) If heads() does not resolve to a simple revision, break the tie using _break_annotation_tie.

Now stepping through the annotation of the file during revision
  sp1r-kostja@bodhi.(none)-20070731194738-47444

What I'm seeing is:

(Pdb) pp matching_left_and_right
[(0, 0, 21),
 (34, 22, 1),
 (35, 35, 10),
 (46, 46, 161),
 (250, 250, 105),
 (357, 357, 249),
 (608, 608, 783),
 (1391, 1392, 1),
 (1392, 1397, 3163),
 (4556, 4561, 3),
 (4560, 4565, 127),
 (4687, 4692, 0)]

Now the curious bit is "(34, 22, 1)" that says that "annotate_right_parent" matched 1 line in the new "annotated_lines". When you look closer it is:
(('sp1f-table.cc-19700101030959-nsxtem2adyqzwe6nz4cgrpcmts3o54v7',
  '<email address hidden>'),
 '\n')

Which says that there is a single blank line that claims to be from
  <email address hidden>

If you look at the surrounding text on both sides it is:

30 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 LEX_STRING GENERAL_LOG_NAME= {C_STRING_WITH_LEN("general_log")};
31 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755
32 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 /* SLOW_LOG name */
33 sp1r-malff/marcsql@weblab.(none)-20070727182200-43755 LEX_STRING SLOW_LOG_NAME= {C_STRING_WITH_LEN("slow_log")};
** <email address hidden>
35 <email address hidden> /* Functions defined in this file */
36 <email address hidden>
37 <email address hidden> void open_table_error(TABLE_SHARE *share, int error, int db_errno,

versus

18 <email address hidden> #include "mysql_priv.h"
19 <email address hidden> #include "sql_trigger.h"
20 <email address hidden> #include <m_ctype.h>
21 <email address hidden> #include "my_md5.h"
** <email address hidden>
23 sp1r-kostja@bodhi.(none)-20070731194738-47444 /* INFORMATION_SCHEMA name */
24 sp1r-kostja@bodhi.(none)-20070731194738-47444 LEX_STRING INFORMATION_SCHEMA_NAME= {C_STRING_WITH_LEN("information_schema")};
25 sp1r-kostja@bodhi.(none)-20070731194738-47444

From what I can tell, this is happening because the insert of the LEX_STRING INFORMATION... text is adding a newline at the end of the section. (Or just as easily at the beginning of the section.)

What it means, is that we think the line following "#include "mymd5.h" was moved to after the LEX_STRING section. (Note that we correctly believe that "my_md5.h" was an insertion, etc.

Anyway, because a section of code "swapped places" with another section, only one of them can be considered a match.
Unfortunately, that match ends up being the "\n" because it gets first dibs being an exact match against the left-hand parent.

It is rather silly to be doing this, but at least I have a feeling as to *why* it is happening.

There will always be cases of code moving and annotation having difficulty at that point. This is sort of the minimal way it can happen, and it seems that we should be able to recover.

The reason PatienceDiff's handling of non-unique lines isn't working here, is because we are matching against the *annotated* lines. Which means that (<revid>, '\n') is artificially more unique than '\n', so we think that the line should be a match.

The best answer I can come up with is to work around step (2)'s failure, and avoid matching the annotated lines. I'll see if I can put something together.