`bzr check` incorrectly reports about inconsistent parents

Bug #162931 reported by Alexander Belchenko
2
Affects Status Importance Assigned to Milestone
Bazaar
Fix Released
Low
Robert Collins

Bug Description

John Meinel wrote:
Alexander Belchenko wrote:
> > bzr 0.92
> >
> > bzr init 1
> > cd 1
> > bzr mkdir bar
> > bzr ci -m 1
> > bzr check
> >
> > checked branch file:///C:/work/Temp/1/ format Bazaar Branch Format 6
> > (bzr 0.15)
> > checked repository <bzrlib.transport.local.LocalTransport
> > url=file:///C:/work/Temp/1/> format <RepositoryFormatKnit1>
> > 1 revisions
> > 1 file-ids
> > 0 unique file texts
> > 0 repeated file texts
> > 0 unreferenced text ancestors
> >
> > bzr mkdir foo
> > bzr ci -m 2
> > bzr check
> >
> > checked branch file:///C:/work/Temp/1/ format Bazaar Branch Format 6
> > (bzr 0.15)
> > checked repository <bzrlib.transport.local.LocalTransport
> > url=file:///C:/work/Temp/1/> format <RepositoryFormatKnit1>
> > 2 revisions
> > 2 file-ids
> > 0 unique file texts
> > 0 repeated file texts
> > 0 unreferenced text ancestors
> > 1 inconsistent parents
> > 1 file versions are not referenced by their inventory
> >
> > ^-- something really wrong here. Why parents are inconsistent?
> > It's the fresh branch! What it means? Why file versions are
> > not versioned by their inventory? What kind of check 'bzr check' does?
> > How to interpeter their output? Help say nothing about this:

It is a bug in the check code. I'm not sure what is causing it, but I see the
same problem here.

Adding some debugging statements, it seems to be saying that

Looking at my revisions:
    1 John Arbash Meinel 2007-11-15
      revision-id:<email address hidden>
      1
added:
  bar/ bar-20071115172717-gkupq6i121hu1fdb-1

    2 John Arbash Meinel 2007-11-15
      revision-id:<email address hidden>
      2
added:
  foo/ foo-20071115172738-ov7vd6jwdwnx10ss-1

And then check is complaining about:
    file_id: bar-20071115172717-gkupq6i121hu1fdb-1
   text_rev: <email address hidden>
revision_id: <email address hidden>

So for some reason the check code thinks that the last modified revision for
'bar' should be the second commit. Even though it knows that it hasn't changed.

The loop itself seems a bit wrong:
        wrong_parents = {}
        dangling_file_versions = set()
        for num, revision_id in enumerate(self.planned_revisions):
            correct_parents = self.calculate_file_version_parents(
                revision_id, file_id)
            if correct_parents is None:
                continue
            text_revision = self.revision_versions.get_text_version(
                file_id, revision_id)
            try:
                knit_parents = tuple(weave.get_parents(revision_id))
            except errors.RevisionNotPresent:
                knit_parents = None
            if text_revision != revision_id:
                # This file version is not referenced by its corresponding
                # inventory!
                print ' file_id: %s\n text_rev: %s\nrevision_id: %s\n' % (
                    file_id, text_revision, revision_id)
                dangling_file_versions.add((file_id, revision_id))
            if correct_parents != knit_parents:
                wrong_parents[revision_id] = (knit_parents, correct_parents)

If I understand correctly, this is saying...

For all revisions I am checking (planned_revisions is passed in, and seems
completely independent from the file_id)

  correct_parents seems to be None when the file doesn't exist in this
    revision, fair enough.
  figure out what the last-modified value for that file is from the inventory
  get the parents from the weave/knit file
  if the last-modified revision != current_revision complain
  if the weave/knit parents don't match 'correct' complain

The problem is that last-modified revision will usually *not* be
current_revision, *unless* the file was modified in this revision.

I don't see where unmodified files in a revision skip this loop. Which means
that for all unmodified files there will be a dangling_file_version.

I can also prove that with:

bzr init checktest
cd checktest
touch file
bzr add file
bzr commit -m 'file'
bzr check # Everything is okay
bzr commit --unchanged -m '1'
bzr check # 1 inconsistent parents, 1 version not referenced by inventory
bzr commit --unchnaged -m '2'
bzr check # 2 inconsistent parents, 2 versions not referenced by inventory

And if I have 2 files (touch foo bar; bzr add; bzr commit) then I get 2 and 4
instead of 1 and 2.

So basically, check_file_version_parents is just plain broken.

I wonder if it was written expecting self.planned_revisions to be only entries
that changed the file.

Anyway, definitely a bug on 'bzr check'. 'bzr annotate' blames Andrew for that
bit of code, but unfortunately he is gone for the next two weeks so we can't
quite ask what his plans were. It looks like he moved "check_parents" out from
VersionedFile, which makes me wonder if it wasn't just copy + paste without
realizing that it wasn't restricting to just the revisions which modified that
file.

John
=:->

Changed in bzr:
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
John A Meinel (jameinel) wrote :

This is a possible fix:
=== modified file 'bzrlib/check.py'
--- bzrlib/check.py 2007-10-26 06:48:09 +0000
+++ bzrlib/check.py 2007-11-15 18:13:35 +0000
@@ -208,6 +208,7 @@
         self.inventory_weave.check(progress_bar=self.progress)
         files_in_revisions = {}
         revisions_of_files = {}
+ planned_set = set(self.planned_revisions)
         for i, weave_id in enumerate(weave_ids):
             self.progress.update('checking versionedfile', i, n_weaves)
             w = self.repository.weave_store.get_weave(weave_id,
@@ -215,8 +216,9 @@
             # No progress here, because it looks ugly.
             w.check()

+ weave_revisions = [v for v in w.versions() if v in planned_set]
             weave_checker = self.repository.get_versioned_file_checker(
- self.planned_revisions, self.revision_versions)
+ weave_revisions, self.revision_versions)
             result = weave_checker.check_file_version_parents(w, weave_id)
             bad_parents, dangling_versions = result
             bad_parents = bad_parents.items()

It just restricts the VersionedFileChecker to only check revisions which are stored in the weave/knit. Which seems to be what the code expects.

Changed in bzr:
assignee: nobody → lifeless
status: Confirmed → Fix Committed
Vincent Ladeuil (vila)
Changed in bzr:
milestone: none → 1.0rc1
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.