ghost fetch issue: fail when fetching a text referenced by a live revision introduced by a ghost revision

Bug #246880 reported by Loïc Minier on 2008-07-09
10
Affects Status Importance Assigned to Milestone
Bazaar
High
Unassigned

Bug Description

A problem branching :
% bzr branch lp:~ubuntu-core-dev/casper/trunk
bzr: ERROR: bzrlib.errors.KnitCorrupt: Knit 9e/x_%254datt_%255aimmerman_%253cmatt.zimmerman%40canonical.com%253e_%2553un_%254dar_13_00%253a51%253a19_2005_1366.38 corrupt: line-delta from stream for version mdz@mizar-20051205230117-c327e75be767f237 references missing parent Arch-1:<email address hidden>%casper--main--0--patch-21
Try running "bzr check" on the source repository, and "bzr reconcile" if necessary.

is not fixable by reconcile, because reconcile would have to alter the inventory and does not currently do that.

The symptom is that:
        topo_order = tsort.topo_sort(ancestors)
        rev_order = dict(zip(topo_order, range(len(topo_order))))
        bad_texts.sort(key=lambda key:rev_order[key[0][1]])

bad_texts contains a key not in rev_order.

Matt Zimmerman (mdz) wrote :

In case it helps, the old Bazaar 1.0 branches corresponding to those revisions can be found here:

http://people.ubuntu.com/~<email address hidden>/

http://people.ubuntu.com/~<email address hidden>/

Robert Collins (lifeless) wrote :

In the tarball of the repository I have, there are two references to the missing revision:
x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.23^@Arch-1:<email address hidden>%casper--main--0--patch-21
and
x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38^@Arch-1:<email address hidden>%casper--main--0--patch-21

(thats two files claiming to be modified in that revision)

I'm digging further now to find out if there are more references.

Robert Collins (lifeless) wrote :

References to the problem text:

[(('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'mdz@mizar-20051205230117-c327e75be767f237'), (('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'Arch-1:<email address hidden>%casper--main--0--patch-21'), ('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'Arch-1:<email address hidden>%casper--main--0--base-0'))), (('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.23', 'mdz@mizar-20051205230117-c327e75be767f237'), (('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.23', 'Arch-1:<email address hidden>%casper--main--0--patch-21'), ('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.23', 'Arch-1:<email address hidden>%casper--main--0--base-0')))]

Robert Collins (lifeless) wrote :

Both of these files are accessible:
list(r.iter_files_bytes([(ref[0], ref[1], None) for ref, parents in text_refs]))
[(None, '# This is here so that you can boot with "casper/enable=true"; it will never\n# be displayed, so there is no point in translating this.\nTemplate: casper/enable\nType: boolean\nDefault: false\nDescription: Enable casper mode?\n\nTemplate: casper/info\nType: title\n_Description: Ubuntu Live\n'), (None, '#! /bin/sh\n. /usr/share/debconf/confmodule\ndb_get casper/enable\nif [ "$RET" = true ]; then\n\tdb_info casper/info\nfi\n')]

so there would appear to be a index mismatch.. checking furthr

Robert Collins (lifeless) wrote :

Tracked down the root cause:
 - there are revisions missing
 - there are full texts from those revisions

Reconcile is meant to find and fix this. So this is ultimately a bug in reconcile.

>>> r.texts._get_components_positions([text_refs[0][0]])
{('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'Arch-1:<email address hidden>%casper--main--0--patch-21'): (('line-delta', False), (GraphIndex('file:///home/robertc/source/casper/casper/.bzr/repository/indices/8042739a7b3d49049673383de67430db.tix'), 6297, 156), ('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'Arch-1:<email address hidden>%casper--main--0--base-0')), ('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'Arch-1:<email address hidden>%casper--main--0--base-0'): (('fulltext', False), (GraphIndex('file:///home/robertc/source/casper/casper/.bzr/repository/indices/8042739a7b3d49049673383de67430db.tix'), 6066, 231), None), ('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'mdz@mizar-20051205230117-c327e75be767f237'): (('line-delta', False), (GraphIndex('file:///home/robertc/source/casper/casper/.bzr/repository/indices/8042739a7b3d49049673383de67430db.tix'), 6669, 117), ('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'Arch-1:<email address hidden>%casper--main--0--patch-21'))}
>>> r.has_revision('Arch-1:<email address hidden>%casper--main--0--base-0')
False

Changed in bzr:
importance: Undecided → High
status: New → Triaged
description: updated

original branch from colin's laptop

Andrew Bennetts (spiv) on 2008-07-29
description: updated
description: updated

Ok, have finally tracked down exactly what is going on.

There is an inventory (don't have the revid for it yet) that refers to ('x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.38', 'Arch-1:<email address hidden>%casper--main--0--patch-21').

That revision is a ghost.

Current fetch will not copy texts that were introduced by a revision not being fetched;
this text belongs to a revision that does not exist.

I'm not sure that we should alter the inventory during reconcile; its likely a bug in fetch, but the simplest workaround for this particular branch is to convert the original history and fetch-ghosts it into colin's repository.

description: updated
Colin Watson (cjwatson) wrote :
Download full text (7.8 KiB)

I tried to do this and got some way, but am now stuck. The current branch error is:

<cjwatson@sarantium ~/src/ubuntu/casper/bzr>$ bzr branch casper.backup junk
bzr: ERROR: Revision {Arch-1:<email address hidden>%casper--main--0--patch-9} not present in "x_Matt_Zimmerman_<email address hidden>_Sun_Mar_13_00:51:19_2005_1366.36".

I believe that I have imported that revision, although with some difficulty. fetch-ghosts refused to fetch it, I think because it was a direct ancestor, and I had to use bzrlib by hand to say this_branch.fetch(other_branch, 'Arch-1:<email address hidden>%casper--main--0--patch-26'). However, that succeeded (reporting 27 new revisions), so I proceeded to try to import the other branches that would supply the remaining ghosts.

My current attempts are failing like this. Relevant baz archives are:

  http://people.ubuntu.com/~<email address hidden>/
  http://people.ubuntu.com/~<email address hidden>/
  http://people.ubuntu.com/~<email address hidden>/
  http://people.ubuntu.com/~<email address hidden>/

<cjwatson@sarantium ~/src/ubuntu/casper/bzr/casper.backup>$ bzr check
checked branch file:///home/cjwatson/src/ubuntu/casper/bzr/casper.backup/ format Bazaar Branch Format 6 (bzr 0.15)
checked repository <bzrlib.transport.local.LocalTransport url=file:///home/cjwatson/src/ubuntu/casper/bzr/casper.backup/> format <RepositoryFormatKnitPack1>
   758 revisions
   115 file-ids
  1398 unique file texts
 33520 repeated file texts
     0 unreferenced text versions
    12 ghost revisions
    12 revisions missing parents in ancestry
<cjwatson@sarantium ~/src/ubuntu/casper/bzr/casper.backup>$ cd ..
<cjwatson@sarantium ~/src/ubuntu/casper/bzr>$ bzr baz-import-branch tfheen-unionfs <email address hidden>/casper--unionfs--0 casper.backup old-main
importing <email address hidden>/casper--unionfs--0 into /home/cjwatson/src/ubuntu/casper/bzr/tfheen-unionfs
Cleaning up
Import complete.
<cjwatson@sarantium ~/src/ubuntu/casper/bzr>$ cd casper.backup
<cjwatson@sarantium ~/src/ubuntu/casper/bzr/casper.backup>$ bzr fetch-ghosts ../tfheen-unionfs
Installed:
Arch-1:<email address hidden>%casper--unionfs--0--patch-3
Still missing:
Arch-1:<email address hidden>%casper--translations--0--patch-19
Arch-1:<email address hidden>%casper--quieten--0--patch-1
Arch-1:<email address hidden>%casper--vtfix--0--patch-2
Reconciling branch file:///home/cjwatson/src/ubuntu/casper/bzr/casper.backup/
revision_history ok.
Reconciling repository file:///home/cjwatson/src/ubuntu/casper/bzr/casper.backup/
bzr: ERROR: bzrlib.errors.KnitCorrupt: Knit inventory corrupt:
  sha-1 46ef5baf0c84edf97e1aa5fe25a1a9d9e89938fe
  of reconstructed text does not match
  expected 5e691b65534b4ebd046134664516fe58c0c30547
  for version Arch-1:<email address hidden>%casper--unionfs--0--base-0

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", lin...

Read more...

Colin Watson (cjwatson) wrote :

Is there anything more I can try here? This is an important core package in Ubuntu and a variety of people often want to make changes to it; not being able to do so is a real problem for us.

Martin Pool (mbp) wrote :

<cjwatson> is there any useful way I can escalate bug 246880? it's becoming a real hassle for us on that one branch
<cjwatson> I think I have all the data of the imported branches and am happy to try weird stuff for people

I'll try to either speak to Robert about it or pick up where he left off.

Colin Watson (cjwatson) wrote :

Thanks, much appreciated.

John A Meinel (jameinel) wrote :

Just to follow up, as Kiko asked me to poke my head in here.

To summarize the problem: There is a revision, which says that it did *not* modify a text, but that you should use the version of the text available in one of its (ancestral) parents. That parent is actually a ghost, which means we don't have information for it, such as an inventory that we can use to recognize the text to be copied.

In 'knit' format, Bazaar would keep track of each files ancestry separately, and didn't realize that the text wasn't referenced from the appropriate inventory. So at "fetch" time, it just grabbed everything it didn't have that was in the ancestry of the revisions it was interested in.

The new fetch code says "give me all texts for *these* revisions".

It is arguably a bug in the converter, that it decided to assign the text to a revision it didn't know anything about.

There are 2 possible "easy" fixes (one easier than the other, but more "evil")

1) Write some code (possibly part of reconcile) that fixes up these inventories, and changes the 'last-modified' field. Instead of having it reference the "ghost" revision, it will just say that the file was modified in THIS (since it differs from all other *present* parents). And then insert the text with that revision.

This requires rewriting specific inventories, and won't propagate, etc.

2) Generate a "fake" revision with the right information. Specifically, we need it to have a revision id which matches, and an inventory which references the file in question.

This could probably be done in about 5 minutes. (If I was doing it, I would take the information for the "broken" revision, copy it, modify the 2 attributes, and push that into the repository.)

(2) is evil, but easy and fast. You are injecting false information into the system. If the real information is "out-there" and comes in contact with this code, the results are *undefined*. In this particular case, I don't think it is a problem. If you had access to that revision, you could just convert it, and then it would just join into the ancestry without a problem. As it seems to be "truly gone" we can create an artificial one without them meeting eachother.

If you are okay with (2), I can do it, and probably give you a bundle to merge. At which point your repository will be 'infected" with the artificial revision and cloning from it will succeed without complaint. Future clones from this repository will also stay infected.

Merging this repository probably will not, as we don't fill in ghosts as a side-effect anymore.

Martin Pool (mbp) wrote :

In (2) the fake revision John's referring to is to create a revision for the ghost revision. So essentially it cuts off this situation at the very start: we'll never be trying to pull a text created in a ghost. If the ghost is ever filled it will be bad, but if the ghost could be filled we should do that now.

On Thu, Sep 11, 2008 at 11:21:33PM -0000, Martin Pool wrote:
> In (2) the fake revision John's referring to is to create a revision for
> the ghost revision. So essentially it cuts off this situation at the
> very start: we'll never be trying to pull a text created in a ghost. If
> the ghost is ever filled it will be bad, but if the ghost could be
> filled we should do that now.

We still have the old format branches as noted in
https://bugs.edge.launchpad.net/bzr/+bug/246880/comments/2

Filling in the ghost seems like the most straightforward solution to me;
just tell me what more is needed.

--
 - mdz

Colin Watson (cjwatson) wrote :

The comment Matt links to doesn't have all the ghost revisions needed, but https://bugs.edge.launchpad.net/bzr/+bug/246880/comments/9 does.

Colin Watson (cjwatson) wrote :

John, when you say that 1) "won't propagate", does that mean that existing branches will be broken and everyone will have to re-clone from scratch? (Not that there are all that many existing branches since the breakage has made it difficult for people to branch.)

Colin Watson (cjwatson) wrote :

Oh, I can also say that if we do 2) then there is no danger of the ghost ever being filled in any other way, as far as I know. I don't know of any branches out there that contain any more conversions from baz than we already have in the current branch, and there would be no reason to bother with doing this if the current branch were fixed.

On Fri, Sep 12, 2008 at 7:11 PM, Colin Watson <email address hidden> wrote:
> John, when you say that 1) "won't propagate", does that mean that
> existing branches will be broken and everyone will have to re-clone from
> scratch? (Not that there are all that many existing branches since the
> breakage has made it difficult for people to branch.)

No, he means that if someone pulls or merges from one of the branches
with this inserted, the puller won't automatically get the fix.
(This is because we're going back to insert a revision that's kind of
half-present and the code to determine what to fetch doesn't search
through things that should already be present.)

So people affected by this would need to either run the command to fix
it, or re-clone.

--
Martin <http://launchpad.net/~mbp/>

Martin Pool (mbp) wrote :

If we do in fact have all the original baz data it would be safer and cleaner to go back and insert it.

On Fri, Sep 12, 2008 at 09:29:18AM -0000, Martin Pool wrote:
> If we do in fact have all the original baz data it would be safer and
> cleaner to go back and insert it.

Can you help us do that please?

--
 - mdz

Robert Collins (lifeless) wrote :

There are several things to do here:
 *) recover the casper data
 *) fix 'check' to report this as errors
 *) fix 'fetch' to preserve the referenced data

On the first point I'm creating a script to regenerate the inventories and texts to be consistent with the available history.

Robert Collins (lifeless) wrote :

Fixed branch. This has the same revision ids, but inventories freshly created. Some testing is worth while - e.g. export a couple of random revisions to disk and diff them.

Robert Collins (lifeless) wrote :

Script I used to clean it up

Colin Watson (cjwatson) wrote :

Thanks! I had to run your script again because the branch had accumulated some more revisions since the version you were using, but the result was fine (tested by diffing every revision) and I've pushed it to Launchpad now.

  https://lists.ubuntu.com/archives/ubuntu-devel/2008-September/026498.html

Martin Pool (mbp) wrote :

Per bug 270738 a similar problem might be present in pqm, also converted from baz, a copy of which is in the attached tarball.

John A Meinel (jameinel) wrote :

I'm pretty sure this is fixed now. We changed the 'fetch' code to do a set-difference, so it will find keys that are referenced even when the revision_id itself is not fetched.

As part of this, Jelmer recently fixed 'reconcile' so that it won't abort on those keys. I believe it still figures them to be 'bad_texts' but the sort key has been updated to not fail arbitrarily.

(Note that the *fetch* issue was fixed a few releases ago, but the reconcile fix was only done recently)

Changed in bzr:
milestone: none → 1.16
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Related questions