"key ... not in nodes" error on incremental commits

Bug #898806 reported by Andy Grimm
This bug affects 4 people
Affects: Bazaar Fast Import
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

I've been having some issues with testing of incremental commits in fast-import. My sync script is doing this:

#!/bin/sh
BASELINE=2335
REPOPATH=/mnt/code/bzr/
bzr pull $REPOPATH/eee
bzr fast-export --baseline -r ${BASELINE}.. $REPOPATH/eee \
    | bzr fast-import-filter \
           (lots of redacted "-x" arguments here)
           --dont-squash-empty-commits \
           --user-map=user.map >3.0_preview.fi
bzr fast-import 3.0_preview.fi 3.0_preview

This works for the initial import, and it _sometimes_ works for updates. Very often, though, I get an error like this:

12:10:45 Starting import of 622 commits ...
12:10:46 Found 538 commits already loaded - skipping over these ...
ABORT: exception occurred processing commit :540
bzr: ERROR: exceptions.KeyError: "key ('<email address hidden>',) not in nodes"

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/bzrlib/commands.py", line 946, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/bzrlib/commands.py", line 1150, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib64/python2.7/site-packages/bzrlib/commands.py", line 699, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib64/python2.7/site-packages/bzrlib/commands.py", line 721, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/bzrlib/cleanup.py", line 135, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/home/agrimm/.local/lib/python2.7/site-packages/bzrlib/plugins/fastimport/cmds.py", line 313, in run
    user_map=user_map)
  File "/home/agrimm/.local/lib/python2.7/site-packages/bzrlib/plugins/fastimport/cmds.py", line 39, in _run
    return proc.process(p.iter_commands)
  File "/home/agrimm/.local/lib/python2.7/site-packages/bzrlib/plugins/fastimport/processors/generic_processor.py", line 310, in process
    super(GenericProcessor, self)._process(command_iter)
  File "/home/agrimm/.local/lib/python2.7/site-packages/fastimport/processor.py", line 75, in _process
    handler(self, cmd)
  File "/home/agrimm/.local/lib/python2.7/site-packages/bzrlib/plugins/fastimport/processors/generic_processor.py", line 535, in commit_handler
    handler.process()
  File "/home/agrimm/.local/lib/python2.7/site-packages/fastimport/processor.py", line 158, in process
    self.post_process_files()
  File "/home/agrimm/.local/lib/python2.7/site-packages/bzrlib/plugins/fastimport/bzr_commit_handler.py", line 672, in post_process_files
    self._get_inventories)
  File "/home/agrimm/.local/lib/python2.7/site-packages/bzrlib/plugins/fastimport/revision_store.py", line 408, in load_using_delta
    tree, basis_rev_id, changes):
  File "/usr/lib64/python2.7/site-packages/bzrlib/vf_repository.py", line 693, in record_iter_changes
    head_set = self._heads(change[0], set(head_candidates))
  File "/home/agrimm/.local/lib/python2.7/site-packages/bzrlib/plugins/fastimport/revision_store.py", line 395, in thunked_heads
    res = set(self._graph.heads(revision_ids))
  File "/usr/lib64/python2.7/site-packages/bzrlib/graph.py", line 1979, in heads
    head_keys = self._graph.heads(as_keys)
  File "_known_graph_pyx.pyx", line 444, in bzrlib._known_graph_pyx.KnownGraph.heads (bzrlib/_known_graph_pyx.c:4217)
KeyError: "key ('<email address hidden>',) not in nodes"

What appears to be happening is that the import is attempting to process a change which has two parents, but only one of the parents already exists in the known graph. The code path causing the failure is the "else" in this section of code in thunked_heads in revision_store.py:

    if len(revision_ids) < 2:
        res = set(revision_ids)
    else:
        res = set(self._graph.heads(revision_ids))
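
The failure mode described above can be sketched with a toy graph. This is only an illustration under stated assumptions: `ToyKnownGraph`, its parent map, and the revision ids are made up for this example and are not bzrlib's `KnownGraph`. The only behaviour borrowed from the traceback is that `heads()` raises KeyError for an id that is not among its nodes.

```python
class ToyKnownGraph(object):
    """Hypothetical stand-in for the known graph; not bzrlib's real class."""

    def __init__(self, parent_map):
        # parent_map: {revision_id: tuple of parent revision ids}
        self._nodes = dict(parent_map)

    def _ancestors(self, rev_id):
        """Return every strict ancestor of rev_id reachable in the graph."""
        seen = set()
        todo = list(self._nodes[rev_id])
        while todo:
            cur = todo.pop()
            if cur in seen:
                continue
            seen.add(cur)
            todo.extend(self._nodes.get(cur, ()))
        return seen

    def heads(self, rev_ids):
        """Return the ids in rev_ids that are not ancestors of another id."""
        rev_ids = list(rev_ids)
        for rev_id in rev_ids:
            if rev_id not in self._nodes:
                # Mirrors the error text from the report.
                raise KeyError("key (%r,) not in nodes" % (rev_id,))
        result = set(rev_ids)
        for rev_id in rev_ids:
            result -= self._ancestors(rev_id)
        return result


# The graph only knows the revisions that were already imported.
graph = ToyKnownGraph({"rev-1": (), "rev-2": ("rev-1",)})

# Both candidates are known: rev-1 is an ancestor of rev-2, so only
# rev-2 is a head.
print(graph.heads(["rev-1", "rev-2"]))

# A merge commit whose second parent was never added to the graph.
try:
    graph.heads(["rev-2", "rev-missing"])
except KeyError as exc:
    print("KeyError: %s" % (exc,))
```

In this sketch a single-candidate call never reaches the graph in the real code path, because the `len(revision_ids) < 2` branch returns early, which would be consistent with the error only showing up on commits with two parents.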

My quick fix which seemed to work was to test each possible head individually, and simply ignore non-existent nodes:

    if len(revision_ids) < 2:
        res = set(revision_ids)
    else:
        res = set()
        for x in revision_ids:
            try:
                res.add(self._graph.heads([x,]))
            except:
                continue

I don't know this code very well, though, so I suspect there's a better solution than this.

I'm trying to determine a simple reproducer, but it may just have to wait until the bzr repo I'm working with becomes public next week.

Andy Grimm (agrimm) wrote :

Oops, the patch broke three tests, so it's not good...

Jelmer Vernooij (jelmer) wrote :

Hi,

What version of bzr-fastimport are you using? The line numbers seem pretty off for e.g. trunk.

Andy Grimm (agrimm) wrote :

Sorry, I had forgotten about this issue, but I just hit it again on a system without my workaround patch.

Revision is:

revno: 345
revision-id: <email address hidden>
parent: <email address hidden>
committer: Jelmer Vernooij <email address hidden>
branch nick: trunk
timestamp: Tue 2012-01-10 09:48:02 +0100
message:
  Fix compatibility with bzr 2.5.

I'm not sure why the line numbers seem off to you. I double-checked, and those numbers are still correct. After adding my workaround patch, I got past my issue today as well. I've been running successfully with the patch for a couple of months now. I still need to figure out why the patch is breaking tests, but it certainly seems to have fixed my problem.

Florian Rathgeber (florian-rathgeber) wrote :

I can confirm the issue on Ubuntu 12.04, using the following bzr version (to make it work with git-bzr-ng): https://code.launchpad.net/~larstiq/bzr/bug541626 and bzr-fastimport 0.13.0-1 (which I believe is trunk).

$ bzr --version
Bazaar (bzr) 2.6.0dev2
  from bzr checkout /home/fr710/src/bzr
    revision: 6524
    revid: <email address hidden>
    branch nick: bzr
  Python interpreter: /usr/bin/python 2.7.3
  Python standard library: /usr/lib/python2.7
  Platform: Linux-3.2.0-24-generic-x86_64-with-Ubuntu-12.04-precise

Backtrace:

  File "/home/fr710/src/bzr/bzrlib/commands.py", line 930, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/home/fr710/src/bzr/bzrlib/commands.py", line 1141, in run_bzr
    ret = run(*run_argv)
  File "/home/fr710/src/bzr/bzrlib/commands.py", line 673, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/fr710/src/bzr/bzrlib/commands.py", line 697, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/home/fr710/src/bzr/bzrlib/cleanup.py", line 136, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/home/fr710/src/bzr/bzrlib/cleanup.py", line 166, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/fastimport/cmds.py", line 307, in run
    user_map=user_map)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/fastimport/cmds.py", line 39, in _run
    return proc.process(p.iter_commands)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/fastimport/processors/generic_processor.py", line 310, in process
    super(GenericProcessor, self)._process(command_iter)
  File "/usr/lib/python2.7/dist-packages/fastimport/processor.py", line 75, in _process
    handler(self, cmd)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/fastimport/processors/generic_processor.py", line 535, in commit_handler
    handler.process()
  File "/usr/lib/python2.7/dist-packages/fastimport/processor.py", line 158, in process
    self.post_process_files()
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/fastimport/bzr_commit_handler.py", line 672, in post_process_files
    self._get_inventories)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/fastimport/revision_store.py", line 400, in load_using_delta
    tree, basis_rev_id, changes):
  File "/home/fr710/src/bzr/bzrlib/vf_repository.py", line 705, in record_iter_changes
    head_set = self._heads(change[0], set(head_candidates))
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/fastimport/revision_store.py", line 387, in thunked_heads
    res = set(self._graph.heads(revision_ids))
  File "/home/fr710/src/bzr/bzrlib/graph.py", line 1697, in heads
    head_keys = self._graph.heads(as_keys)
  File "_known_graph_pyx.pyx", line 444, in bzrlib._known_graph_pyx.KnownGraph.heads (bzrlib/_known_graph_pyx.c:4217)
KeyError: "key ('xxx',) not in nodes"

The suggested workaround has worked for my case.

xrg (xrg) wrote :

Could this workaround be any better?

    if len(revision_ids) < 2:
        res = set(revision_ids)
    else:
        res = set()
        for r in revision_ids:
            try:
                res.update(self._graph.heads([r,]))
            except KeyError:
                res.add(r)
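
The two workarounds in this thread behave differently, which a small stand-alone sketch may help show. Everything here is an assumption for illustration: `StubGraph` is a made-up stand-in for `self._graph`, the revision ids are invented, and the sketch assumes `heads()` returns a set and raises KeyError for unknown ids, as the traceback suggests. It has not been run against bzr-fastimport.

```python
class StubGraph(object):
    """Hypothetical stand-in for self._graph; not bzrlib's real class."""

    def __init__(self, parent_map):
        self._parents = dict(parent_map)

    def _ancestors(self, rev_id):
        seen, todo = set(), list(self._parents[rev_id])
        while todo:
            cur = todo.pop()
            if cur not in seen:
                seen.add(cur)
                todo.extend(self._parents.get(cur, ()))
        return seen

    def heads(self, rev_ids):
        rev_ids = list(rev_ids)
        for rev_id in rev_ids:
            if rev_id not in self._parents:
                raise KeyError("key (%r,) not in nodes" % (rev_id,))
        result = set(rev_ids)
        for rev_id in rev_ids:
            result -= self._ancestors(rev_id)
        return result


def heads_quick_fix(graph, revision_ids):
    """First workaround from the thread, logic unchanged."""
    if len(revision_ids) < 2:
        return set(revision_ids)
    res = set()
    for x in revision_ids:
        try:
            # If heads() returns a set, add() raises TypeError (sets are
            # unhashable), and the bare except below swallows it.
            res.add(graph.heads([x]))
        except:
            continue
    return res


def heads_keyerror_variant(graph, revision_ids):
    """Second workaround from the thread, logic unchanged."""
    if len(revision_ids) < 2:
        return set(revision_ids)
    res = set()
    for r in revision_ids:
        try:
            res.update(graph.heads([r]))
        except KeyError:
            # Unknown ids are kept as heads instead of being dropped.
            res.add(r)
    return res


graph = StubGraph({"a": (), "b": ("a",)})

# Merge with one parent missing from the graph.
print(heads_quick_fix(graph, ["b", "ghost"]))
print(heads_keyerror_variant(graph, ["b", "ghost"]))

# Both parents known: per-id queries cannot prune "a" as an ancestor of "b".
print(graph.heads(["a", "b"]))
print(heads_keyerror_variant(graph, ["a", "b"]))
```

Under these assumptions the first workaround always yields an empty set on the multi-parent path, while the second returns every candidate, because asking for the heads of a single id can only ever return that id. Neither result matches what a direct `heads()` call gives when all candidates are known, which might be related to the test failures mentioned earlier in the thread.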
