Client.fetch() doesn't support thin packs

Reported by Jody McIntyre on 2012-07-17
This bug affects 8 people
Affects: Dulwich
Importance: High
Assigned to: Unassigned

Bug Description

I'm getting a KeyError when attempting to use hg-git to pull changes from two repos:

scjody@ailuropoda:~$ hg init foo
scjody@ailuropoda:~$ cd foo
scjody@ailuropoda:~/foo$ hg pull git+https://github.com/mozilla/django-csp.git
pulling from git+https://github.com/mozilla/django-csp.git
importing git objects into hg
(run 'hg heads' to see heads)
scjody@ailuropoda:~/foo$ hg pull git+https://github.com/fmarier/django-csp.git
pulling from git+https://github.com/fmarier/django-csp.git
** unknown exception encountered, please report by visiting
** http://mercurial.selenic.com/wiki/BugTracker
** Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) [GCC 4.4.5]
** Mercurial Distributed SCM (version 2.1)
** Extensions loaded: color, children, fetch, graphlog, hgk, mq, purge, rebase, record, transplant, convert, kiln.bigpush, kiln.caseguard, kiln.gestalt, kiln.kilnauth, kiln.kilnpath, pager, hggit, extdiff, churn
Traceback (most recent call last):
  File "/usr/bin/hg", line 38, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 27, in run
    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 64, in dispatch
    return _runcatch(req)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 87, in _runcatch
    return _dispatch(req)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 683, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 465, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/pymodules/python2.6/mercurial/extensions.py", line 184, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/pager.py", line 107, in pagecmd
    return orig(ui, options, cmd, cmdfunc)
  File "/usr/lib/pymodules/python2.6/mercurial/extensions.py", line 184, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/color.py", line 362, in colorcmd
    return orig(ui_, opts, cmd, cmdfunc)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 737, in _runcommand
    return checkargs()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 691, in checkargs
    return cmdfunc()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 680, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/rebase.py", line 652, in pullrebase
    orig(ui, repo, *args, **opts)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/mq.py", line 3325, in mqcommand
    return orig(ui, repo, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/commands.py", line 4340, in pull
    modheads = repo.pull(other, heads=revs, force=opts.get('force'))
  File "/home/scjody/pwbank/src/hg-git/hggit/hgrepo.py", line 14, in pull
    return git.fetch(remote.path, heads)
  File "/home/scjody/pwbank/src/hg-git/hggit/git_handler.py", line 154, in fetch
    refs = self.fetch_pack(remote, heads)
  File "/home/scjody/pwbank/src/hg-git/hggit/git_handler.py", line 821, in fetch_pack
    commit()
  File "/usr/lib/pymodules/python2.6/dulwich/object_store.py", line 568, in commit
    return self.move_in_pack(path)
  File "/usr/lib/pymodules/python2.6/dulwich/object_store.py", line 542, in move_in_pack
    entries = p.sorted_entries()
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1098, in sorted_entries
    ret = list(self.iterentries(progress=progress))
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1086, in iterentries
    for i, result in enumerate(PackIndexer.for_pack_data(self)):
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1233, in _walk_all_chains
    for result in self._walk_ref_chains():
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1243, in _walk_ref_chains
    self._ensure_no_pending()
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1239, in _ensure_no_pending
    raise KeyError([sha_to_hex(s) for s in self._pending_ref])
KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', '2f01f206a41522c7e57ebce050b70dbecd93dd7a', 'a8c54dd23c7d444774aec1404bbaa326638aff1d', '8069b7ba8e8b8bfd60ed84e949de3db08e7886c3']
scjody@ailuropoda:~/foo$

As a workaround, the problem goes away if I use git:// instead of git+https:// in the second URL.

I'm using an up to date clone of hg-git with Dulwich 0.8.1 (I've confirmed it still happens with an up to date clone of Dulwich).

Hi,

Thanks for the bugreport.

On Tue, Jul 17, 2012 at 09:02:26PM -0000, Jody McIntyre wrote:
> I'm getting a KeyError when attempting to use hg-git to pull changes
> from two repos:
>
> scjody@ailuropoda:~$ hg init foo
> scjody@ailuropoda:~$ cd foo
> scjody@ailuropoda:~/foo$ hg pull git+https://github.com/mozilla/django-csp.git
> pulling from git+https://github.com/mozilla/django-csp.git
[...]
>
> I'm using an up to date clone of hg-git with Dulwich 0.8.1 (I've
> confirmed it still happens with an up to date clone of Dulwich).

Can you reproduce this problem with plain dulwich ? dulwich itself
seems to happily clone this repository.

$ dulwich clone https://github.com/mozilla/django-csp.git
$

  affects dulwich
  status incomplete

Cheers,

Jelmer

Changed in dulwich:
status: New → Incomplete
Jody McIntyre (scjody) wrote :

On Tue, Jul 17, 2012 at 5:15 PM, Jelmer Vernooij <email address hidden> wrote:

> Can you reproduce this problem with plain dulwich ? dulwich itself
> seems to happily clone this repository.
>

hg-git is happy to clone either repository too. The issue occurs when I
try to pull the "fmarier" repository into a local repository already
containing the "mozilla" repository. I don't know how to do that with
Dulwich.

scjody@ailuropoda:~$ hg clone git+https://github.com/mozilla/django-csp.git
destination directory: django-csp
importing git objects into hg
updating to branch default
20 files updated, 0 files merged, 0 files removed, 0 files unresolved
scjody@ailuropoda:~$ cd django-csp
scjody@ailuropoda:~/django-csp$ hg pull git+https://github.com/fmarier/django-csp.git
pulling from git+https://github.com/fmarier/django-csp.git
** unknown exception encountered, please report by visiting
** http://mercurial.selenic.com/wiki/BugTracker
...

Jelmer Vernooij (jelmer) wrote :

On Tue, Jul 17, 2012 at 09:25:56PM -0000, Jody McIntyre wrote:
> On Tue, Jul 17, 2012 at 5:15 PM, Jelmer Vernooij <email address hidden> wrote:
>
> > Can you reproduce this problem with plain dulwich ? dulwich itself
> > seems to happily clone this repository.
> >
>
> hg-git is happy to clone either repository too. The issue occurs when I
> try to pull the "fmarier" repository into a local repository already
> containing the "mozilla" repository. I don't know how to do that with
> Dulwich.
>
> scjody@ailuropoda:~$ hg clone git+https://github.com/mozilla/django-csp.git
> destination directory: django-csp
> importing git objects into hg
> updating to branch default
> 20 files updated, 0 files merged, 0 files removed, 0 files unresolved
> scjody@ailuropoda:~$ cd django-csp
> scjody@ailuropoda:~/django-csp$ hg pull git+https://github.com/fmarier/django-csp.git
> pulling from git+https://github.com/fmarier/django-csp.git
> ** unknown exception encountered, please report by visiting
> ** http://mercurial.selenic.com/wiki/BugTracker
> ...

Ah, thanks - I can reproduce that using dulwich too ("dulwich fetch ...")

Cheers,

Jelmer

Volodymyr Kostyrko (c-kworr) wrote :

This one is a dup of #783456 I think.

Bob Halley (rthalley) wrote :

FWIW, I worked around this by removing the "thin-pack" capability from the client, e.g.

client, path = get_transport_and_path(args.pop(0))
client._fetch_capabilities.remove('thin-pack')

Clearly this isn't a fix for whatever's going wrong receiving a thin pack, but it seems to be an effective workaround.

Jelmer Vernooij (jelmer) wrote :

Is hg-git adding the rest of the repository as a fallback for the thin pack that is received? It sounds like it isn't.

dom (dominikruf) wrote :

I tried to add client._fetch_capabilities.remove('thin-pack') to git_handler.py
but this only leads to

KeyError: 'thin-pack'

So Bob's workaround doesn't work for me :-(
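
A note on the `KeyError: 'thin-pack'` above: this is the error Python's `set.remove` raises when the element is not in the set, which would be consistent with the capability already having been dropped before the call. The sketch below uses a plain set as a stand-in for `client._fetch_capabilities` (a private attribute; that it is a set in this Dulwich version is an assumption, and `disable_thin_packs` is a hypothetical helper name). It only shows that `discard` is the tolerant variant of `remove`.

```python
# Stand-in for the client's capability set (assumption: the real private
# attribute behaves like a set). 'thin-pack' is already absent here.
fetch_capabilities = {"multi_ack", "side-band-64k", "ofs-delta"}


def disable_thin_packs(capabilities):
    """Drop 'thin-pack' if present; do nothing if it is already gone."""
    capabilities.discard("thin-pack")
    return capabilities


# set.remove() raises KeyError for a missing element ...
try:
    fetch_capabilities.remove("thin-pack")
    remove_failed = False
except KeyError:
    remove_failed = True

# ... while set.discard() silently ignores it.
disable_thin_packs(fetch_capabilities)
```

If the capability is already missing, removing it again cannot be what fixes the fetch, so the cause of the original error has to be elsewhere.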

Jelmer Vernooij (jelmer) wrote :

I'm inclined to close this as it seems to be an hg-git bug; it should either provide context *or* disable thin packs. Can somebody confirm that?

dom (dominikruf) wrote :

I don't know much about dulwich or hg-git, but as I wrote before, I was not able to disable thin packs the way Bob did.
I just looked into git_handler.py and found the following line:
return client.HttpGitClient(uri, thin_packs=False), uri

So to me it seems that thin packs are already disabled.

What kind of context do you need?

dom (dominikruf) wrote :

Since you can reproduce this with pure dulwich, I don't think this is an hg-git problem:

$ dulwich clone https://github.com/mozilla/django-csp.git
$ cd django-csp.git/
$ dulwich fetch https://github.com/fmarier/django-csp.git
...
    raise KeyError([sha_to_hex(s) for s in self._pending_ref])
KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', 'a8c54dd23c7d444774aec1404bbaa326638aff1d']

On Monday, 24.09.2012, at 13:22 +0000, dom wrote:
> since you can reproduce this with pure dulwich I don't think this is a
> hg-git problem
>
> $ dulwich clone https://github.com/mozilla/django-csp.git
> $ cd django-csp.git/
> $ dulwich fetch https://github.com/fmarier/django-csp.git
> ...
> raise KeyError([sha_to_hex(s) for s in self._pending_ref])
> KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', 'a8c54dd23c7d444774aec1404bbaa326638aff1d']

Actually, the example dulwich script doesn't handle thin packs correctly
either (and doesn't disable them). It just assumes it only receives full
packs.

Cheers,

Jelmer

OK, I looked into the tutorials at http://www.samba.org/~jelmer/dulwich/docs/tutorial/index.html and tried to understand how all this works. Sadly, my simple example led to the same error.

First I cloned https://github.com/mozilla/django-csp.git:

git clone https://github.com/mozilla/django-csp.git

Then I started Python and ran the following:

from dulwich.repo import Repo
from dulwich.client import HttpGitClient

repo = Repo('django-csp')
client = HttpGitClient('https://github.com/fmarier/django-csp.git')
client.fetch('https://github.com/fmarier/django-csp.git', repo)

which leads to the well-known exception:
...
C:\tools\Python27\lib\site-packages\dulwich-0.8.5-py2.7-win32.egg\dulwich\pack.pyc in _ensure_no_pending(self)
   1242     def _ensure_no_pending(self):
   1243         if self._pending_ref:
-> 1244             raise KeyError([sha_to_hex(s) for s in self._pending_ref])
   1245
   1246     def _walk_ref_chains(self):

KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', '2f01f206a41522c7e57ebce050b70dbecd93dd7a', 'a8c54dd23c7d444774aec1404bbaa326638aff1d', '8069b7ba8e8b8bfd60ed84e949de3db08e7886c3']

So my question now is what would be the right way to do this?

cheers
Dominik

On Thu, 2012-09-27 at 08:57 +0000, dom wrote:
> OK I looked into the tutorials at
> http://www.samba.org/~jelmer/dulwich/docs/tutorial/index.html and tried
> to understand how all this works. Sadly my simple example lead to the
> same error.
>
> What I tried is I cloned https://github.com/mozilla/django-csp.git
>
> git clone https://github.com/mozilla/django-csp.git
>
> then I started python and did the following
>
> from dulwich.repo import Repo
> from dulwich.client import HttpGitClient
> from dulwich.repo import Repo
>
> repo = Repo('django-csp')
> client = HttpGitClient('https://github.com/fmarier/django-csp.git')
> client.fetch('https://github.com/fmarier/django-csp.git', repo)
>
> which leads to the well known Exception
> ...
> C:\tools\Python27\lib\site-packages\dulwich-0.8.5-py2.7-win32.egg\dulwich\pack.pyc in _ensure_no_pending(self)
> 1242 def _ensure_no_pending(self):
> 1243 if self._pending_ref:
> -> 1244 raise KeyError([sha_to_hex(s) for s in self._pending_ref])
> 1245
> 1246 def _walk_ref_chains(self):
>
> KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', '2f01f206a41522c7e57ebce050b70dbecd93dd7a', 'a8c54dd23c7d444774aec1404bbaa326638aff1d', '8069b7ba8e8b8bfd60ed84e949de3db08e7886c3']
>
> So my question now is what would be the right way to do this?

bin/dulwich just calls out to client.fetch() too, like your code does.
The problem is that fetch() uses add_pack rather than add_thin_pack, and
add_pack can't handle thin packs because it doesn't resolve external
references.

Arguably this is a major bug in GitClient.fetch(). bzr-git doesn't use
GitClient.fetch(), which is why we haven't really noticed this before.
I'll bump the priority of this bug.

Cheers,

Jelmer
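
To illustrate the difference Jelmer describes, here is a toy model of a thin pack. It is not Dulwich's pack format or API: `resolve_pack`, the entry tuples, and the "delta = base + suffix" rule are all made up for the sketch. The point is only that a delta whose base is not in the pack can be resolved when the local object store is consulted, and otherwise ends in a `KeyError` listing the unresolved base SHAs, much like the traceback in this report.

```python
import hashlib


def sha(data):
    """Hex SHA-1 of raw bytes (toy object id, not a real git object id)."""
    return hashlib.sha1(data).hexdigest()


def resolve_pack(entries, external=None):
    """Resolve toy pack entries into {sha: data}.

    entries:  ("full", data) or ("ref_delta", base_sha, suffix) tuples.
    external: optional {sha: data} of objects already in the local repo;
              passing it models thin-pack handling.
    """
    external = external or {}
    resolved = {}
    pending = []
    for entry in entries:
        if entry[0] == "full":
            resolved[sha(entry[1])] = entry[1]
        else:
            pending.append(entry)

    # Keep applying deltas until nothing more can be resolved.
    progress = True
    while pending and progress:
        progress = False
        still_pending = []
        for _, base_sha, suffix in pending:
            base = resolved.get(base_sha)
            if base is None:
                base = external.get(base_sha)
            if base is None:
                still_pending.append(("ref_delta", base_sha, suffix))
                continue
            data = base + suffix  # toy "delta": append to the base
            resolved[sha(data)] = data
            progress = True
        pending = still_pending

    if pending:
        # Bases that are neither in the pack nor in the local store.
        raise KeyError(sorted({base_sha for _, base_sha, _ in pending}))
    return resolved


base = b"version 1\n"
local_store = {sha(base): base}
thin_pack = [("ref_delta", sha(base), b"version 2\n")]

# Treating the thin pack as a full pack fails: the base is not in the pack.
try:
    resolve_pack(thin_pack)
    full_pack_ok = True
except KeyError:
    full_pack_ok = False

# Consulting the local store resolves the external reference.
objects = resolve_pack(thin_pack, external=local_store)
```

This also matches the reported symptoms: a fresh clone receives a self-contained pack, while a fetch into a repository that already shares history with the remote can receive deltas against objects the client already has.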

Jelmer Vernooij (jelmer) on 2012-10-03
Changed in dulwich:
status: Incomplete → Triaged
summary: - KeyError when using git+https in hg-git
+ Client.fetch() doesn't support thin packs
Changed in dulwich:
importance: Undecided → High
dom (dominikruf) wrote :

It's been a while, but I still have this problem.
I looked into bzr-git/fetch.py and experimented a bit.
I tried the following, which is based on the fetch_objects method of fetch.py:

from dulwich.repo import Repo
from dulwich.client import HttpGitClient

ref_changes = {}
def determine_wants(heads):
    old_refs = dict([(k, (v, None)) for (k, v) in heads.iteritems()])
    new_refs = old_refs
    ref_changes.update(new_refs)
    return [sha1 for (sha1, bzr_revid) in new_refs.itervalues()]

def process(text):
    print text

repo = Repo('django-csp')
url = 'https://github.com/fmarier/django-csp.git'
client = HttpGitClient(url)

localrepo = Repo('django-csp\\.git')
graphwalker = localrepo.get_graph_walker()
f, commit = localrepo.object_store.add_pack()
refs = client.fetch_pack(url,
                        determine_wants, graphwalker, f.write,
                        process)
commit()

But I still get the same error(s).

Can somebody tell me the right (or at least a working) way to fetch remote packs and commit them to the local repository?

On Thu, Dec 06, 2012 at 02:34:39PM -0000, dom wrote:
> It's been a while but I still have this problem.
> I looked into bzr-git/fetch.py and experimented a bit
> I tried to following which is based on the fetch_objects method of fetch.py
>
> from dulwich.repo import Repo
> from dulwich.client import HttpGitClient
> from dulwich.repo import Repo
>
> ref_changes = {}
> def determine_wants(heads):
>     old_refs = dict([(k, (v, None)) for (k, v) in heads.iteritems()])
>     new_refs = old_refs
>     ref_changes.update(new_refs)
>     return [sha1 for (sha1, bzr_revid) in new_refs.itervalues()]
>
> def process(text):
>     print text
>
> repo = Repo('django-csp')
> url = 'https://github.com/fmarier/django-csp.git'
> client = HttpGitClient(url)
>
> localrepo = Repo('django-csp\\.git')
> graphwalker = localrepo.get_graph_walker()
> f, commit = localrepo.object_store.add_pack()
> refs = client.fetch_pack(url,
>                         determine_wants, graphwalker, f.write,
>                         process)
> commit()
>
> But I still get the same error(s).
>
> Can somebody tell my what's the right/a working way to fetch remote
> packs and commit them to the local repository?

This code still assumes full packs. You'll want to use
localrepo.object_store.add_thin_pack rather than
localrepo.object_store.add_pack.

A trivial way to do it would be to create a temporary file/StringIO,
have fetch_pack write into that and then pass the read method on that
object to add_thin_pack.

Cheers,

Jelmer
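
The buffering pattern suggested above can be sketched without Dulwich at all. In the snippet below, `fake_fetch_pack` and `fake_add_thin_pack` are hypothetical stand-ins for `client.fetch_pack` and `object_store.add_thin_pack`; only the shape of the hand-off (a write callable on one side, read callables on the other) is taken from this thread. It uses `io.BytesIO`, which is what one would reach for on Python 3, since pack data is binary.

```python
import io


def fake_fetch_pack(pack_data_callback):
    """Stand-in producer: delivers pack bytes in chunks via a write callable."""
    for chunk in (b"PACK", b"-chunk-1", b"-chunk-2"):
        pack_data_callback(chunk)


def fake_add_thin_pack(read_all, read_some):
    """Stand-in consumer: pulls bytes through read callables."""
    header = read_all(4)
    body = b""
    while True:
        chunk = read_some(8)
        if not chunk:
            break
        body += chunk
    return header, body


# 1. Let the fetch side write into an in-memory buffer.
buf = io.BytesIO()
fake_fetch_pack(buf.write)

# 2. Rewind; without this the consumer would read from the end and see nothing.
buf.seek(0)

# 3. Hand the buffer's read method to the consumer.
header, body = fake_add_thin_pack(buf.read, buf.read)
```

For large fetches a temporary file (for example `tempfile.TemporaryFile`) would serve the same role as the in-memory buffer while keeping memory use bounded.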

dom (dominikruf) wrote :

Eureka, it finally works:

from StringIO import StringIO

from dulwich.repo import Repo
from dulwich.client import HttpGitClient

ref_changes = {}
def determine_wants(heads):
    # Ask for every head the remote advertises.
    old_refs = dict([(k, (v, None)) for (k, v) in heads.iteritems()])
    #new_refs = update_refs(old_refs)
    new_refs = old_refs
    ref_changes.update(new_refs)
    return [sha1 for (sha1, bzr_revid) in new_refs.itervalues()]

def process(text):
    # Progress callback: print whatever the remote reports.
    print text

url = 'https://github.com/fmarier/django-csp.git'
client = HttpGitClient(url)

localrepo = Repo('django-csp\\.git')
graphwalker = localrepo.get_graph_walker()

# Buffer the (possibly thin) pack in memory instead of writing it via add_pack().
f = StringIO()
refs = client.fetch_pack(url,
                        determine_wants, graphwalker, f.write,
                        process)
# Rewind so that add_thin_pack() reads the pack from the start.
f.seek(0)
po = localrepo.object_store.add_thin_pack(f.read, None)

With the following patch, hg-git also works:

https://bitbucket.org/domruf/hg-git/commits/47df57f2bb2b7a6fb2994f6f2b5fdf44d30aafd1
