Client.fetch() doesn't support thin packs

Bug #1025886 reported by Jody McIntyre
This bug affects 8 people
Affects   Status    Importance   Assigned to   Milestone
Dulwich   Triaged   High         Unassigned

Bug Description

I'm getting a KeyError when attempting to use hg-git to pull changes from two repos:

scjody@ailuropoda:~$ hg init foo
scjody@ailuropoda:~$ cd foo
scjody@ailuropoda:~/foo$ hg pull git+https://github.com/mozilla/django-csp.git
pulling from git+https://github.com/mozilla/django-csp.git
importing git objects into hg
(run 'hg heads' to see heads)
scjody@ailuropoda:~/foo$ hg pull git+https://github.com/fmarier/django-csp.git
pulling from git+https://github.com/fmarier/django-csp.git
** unknown exception encountered, please report by visiting
** http://mercurial.selenic.com/wiki/BugTracker
** Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) [GCC 4.4.5]
** Mercurial Distributed SCM (version 2.1)
** Extensions loaded: color, children, fetch, graphlog, hgk, mq, purge, rebase, record, transplant, convert, kiln.bigpush, kiln.caseguard, kiln.gestalt, kiln.kilnauth, kiln.kilnpath, pager, hggit, extdiff, churn
Traceback (most recent call last):
  File "/usr/bin/hg", line 38, in <module>
    mercurial.dispatch.run()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 27, in run
    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 64, in dispatch
    return _runcatch(req)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 87, in _runcatch
    return _dispatch(req)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 683, in _dispatch
    cmdpats, cmdoptions)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 465, in runcommand
    ret = _runcommand(ui, options, cmd, d)
  File "/usr/lib/pymodules/python2.6/mercurial/extensions.py", line 184, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/pager.py", line 107, in pagecmd
    return orig(ui, options, cmd, cmdfunc)
  File "/usr/lib/pymodules/python2.6/mercurial/extensions.py", line 184, in wrap
    return wrapper(origfn, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/color.py", line 362, in colorcmd
    return orig(ui_, opts, cmd, cmdfunc)
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 737, in _runcommand
    return checkargs()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 691, in checkargs
    return cmdfunc()
  File "/usr/lib/pymodules/python2.6/mercurial/dispatch.py", line 680, in <lambda>
    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/rebase.py", line 652, in pullrebase
    orig(ui, repo, *args, **opts)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/extensions.py", line 139, in wrap
    util.checksignature(origfn), *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/hgext/mq.py", line 3325, in mqcommand
    return orig(ui, repo, *args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/util.py", line 456, in check
    return func(*args, **kwargs)
  File "/usr/lib/pymodules/python2.6/mercurial/commands.py", line 4340, in pull
    modheads = repo.pull(other, heads=revs, force=opts.get('force'))
  File "/home/scjody/pwbank/src/hg-git/hggit/hgrepo.py", line 14, in pull
    return git.fetch(remote.path, heads)
  File "/home/scjody/pwbank/src/hg-git/hggit/git_handler.py", line 154, in fetch
    refs = self.fetch_pack(remote, heads)
  File "/home/scjody/pwbank/src/hg-git/hggit/git_handler.py", line 821, in fetch_pack
    commit()
  File "/usr/lib/pymodules/python2.6/dulwich/object_store.py", line 568, in commit
    return self.move_in_pack(path)
  File "/usr/lib/pymodules/python2.6/dulwich/object_store.py", line 542, in move_in_pack
    entries = p.sorted_entries()
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1098, in sorted_entries
    ret = list(self.iterentries(progress=progress))
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1086, in iterentries
    for i, result in enumerate(PackIndexer.for_pack_data(self)):
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1233, in _walk_all_chains
    for result in self._walk_ref_chains():
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1243, in _walk_ref_chains
    self._ensure_no_pending()
  File "/usr/lib/pymodules/python2.6/dulwich/pack.py", line 1239, in _ensure_no_pending
    raise KeyError([sha_to_hex(s) for s in self._pending_ref])
KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', '2f01f206a41522c7e57ebce050b70dbecd93dd7a', 'a8c54dd23c7d444774aec1404bbaa326638aff1d', '8069b7ba8e8b8bfd60ed84e949de3db08e7886c3']
scjody@ailuropoda:~/foo$

As a workaround, the problem goes away if I use git:// instead of git+https:// in the second URL.

I'm using an up to date clone of hg-git with Dulwich 0.8.1 (I've confirmed it still happens with an up to date clone of Dulwich).

Tags: hg-git https
Jelmer Vernooij (jelmer) wrote : Re: [Bug 1025886] [NEW] KeyError when using git+https in hg-git

Hi,

Thanks for the bug report.

On Tue, Jul 17, 2012 at 09:02:26PM -0000, Jody McIntyre wrote:
> I'm getting a KeyError when attempting to use hg-git to pull changes
> from two repos:
>
> scjody@ailuropoda:~$ hg init foo
> scjody@ailuropoda:~$ cd foo
> scjody@ailuropoda:~/foo$ hg pull git+https://github.com/mozilla/django-csp.git
> pulling from git+https://github.com/mozilla/django-csp.git
[...]
>
> I'm using an up to date clone of hg-git with Dulwich 0.8.1 (I've
> confirmed it still happens with an up to date clone of Dulwich).

Can you reproduce this problem with plain dulwich ? dulwich itself
seems to happily clone this repository.

$ dulwich clone https://github.com/mozilla/django-csp.git
$

  affects dulwich
  status incomplete

Cheers,

Jelmer

Changed in dulwich:
status: New → Incomplete
Jody McIntyre (scjody) wrote :

On Tue, Jul 17, 2012 at 5:15 PM, Jelmer Vernooij <email address hidden> wrote:

> Can you reproduce this problem with plain dulwich ? dulwich itself
> seems to happily clone this repository.
>

hg-git is happy to clone either repository too. The issue occurs when I
try to pull the "fmarier" repository into a local repository already
containing the "mozilla" repository. I don't know how to do that with
Dulwich.

scjody@ailuropoda:~$ hg clone git+https://github.com/mozilla/django-csp.git
destination directory: django-csp
importing git objects into hg
updating to branch default
20 files updated, 0 files merged, 0 files removed, 0 files unresolved
scjody@ailuropoda:~$ cd django-csp
scjody@ailuropoda:~/django-csp$ hg pull git+https://github.com/fmarier/django-csp.git
pulling from git+https://github.com/fmarier/django-csp.git
** unknown exception encountered, please report by visiting
** http://mercurial.selenic.com/wiki/BugTracker
...

Jelmer Vernooij (jelmer) wrote :

On Tue, Jul 17, 2012 at 09:25:56PM -0000, Jody McIntyre wrote:
> On Tue, Jul 17, 2012 at 5:15 PM, Jelmer Vernooij <email address hidden> wrote:
>
> > Can you reproduce this problem with plain dulwich ? dulwich itself
> > seems to happily clone this repository.
> >
>
> hg-git is happy to clone either repository too. The issue occurs when I
> try to pull the "fmarier" repository into a local repository already
> containing the "mozilla" repository. I don't know how to do that with
> Dulwich.
>
> scjody@ailuropoda:~$ hg clone git+https://github.com/mozilla/django-csp.git
> destination directory: django-csp
> importing git objects into hg
> updating to branch default
> 20 files updated, 0 files merged, 0 files removed, 0 files unresolved
> scjody@ailuropoda:~$ cd django-csp
> scjody@ailuropoda:~/django-csp$ hg pull git+https://github.com/fmarier/django-csp.git
> pulling from git+https://github.com/fmarier/django-csp.git
> ** unknown exception encountered, please report by visiting
> ** http://mercurial.selenic.com/wiki/BugTracker
> ...
Ah, thanks - I can reproduce that using dulwich too ("dulwich fetch
...")

Cheers,

Jelmer

Volodymyr Kostyrko (c-kworr) wrote : Re: KeyError when using git+https in hg-git

I'm getting this one too. Tracing the sources also gives this testcase:

https://gist.github.com/351435

The fix is incorrect after http://git.samba.org/?p=jelmer/dulwich.git;a=commit;h=1859500e97ac1090bb3cccdebf0c4d7073292002

Volodymyr Kostyrko (c-kworr) wrote :

This one is a dup of #783456 I think.

Bob Halley (rthalley) wrote :

FWIW, I worked around this by removing the "thin-pack" capability from the client, e.g.

client, path = get_transport_and_path(args.pop(0))
client._fetch_capabilities.remove('thin-pack')

Clearly this isn't a fix for whatever's going wrong receiving a thin pack, but it seems to be an effective workaround.

Jelmer Vernooij (jelmer) wrote :

Is hg-git adding the rest of the repository as a fallback for the thin pack that is received? It sounds like it isn't.

dom (dominikruf) wrote :

I tried adding client._fetch_capabilities.remove('thin-pack') to git_handler.py,
but this only leads to

KeyError: 'thin-pack'

So Bob's workaround doesn't work for me :-(
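
A KeyError from .remove() is what a Python set raises when the element is not present, which would be consistent with 'thin-pack' already being absent from the capabilities. The sketch below only illustrates that language behaviour. It assumes _fetch_capabilities is a set (the thread does not confirm its type), and the capability names are placeholders rather than Dulwich's actual list.

```python
# Stand-in for a client's capability collection. Assumption: it is a set,
# and 'thin-pack' has already been left out of it.
capabilities = {"multi_ack", "side-band-64k", "ofs-delta"}

try:
    # set.remove() raises KeyError when the element is missing.
    capabilities.remove("thin-pack")
    outcome = "removed"
except KeyError as exc:
    outcome = "KeyError: %r" % exc.args[0]

# set.discard() is the tolerant variant: a no-op when the element is missing.
capabilities.discard("thin-pack")

print(outcome)
```

If the attribute is indeed a set, discard() would make the workaround safe to apply whether or not the capability is still present.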

Jelmer Vernooij (jelmer) wrote :

I'm inclined to close this as it seems to be a hg-git bug; it should either provide context *or* disable thin packs. Can somebody confirm that?

dom (dominikruf) wrote :

I don't know much about dulwich or hg-git, but as I wrote before, I was not able to disable thin packs the way Bob did.
I just looked into git_handler.py and found the following line:
return client.HttpGitClient(uri, thin_packs=False), uri

So to me it seems that thin packs are already disabled.

What kind of context do you need?

dom (dominikruf) wrote :

Since you can reproduce this with pure dulwich, I don't think this is an hg-git problem:

$ dulwich clone https://github.com/mozilla/django-csp.git
$ cd django-csp.git/
$ dulwich fetch https://github.com/fmarier/django-csp.git
...
    raise KeyError([sha_to_hex(s) for s in self._pending_ref])
KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', 'a8c54dd23c7d444774aec1404bbaa326638aff1d']

Jelmer Vernooij (jelmer) wrote : Re: [Bug 1025886] Re: KeyError when using git+https in hg-git

On Monday, 24.09.2012 at 13:22 +0000, dom wrote:
> since you can reproduce this with pure dulwich I don't think this is a
> hg-git problem
>
> $ dulwich clone https://github.com/mozilla/django-csp.git
> $ cd django-csp.git/
> $ dulwich fetch https://github.com/fmarier/django-csp.git
> ...
> raise KeyError([sha_to_hex(s) for s in self._pending_ref])
> KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', 'a8c54dd23c7d444774aec1404bbaa326638aff1d']
Actually, the example dulwich script doesn't handle thin packs correctly
either (and doesn't disable them). It just assumes it only receives full
packs.

Cheers,

Jelmer

dom (dominikruf) wrote : Re: KeyError when using git+https in hg-git

OK, I looked into the tutorials at http://www.samba.org/~jelmer/dulwich/docs/tutorial/index.html and tried to understand how all this works. Sadly, my simple example led to the same error.

First I cloned https://github.com/mozilla/django-csp.git:

git clone https://github.com/mozilla/django-csp.git

Then I started Python and ran the following:

from dulwich.client import HttpGitClient
from dulwich.repo import Repo

repo = Repo('django-csp')
client = HttpGitClient('https://github.com/fmarier/django-csp.git')
client.fetch('https://github.com/fmarier/django-csp.git', repo)

which leads to the well-known exception:
...
C:\tools\Python27\lib\site-packages\dulwich-0.8.5-py2.7-win32.egg\dulwich\pack.pyc in _ensure_no_pending(self)
   1242 def _ensure_no_pending(self):
   1243 if self._pending_ref:
-> 1244 raise KeyError([sha_to_hex(s) for s in self._pending_ref])
   1245
   1246 def _walk_ref_chains(self):

KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', '2f01f206a41522c7e57ebce050b70dbecd93dd7a', 'a8c54dd23c7d444774aec1404bbaa326638aff1d', '8069b7ba8e8b8bfd60ed84e949de3db08e7886c3']

So my question now is what would be the right way to do this?

cheers
Dominik

Jelmer Vernooij (jelmer) wrote : Re: [Bug 1025886] Re: KeyError when using git+https in hg-git

On Thu, 2012-09-27 at 08:57 +0000, dom wrote:
> OK I looked into the tutorials at
> http://www.samba.org/~jelmer/dulwich/docs/tutorial/index.html and tried
> to understand how all this works. Sadly my simple example lead to the
> same error.
>
> What I tried is I cloned https://github.com/mozilla/django-csp.git
>
> git clone https://github.com/mozilla/django-csp.git
>
> then I started python and did the following
>
> from dulwich.repo import Repo
> from dulwich.client import HttpGitClient
> from dulwich.repo import Repo
>
> repo = Repo('django-csp')
> client = HttpGitClient('https://github.com/fmarier/django-csp.git')
> client.fetch('https://github.com/fmarier/django-csp.git', repo)
>
> which leads to the well known Exception
> ...
> C:\tools\Python27\lib\site-packages\dulwich-0.8.5-py2.7-win32.egg\dulwich\pack.pyc in _ensure_no_pending(self)
> 1242 def _ensure_no_pending(self):
> 1243 if self._pending_ref:
> -> 1244 raise KeyError([sha_to_hex(s) for s in self._pending_ref])
> 1245
> 1246 def _walk_ref_chains(self):
>
> KeyError: ['f69f776453ccec1a9f9bdc5bfaa49fd0e375f9b3', '2f01f206a41522c7e57ebce050b70dbecd93dd7a', 'a8c54dd23c7d444774aec1404bbaa326638aff1d', '8069b7ba8e8b8bfd60ed84e949de3db08e7886c3']
>
> So my question now is what would be the right way to do this?

bin/dulwich just calls out to client.fetch() too, like your code does.
The problem is that fetch doesn't use add_thin_pack but add_pack which
can't handle thin packs and doesn't resolve external references.

Arguably this is a major bug in GitClient.fetch(). bzr-git doesn't use
GitClient.fetch(), which is why we haven't really noticed this before.
I'll bump the priority of this bug.

Cheers,

Jelmer
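
The difference between a full pack and a thin pack described above can be shown with a toy model in plain Python. This is not Dulwich's code: the entry format, the "append a suffix" delta, and the names object_id and resolve_pack are all made up for illustration (real git deltas are copy/insert instruction streams). The point is only that a delta whose base is not in the pack stays pending unless the resolver may look in the local store, and that the unresolved bases surface as a KeyError, much like in the tracebacks in this report.

```python
import hashlib


def object_id(data):
    """Toy object id: SHA-1 hex digest of the raw bytes."""
    return hashlib.sha1(data).hexdigest()


def resolve_pack(entries, external=None):
    """Resolve toy pack entries into {object_id: data}.

    entries  -- list of ("full", data) or ("ref_delta", base_id, suffix)
    external -- optional mapping of objects already in the local store;
                passing it is what makes thin packs resolvable.
    """
    resolved = {}
    pending = []
    for entry in entries:
        if entry[0] == "full":
            resolved[object_id(entry[1])] = entry[1]
        else:
            pending.append(entry)

    # Keep applying deltas until nothing more can be resolved.
    progress = True
    while pending and progress:
        progress = False
        for entry in list(pending):
            _, base_id, suffix = entry
            base = resolved.get(base_id)
            if base is None and external is not None:
                base = external.get(base_id)
            if base is not None:
                data = base + suffix  # toy delta: append bytes to the base
                resolved[object_id(data)] = data
                pending.remove(entry)
                progress = True

    if pending:
        # Bases that are neither in the pack nor in the external store.
        raise KeyError(sorted({base_id for _, base_id, _ in pending}))
    return resolved


# The local store already holds the base object; the pack omits it.
base = b"hello"
local_store = {object_id(base): base}
thin_pack = [("full", b"other"), ("ref_delta", object_id(base), b" world")]

try:
    resolve_pack(thin_pack)  # treated as a self-contained (full) pack
    self_contained = True
except KeyError:
    self_contained = False

objects = resolve_pack(thin_pack, external=local_store)
print(self_contained, sorted(objects.values()))
```

In this model the first call plays the role of add_pack and the second the role of add_thin_pack, under the simplifications stated above.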

Jelmer Vernooij (jelmer)
Changed in dulwich:
status: Incomplete → Triaged
summary: - KeyError when using git+https in hg-git
+ Client.fetch() doesn't support thin packs
Changed in dulwich:
importance: Undecided → High
dom (dominikruf) wrote :

It's been a while, but I still have this problem.
I looked into bzr-git/fetch.py and experimented a bit.
I tried the following, which is based on the fetch_objects method of fetch.py:

from dulwich.client import HttpGitClient
from dulwich.repo import Repo

ref_changes = {}
def determine_wants(heads):
    old_refs = dict([(k, (v, None)) for (k, v) in heads.iteritems()])
    new_refs = old_refs
    ref_changes.update(new_refs)
    return [sha1 for (sha1, bzr_revid) in new_refs.itervalues()]

def process(text):
    print text

repo = Repo('django-csp')
url = 'https://github.com/fmarier/django-csp.git'
client = HttpGitClient(url)

localrepo = Repo('django-csp\\.git')
graphwalker = localrepo.get_graph_walker()
f, commit = localrepo.object_store.add_pack()
refs = client.fetch_pack(url,
                        determine_wants, graphwalker, f.write,
                        process)
commit()

But I still get the same error(s).

Can somebody tell me what the right (or at least a working) way is to fetch remote packs and commit them to the local repository?

Jelmer Vernooij (jelmer) wrote : Re: [Bug 1025886] Re: Client.fetch() doesn't support thin packs

On Thu, Dec 06, 2012 at 02:34:39PM -0000, dom wrote:
> It's been a while but I still have this problem.
> I looked into bzr-git/fetch.py and experimented a bit
> I tried to following which is based on the fetch_objects method of fetch.py
>
> from dulwich.repo import Repo
> from dulwich.client import HttpGitClient
> from dulwich.repo import Repo
>
> ref_changes = {}
> def determine_wants(heads):
> old_refs = dict([(k, (v, None)) for (k, v) in heads.iteritems()])
> new_refs = old_refs
> ref_changes.update(new_refs)
> return [sha1 for (sha1, bzr_revid) in new_refs.itervalues()]
>
> def process(text):
> print text
>
> repo = Repo('django-csp')
> url = 'https://github.com/fmarier/django-csp.git'
> client = HttpGitClient(url)
>
> localrepo = Repo('django-csp\\.git')
> graphwalker = localrepo.get_graph_walker()
> f, commit = localrepo.object_store.add_pack()
> refs = client.fetch_pack(url,
> determine_wants, graphwalker, f.write,
> process)
> commit()
>
> But I still get the same error(s).
>
> Can somebody tell my what's the right/a working way to fetch remote
> packs and commit them to the local repository?
This code still assumes full packs. You'll want to use
localrepo.object_store.add_thin_pack rather than
localrepo.object_store.add_pack.

A trivial way to do it would be to create a temporary file/StringIO,
have fetch_pack write into that and then pass the read method on that
object to add_thin_pack.

Cheers,

Jelmer
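
The buffering idea above (let the fetch write into a file-like object, then hand that object's read method to the consumer) can be sketched without Dulwich or a network connection. Both functions below are hypothetical stand-ins, not Dulwich APIs: fake_fetch_pack only mimics something that streams bytes to a write callback, and fake_add_thin_pack only mimics something that pulls bytes through a read callable. On Python 3 the buffer would be io.BytesIO, since pack data is bytes.

```python
import io


def fake_fetch_pack(write):
    """Hypothetical stand-in: stream pack bytes to a write callback."""
    for chunk in (b"PACK", b"\x00\x00\x00\x02", b"...payload..."):
        write(chunk)


def fake_add_thin_pack(read):
    """Hypothetical stand-in: consume pack bytes through a read callable."""
    header = read(4)  # first four bytes
    body = read()     # everything that is left
    return header, body


buf = io.BytesIO()
fake_fetch_pack(buf.write)

# Rewind before reading; otherwise the reader starts at the end of the
# buffer and gets no data at all.
buf.seek(0)
header, body = fake_add_thin_pack(buf.read)
print(header, len(body))
```

For large packs, tempfile.SpooledTemporaryFile offers the same write/seek/read interface while spilling to disk past a size threshold.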

dom (dominikruf) wrote :

Eureka, it finally works:

from StringIO import StringIO

from dulwich.client import HttpGitClient
from dulwich.repo import Repo

ref_changes = {}
def determine_wants(heads):
    # Ask for every head the remote advertises.
    old_refs = dict([(k, (v, None)) for (k, v) in heads.iteritems()])
    #new_refs = update_refs(old_refs)
    new_refs = old_refs
    ref_changes.update(new_refs)
    return [sha1 for (sha1, bzr_revid) in new_refs.itervalues()]

def process(text):
    print text

url = 'https://github.com/fmarier/django-csp.git'
client = HttpGitClient(url)

localrepo = Repo('django-csp\\.git')
graphwalker = localrepo.get_graph_walker()
# Buffer the incoming pack in memory instead of writing it via add_pack().
f = StringIO()
refs = client.fetch_pack(url,
                        determine_wants, graphwalker, f.write,
                        process)
# Rewind, then let add_thin_pack() resolve deltas against the local store.
f.seek(0)
po = localrepo.object_store.add_thin_pack(f.read, None)

and with the following patch hg-git also works

https://bitbucket.org/domruf/hg-git/commits/47df57f2bb2b7a6fb2994f6f2b5fdf44d30aafd1
