Importer clobbers existing directory before importing

Bug #1700946 reported by Robie Basak
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
git-ubuntu
Fix Released
Medium
Unassigned

Bug Description

My use case: investigating an importer failure

git ubuntu import -v -d texinfo texinfo
# fails as expected
git ubuntu import -v -d texinfo texinfo

Expected behaviour: resume import from where it left off, and fail almost immediately.
Actual behaviour: reimport everything from scratch and then fail.

Perhaps my expectations are mismatched with some other use cases (such as always trusting pkg). But I also tried --no-fetch, and it also clobbers in that case AFAICT.

Previously I think we concluded that pkg should be favoured. I now think that if the local branches are strictly fast forward of them, or if pkg doesn't exist, then perhaps we should perhaps resume from the current state, as used to happen before the restructuring.

There may be cases I haven't considered which means that this doesn't make sense. If so, I'd at least like an option for it to not clobber.

Another use case that a fix for this might support is maintaining some kind of forked importer and a mutated local imported branch for development purposes.

Revision history for this message
Nish Aravamudan (nacc) wrote :

Relevent bit of spec:

git fetch (according to `git ubuntu clone` refspecs)
# do we want a flag to prevent the following?
delete refs/heads/importer/*
delete refs/tags/importer/*
copy refs/remotes/pkg/* to refs/heads/importer/{debian,ubuntu}/*
fetch refs/tags/* to refs/tags/importer/* # in case the local tags have been manipulated

Relevent bit of code (importer.py):

        self.local_repo.add_base_remotes(self.pkgname, repo_owner=owner)
        if not args.no_fetch:
            self.local_repo.fetch_base_remotes(must_exist=False)

        self.local_repo.delete_branches_in_namespace(self.namespace)
        self.local_repo.delete_tags_in_namespace(self.namespace)

        self.local_repo.copy_base_references(self.namespace)
        if not args.no_fetch:
            self.local_repo.fetch_remote_refspecs('pkg',
                refspecs=['refs/tags/*:refs/tags/%s/*' % self.namespace],
                must_exist=False)

I think the base reference copy must be failing. I wonder if it's not copying the tags correctly in that case.

Revision history for this message
Robie Basak (racb) wrote : Re: [Bug 1700946] Re: Importer clobbers existing directory before importing

Thank you for isolating the relevant bits.

On Wed, Jun 28, 2017 at 03:16:32PM -0000, Nish Aravamudan wrote:
> Relevent bit of spec:
>
> git fetch (according to `git ubuntu clone` refspecs)
> # do we want a flag to prevent the following?
> delete refs/heads/importer/*
> delete refs/tags/importer/*
> copy refs/remotes/pkg/* to refs/heads/importer/{debian,ubuntu}/*
> fetch refs/tags/* to refs/tags/importer/* # in case the local tags have been manipulated
>
> Relevent bit of code (importer.py):
>
> self.local_repo.add_base_remotes(self.pkgname, repo_owner=owner)
> if not args.no_fetch:
> self.local_repo.fetch_base_remotes(must_exist=False)
>
> self.local_repo.delete_branches_in_namespace(self.namespace)
> self.local_repo.delete_tags_in_namespace(self.namespace)

My expectation this morning was that these two lines would be indented
one more level (ie. be excluded by --no-fetch). Our spec isn't clear.
I'm not sure that it makes sense to delete when not fetching, since the
purpose was to have pkg override everything in importer.

Revision history for this message
Nish Aravamudan (nacc) wrote :

On Wed, Jun 28, 2017 at 8:33 AM, Robie Basak <email address hidden> wrote:
> Thank you for isolating the relevant bits.
>
> On Wed, Jun 28, 2017 at 03:16:32PM -0000, Nish Aravamudan wrote:
>> Relevent bit of spec:
>>
>> git fetch (according to `git ubuntu clone` refspecs)
>> # do we want a flag to prevent the following?
>> delete refs/heads/importer/*
>> delete refs/tags/importer/*
>> copy refs/remotes/pkg/* to refs/heads/importer/{debian,ubuntu}/*
>> fetch refs/tags/* to refs/tags/importer/* # in case the local tags have been manipulated
>>
>> Relevent bit of code (importer.py):
>>
>> self.local_repo.add_base_remotes(self.pkgname, repo_owner=owner)
>> if not args.no_fetch:
>> self.local_repo.fetch_base_remotes(must_exist=False)
>>
>> self.local_repo.delete_branches_in_namespace(self.namespace)
>> self.local_repo.delete_tags_in_namespace(self.namespace)
>
> My expectation this morning was that these two lines would be indented
> one more level (ie. be excluded by --no-fetch). Our spec isn't clear.
> I'm not sure that it makes sense to delete when not fetching, since the
> purpose was to have pkg override everything in importer.

Right, the idea behind these two lines is to remove any user-defined
entries in refs/{heads,tags}/$namespace/ (e.g., refs/heads/importer)
as we need to 'trust' that data in order to proceed. I wonder if
that's too defensive, though? I don't see a clear way to distinguish
between --no-fetch meaning --catch-up-using-an-existing-directory and
--no-fetch meaning --start-over. I guess we can look at the use of
--directory and if it's already a git repository (as opposed to one we
create), but that seems fragile.

Does it seem reasonable (for now) to just drop the
delete_*_in_namespace calls and definitions? Worst case, an offline
user gets a non-matching import, but that would only happen if they
intentional manipulated something in the "reserved" importer/
namespace?

-Nish

Revision history for this message
Nish Aravamudan (nacc) wrote :

On Wed, Jun 28, 2017 at 3:46 PM, Nish Aravamudan
<email address hidden> wrote:
> On Wed, Jun 28, 2017 at 8:33 AM, Robie Basak <email address hidden> wrote:
>> Thank you for isolating the relevant bits.
>>
>> On Wed, Jun 28, 2017 at 03:16:32PM -0000, Nish Aravamudan wrote:
>>> Relevent bit of spec:
>>>
>>> git fetch (according to `git ubuntu clone` refspecs)
>>> # do we want a flag to prevent the following?
>>> delete refs/heads/importer/*
>>> delete refs/tags/importer/*
>>> copy refs/remotes/pkg/* to refs/heads/importer/{debian,ubuntu}/*
>>> fetch refs/tags/* to refs/tags/importer/* # in case the local tags have been manipulated
>>>
>>> Relevent bit of code (importer.py):
>>>
>>> self.local_repo.add_base_remotes(self.pkgname, repo_owner=owner)
>>> if not args.no_fetch:
>>> self.local_repo.fetch_base_remotes(must_exist=False)
>>>
>>> self.local_repo.delete_branches_in_namespace(self.namespace)
>>> self.local_repo.delete_tags_in_namespace(self.namespace)
>>
>> My expectation this morning was that these two lines would be indented
>> one more level (ie. be excluded by --no-fetch). Our spec isn't clear.
>> I'm not sure that it makes sense to delete when not fetching, since the
>> purpose was to have pkg override everything in importer.
>
> Right, the idea behind these two lines is to remove any user-defined
> entries in refs/{heads,tags}/$namespace/ (e.g., refs/heads/importer)
> as we need to 'trust' that data in order to proceed. I wonder if
> that's too defensive, though? I don't see a clear way to distinguish
> between --no-fetch meaning --catch-up-using-an-existing-directory and
> --no-fetch meaning --start-over. I guess we can look at the use of
> --directory and if it's already a git repository (as opposed to one we
> create), but that seems fragile.
>
> Does it seem reasonable (for now) to just drop the
> delete_*_in_namespace calls and definitions? Worst case, an offline
> user gets a non-matching import, but that would only happen if they
> intentional manipulated something in the "reserved" importer/
> namespace?

I think there might also be a bug (I'm not sure if they are explicitly
used by the importer itself, but it's something we should clarify in
the code) -- copy_base_references doesn't copy the refs/tags/pkg tags
to refs/tags/<namespace>.

Robie Basak (racb)
tags: added: local-importer-ux
Changed in usd-importer:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 1.0
Nish Aravamudan (nacc)
Changed in usd-importer:
status: Triaged → In Progress
Nish Aravamudan (nacc)
Changed in usd-importer:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.