git object sharing is suboptimal in Launchpad

Bug #1661600 reported by Robie Basak on 2017-02-03
Affects: Launchpad itself (importance: Undecided; assigned to: Unassigned)
Affects: usd-importer (importance: High; assigned to: Nish Aravamudan)

Bug Description

If the git importer pushes to lp:~IMPORTER/ubuntu/+source/package, and an uploader subsequently pushes to lp:~UPLOADER/ubuntu/+source/package, these two repositories will not share the underlying storage.

This means that:

1) Launchpad's storage needs for git-based workflows will grow with each additional uploader, since every uploader's repository holds its own copy of objects that the importer's repository already has.

2) Uploaders won't get their upload bandwidth optimised when they push.
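
As a rough illustration of point 1, the sketch below models repositories as collections of content-addressed objects and compares total storage with and without a shared object pool. It is a toy model, not Launchpad's storage layer: the function names, object sizes and repository counts are all made-up assumptions.

```python
import hashlib


def object_id(content: bytes) -> str:
    """Content address of an object (a toy stand-in for a git object ID)."""
    return hashlib.sha1(content).hexdigest()


def stored_bytes(repositories, shared: bool) -> int:
    """Total bytes stored for a list of repositories.

    Each repository is a list of object contents (bytes). With shared=True,
    identical objects are stored once across all repositories; with
    shared=False, every repository keeps its own copy.
    """
    if shared:
        pool = {}
        for repo in repositories:
            for content in repo:
                pool[object_id(content)] = len(content)
        return sum(pool.values())
    total = 0
    for repo in repositories:
        own = {object_id(content): len(content) for content in repo}
        total += sum(own.values())
    return total


# Hypothetical numbers: an importer repository with 1000 objects of 100 bytes,
# and 50 uploader repositories that each add one new 100-byte object.
history = [b"%04d" % i + b"x" * 96 for i in range(1000)]
importer = list(history)
uploaders = [history + [b"u%03d" % n + b"y" * 96] for n in range(50)]
repos = [importer] + uploaders

print(stored_bytes(repos, shared=False))  # 5105000
print(stored_bytes(repos, shared=True))   # 105000
```

In this model the unshared total scales with the number of uploaders times the size of the imported history, while the shared total scales only with the objects that are actually new.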

This isn't a hard blocker for progress, but we should fix it before many people start using the new git workflow, as there isn't a particularly reasonable way to fix object sharing on existing repositories.

Background: https://answers.launchpad.net/launchpad/+question/426641, https://irclogs.ubuntu.com/2017/02/03/%23launchpad.html#t12:02

I understand that the Launchpad team doesn't currently expect to have the resources to implement this; the team working on the importer will need to supply the code.

Robie Basak (racb) on 2017-02-03
summary: - git object sharing is suboptimal
+ git object sharing is suboptimal in Launchpad
Changed in usd-importer:
status: New → Triaged
importance: Undecided → High
Colin Watson (cjwatson) wrote :

IMO this is only a problem because we haven't yet taken a project-wide decision to bless the importer branches as the default repositories for their corresponding source packages (i.e. lp:ubuntu/+source/PACKAGE). Once that happens, object sharing will automatically be set up for newly-pushed repositories.

On Tue, Jul 25, 2017 at 10:43:19AM -0000, Colin Watson wrote:
> IMO this is only a problem because we haven't yet taken a project-wide
> decision to bless the importer branches as the default repositories for
> their corresponding source packages (i.e. lp:ubuntu/+source/PACKAGE).
> Once that happens, object sharing will automatically be set up for
> newly-pushed repositories.

FTR, we probably don't want to do that until the importer team declares
"commit hash stability" (IOW, a branch fast-forwarding guarantee). So
this may merely become an ordering issue. We may not want to do this
until we've seen wider use to gain confidence, and may not want wider
use until we've got object sharing sorted :)

An example is how we're handling empty directories in source packages
(bug 1687057). We discovered that problem only after some level of
increased use, and the current plan (implemented in the importer) is to
put them into the git commit DAG even though the git porcelain doesn't
yet support them. This makes all previously imported commits of sources
with empty directories "wrong", which makes us want to re-import and
break fast-forwarding. Ideally we'd avoid declaring commit hash
stability until upstream have accepted empty directory support in git
porcelain (yet to be written).

There may be other issues like this one that we have yet to discover.

I expect we'll have to draw a line at some point, and accept any
historical importer errors in already-imported commits after that
point. But it'd be nice to get wider testing before we do, so I'm not
sure how that fits with the lack of git object sharing with the
repositories as they are at the moment.
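
To make the "branch fast-forwarding guarantee" above concrete: a branch update is a fast-forward when the old tip is an ancestor of the new tip. The sketch below checks this on a small hand-written commit graph. The commit names and the `parents` mapping are hypothetical, and this is not the importer's code.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack = [commit]
    seen = set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def is_fast_forward(parents, old_tip, new_tip):
    """A branch update is a fast-forward if the old tip is kept in history."""
    return is_ancestor(parents, old_tip, new_tip)


# Hypothetical published branch: A <- B <- C.
parents = {"A": [], "B": ["A"], "C": ["B"]}
# A normal import adds a new commit D on top of C.
parents["D"] = ["C"]
# A re-import rewrites B and C as B2 and C2 (new hashes, same root A).
parents.update({"B2": ["A"], "C2": ["B2"]})

print(is_fast_forward(parents, "C", "D"))   # True
print(is_fast_forward(parents, "C", "C2"))  # False
```

A re-import that changes how commits are built, as in the empty-directory case, produces the second situation: anyone who based work on C no longer shares history with the new tip.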

Colin Watson (cjwatson) wrote :

Yes, I see that point.

In theory it would be possible to designate a repository as the default baseline for object sharing even though it's not the default for name lookup, but it would be quite a lot of extra configuration complexity and so I think it's probably not a good idea.

We've discussed having an explicit fork action in the past, which would give us explicit knowledge of the repository to object-share with rather than having to guess, and would let us tell users what URL to push to rather than relying on them knowing how Launchpad's Git URL namespace is laid out. I'm not opposed to that, but it would be an extra chunk of work.
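
The difference between guessing the object-sharing baseline from the default repository and learning it from an explicit fork action can be sketched as follows. This is only an illustration of the two options discussed above: `sharing_baseline`, its parameters and the package name are hypothetical and do not correspond to Launchpad's actual code or API.

```python
from typing import Dict, Optional


def sharing_baseline(
    target: str,
    default_repositories: Dict[str, str],
    forked_from: Optional[str] = None,
) -> Optional[str]:
    """Pick the repository a newly pushed repository should share objects with.

    An explicit fork source wins; otherwise fall back to the target's default
    repository, if one has been set. Returns None when there is nothing to
    share with.
    """
    if forked_from is not None:
        return forked_from
    return default_repositories.get(target)


# No default repository has been set for this (made-up) package yet.
defaults: Dict[str, str] = {}
target = "ubuntu/+source/hello"
importer_repo = "~IMPORTER/ubuntu/+source/hello"

# Without a default or an explicit fork, there is no baseline to share with.
print(sharing_baseline(target, defaults))
# An explicit fork action names the baseline directly.
print(sharing_baseline(target, defaults, forked_from=importer_repo))
# Once the importer repository is the default, the fallback finds it.
defaults[target] = importer_repo
print(sharing_baseline(target, defaults))
```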

Robie Basak (racb) wrote :

Could we perhaps put the importer branches in their "official" places, but make it clear that they are experimental and non-fast-forwarding until we declare hash stability at some future 1.0 date? I'm not sure I'm keen on this, but it may be the least worst option.

Nish Aravamudan (nacc) on 2017-08-02
Changed in usd-importer:
milestone: none → 1.0
Nish Aravamudan (nacc) on 2017-10-23
Changed in usd-importer:
assignee: nobody → Robie Basak (racb)
Nish Aravamudan (nacc) wrote :

@racb: do you have a spec of the default aliases (even pseudocode, or what is necessary where)? I can work on implementing it, if so.

Nish Aravamudan (nacc) on 2017-10-31
Changed in usd-importer:
assignee: Robie Basak (racb) → Nish Aravamudan (nacc)
Nish Aravamudan (nacc) on 2017-11-03
Changed in usd-importer:
status: Triaged → In Progress
Nish Aravamudan (nacc) on 2017-11-16
Changed in usd-importer:
status: In Progress → Fix Released