Merge detection times out during ref scans in repositories with deep history

Bug #1955040 reported by Andy Whitcroft
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Colin Watson
turnip
Fix Released
High
Colin Watson

Bug Description

When there is a lot of references and there is an open merge-proposal against the repository the git_repository refs_collection may end up not matching the output of git ls-remote; some of the refs are absent.

===

apw
13:41
@cristiangsp hey ... i have a git repository for which a git ls-remote --refs shows we have 340 refs in the repository, but the repo.refs iterator for the launchpad API element only has 217.

apw
13:41
git://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic

13:42
i assume this is not to be expected.

cristiangsp
14:06
checking what exactly repo.refs is returning

apw
14:07
looking at the Ubuntu-4.15 prefix tags it seems to stop at -66.75. Whereas a git ls-remote on it they go up to -159.167.

apw
14:14
i will note that the git ls-remote output is 26K, if i truncate it at 16K it is 221 entries, really close to the 217 that is reported in launchpad.

cristiangsp
14:32
@cjwatson do you think this could be related with some issue with the GitRefScanJob?

cjwatson
14:32
I guess it must be but I don't know

cristiangsp
14:32
ok, checking logs

apw
14:32
i have not requested any kind of rescan, is that a thing?

cjwatson
14:32
You shouldn't need to

14:33
Hm, though https://code.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic shows a failed scan in fact

14:33
So maybe try poking that before looking at anything else

apw
14:34
should i hit the rescan button, or is "poke" here "look into" and intended for cristian

cjwatson
14:35
Probably look into in case it's reliably timing out

14:35
It'll be in ackee's celeryd-production_launchpad_job.log

cristiangsp
14:35
yep already there

apw
14:36
i do know how to torture git.l.n. i am glad we don't have everything in the one repo, that is for sure.

cristiangsp
14:44
I have got the OOPS:

[2021-09-22 10:49:02,278: INFO/ForkPoolWorker-3] Job resulted in OOPS: OOPS-05b70d0993a73f4ce305655ff3e46e36
[2021-09-22 10:49:02,280: INFO/ForkPoolWorker-3] Task lp.services.job.celeryjob.celery_run_job_ignore_result[GitRefScanJob_66227954_e7916523-a8dc-4f6a-b643-756b838236b2] succeeded in 33.15658360719681s: None

cjwatson
14:45
Looks like a timeout in merge detection

cristiangsp
14:45
https://oops.canonical.com/?oopsid=05b70d0993a73f4ce305655ff3e46e36

14:45
is not loading :slightly_frowning_face:

cjwatson
14:45
It loads eventually, just very big and slow

cristiangsp
14:45
oh ok

cjwatson
14:46
Shows >31s in a /repo/17131/detect-merges/7d376fbce68bd07f5ba0c41ba41dfd9d3076538b call to turnip which then 502s (probably haproxy or gunicorn timeout)

14:47
We should probably start by finding out (somehow) how long that code actually takes if allowed to finish, and seeing whether we should be looking at optimizing it or bumping timeouts

cristiangsp
14:48
ok, will create an issue with all these details

cjwatson
14:49
Detecting merges is fairly tricky; the implementation goes to some effort to avoid having to scan the entire history, but maybe something is going wrong there

cristiangsp
14:50
how much this is blocking you @apw?

apw
14:51
it is currently blocking my processing, but i likely can switch back to the old code.

cjwatson
14:51
Should be possible to analyse it by figuring out what the MP in question is (I'm fairly sure it's https://code.launchpad.net/~andyliuliming/ubuntu/+source/linux/+git/bionic/+merge/351395), cloning the source and target repositories, and running the detect_merges code on it manually

apw
14:52
ironic as we don't use merges pretty much :slightly_smiling_face:

cjwatson
14:52
Right, but it still needs to check

14:53
If all else fails, either merging or rejecting that MP would likely fix it, but obviously it would be better for the infrastructure to cope

apw
14:54
i have turned off the git launchpad integration in swm, so it should fall back to git ls-remote for its data; so i can leave things as they are

cjwatson
14:55
It's also possible it might help to change the detect-merges API to send the previously-scanned head commits as explicit stop commits for the commit graph walk

14:55
Since it may in fact just be trying to walk back through the full history and that obviously won't work for Linux ...

apw
14:56
ok confirmed we are unblocked for now

cjwatson
14:58
If it can wait a few weeks, and if it is what I think it is, this could make a good new-starter task (not the first thing to give somebody, but a good intro to git hosting mechanisms)

cristiangsp
14:58
agreed

apw
15:34
its preventing me implementing an optimisation; but i can live with that for now

15:35

(LP git repositories have a date_last_changed) which i cannot see with git ls-remote.

Related branches

Revision history for this message
Colin Watson (cjwatson) wrote :

The number of references is probably not relevant, but the depth of history is.

summary: - git_repository refs_collection does not match git ls-remote
+ Merge detection times out during ref scans in repositories with deep
+ history
tags: added: git lp-code timeout
Changed in turnip:
status: New → Triaged
importance: Undecided → High
Changed in launchpad:
status: New → Triaged
importance: Undecided → High
Colin Watson (cjwatson)
Changed in turnip:
status: Triaged → In Progress
assignee: nobody → Colin Watson (cjwatson)
Colin Watson (cjwatson)
Changed in launchpad:
status: Triaged → In Progress
assignee: nobody → Colin Watson (cjwatson)
Changed in turnip:
status: In Progress → Fix Released
Colin Watson (cjwatson)
Changed in launchpad:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.