Random "revision not found" errors using git over dumb http

Bug #1169976 reported by Tyler Baker
This bug affects 1 person
Affects: Linaro Infrastructure Misc
Status: Fix Released
Importance: Critical
Assigned to: Milo Casagrande
Milestone: 2013.10

Bug Description

Data corruption occurs when using repo (git-over-http) to pull from the Linaro Android mirror.

Revision history for this message
Milo Casagrande (milo) wrote :

When I first looked into it, the problem did not seem reproducible: it didn't always happen, and when it did, it occurred at different points during the download.

Marking it as Opinion for the moment; if we have more time to investigate, we'll pick it up.

Changed in linaro-android-mirror:
importance: Undecided → Low
status: New → Opinion
Milo Casagrande (milo)
affects: linaro-android-mirror → linaro-infrastructure-misc
Changed in linaro-infrastructure-misc:
status: Opinion → Confirmed
importance: Low → Critical
Milo Casagrande (milo)
Changed in linaro-infrastructure-misc:
assignee: nobody → Milo Casagrande (milo)
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

https://android-build.linaro.org/jenkins/job/linaro-android_vexpress-linaro/103/console :

Fetching projecterror: Unable to find 9d8721f8abb018429e3dba2296bce789b776aa0c under http://android.git.linaro.org/git-ro/platform/frameworks/base
Cannot obtain needed object 9d8721f8abb018429e3dba2296bce789b776aa0c
while processing commit 7fa6f9930189cab44579441260f91376c6a50ad1.
error: Fetch failed.

error: Cannot fetch platform/frameworks/base

The issue popped up completely at random: https://android-build.linaro.org/jenkins/job/linaro-android_vexpress-linaro/101/ , which ran ~20 minutes earlier, didn't have it.

Changed in linaro-infrastructure-misc:
assignee: Milo Casagrande (milo) → Paul Sokolovsky (pfalcon)
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

On android-build:

ubuntu@ip-10-194-98-139:/mnt/seed/uniseed.http-proto$ repo sync
error: Unable to find 31e85d88f552cfd709ed45f8f32758a68ecc64a6 under http://android.git.linaro.org/git-ro/device/asus/grouper
Cannot obtain needed object 31e85d88f552cfd709ed45f8f32758a68ecc64a6
while processing commit bc7426f04bd2770d71e93ec6503394f0b9225804.
error: Fetch failed.
Fetching projects: 7% (31/438) error: Unable to find f83054ad0bff2e533dedd51d1ee5944bc499d693 under http://android.git.linaro.org/git-ro/device/lge/mako-kernel
Cannot obtain needed object f83054ad0bff2e533dedd51d1ee5944bc499d693
error: Fetch failed.
Fetching projects: 8% (36/438) error: Unable to find e138a77acca8ce389c7b0a6c2fb2f79b87dbc676 under http://android.git.linaro.org/git-ro/device/linaro/common
Cannot obtain needed object e138a77acca8ce389c7b0a6c2fb2f79b87dbc676
error: Fetch failed.
From http://android.git.linaro.org/git-ro/device/linaro/vexpress
   b0db1a5..5bbed60 linaro-jb -> linaro-jb
Fetching projects: 18% (79/438) error: Unable to find 31e85d88f552cfd709ed45f8f32758a68ecc64a6 under http://android.git.linaro.org/git-ro/device/asus/grouper
Cannot obtain needed object 31e85d88f552cfd709ed45f8f32758a68ecc64a6
while processing commit bc7426f04bd2770d71e93ec6503394f0b9225804.
error: Fetch failed.
Fetching projects: 23% (101/438) error: Unable to find f83054ad0bff2e533dedd51d1ee5944bc499d693 under http://android.git.linaro.org/git-ro/device/lge/mako-kernel
Cannot obtain needed object f83054ad0bff2e533dedd51d1ee5944bc499d693
error: Fetch failed.
error: Unable to find e138a77acca8ce389c7b0a6c2fb2f79b87dbc676 under http://android.git.linaro.org/git-ro/device/linaro/common
Cannot obtain needed object e138a77acca8ce389c7b0a6c2fb2f79b87dbc676
error: Fetch failed.
Fetching projects: 24% (106/438) error: Unable to find 01623f9e653d74dbf0361c93c0250bab4aee2556 under http://android.git.linaro.org/git-ro/platform/external/icu4c
Cannot obtain needed object 01623f9e653d74dbf0361c93c0250bab4aee2556
error: Fetch failed.
error: Cannot fetch device/asus/grouper
error: Cannot fetch device/lge/mako-kernel
error: Unable to find 01623f9e653d74dbf0361c93c0250bab4aee2556 under http://android.git.linaro.org/git-ro/platform/external/icu4c
Cannot obtain needed object 01623f9e653d74dbf0361c93c0250bab4aee2556
error: Fetch failed.
error: Cannot fetch device/linaro/common
...

Yesterday there were just 2 such errors.
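
To compare runs like the one above, the failures can be tallied per repository. This is a minimal Python sketch, assuming the log keeps the `error: Unable to find <sha> under <url>` shape shown in this comment; the name `missing_objects` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the git error even when repo's progress output is
# interleaved in front of it on the same line.
MISSING_RE = re.compile(r"error: Unable to find ([0-9a-f]{40}) under (\S+)")

def missing_objects(log_text):
    """Map each repository URL to the set of object IDs reported missing."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = MISSING_RE.search(line)
        if match:
            sha, url = match.groups()
            found[url].add(sha)
    return dict(found)

# Three lines copied from the log above; two repositories are affected.
sample = """\
error: Unable to find 31e85d88f552cfd709ed45f8f32758a68ecc64a6 under http://android.git.linaro.org/git-ro/device/asus/grouper
Fetching projects: 7% (31/438) error: Unable to find f83054ad0bff2e533dedd51d1ee5944bc499d693 under http://android.git.linaro.org/git-ro/device/lge/mako-kernel
Fetching projects: 18% (79/438) error: Unable to find 31e85d88f552cfd709ed45f8f32758a68ecc64a6 under http://android.git.linaro.org/git-ro/device/asus/grouper
"""

for url, shas in sorted(missing_objects(sample).items()):
    print(url, len(shas))
```

Repeated reports of the same object collapse into one entry, so the count reflects distinct missing objects rather than the number of retries.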

Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

A couple more attempts, and it "suddenly" recovered: https://android-build.linaro.org/jenkins/job/linaro-android_vexpress-linaro/107/ . I didn't make any changes to android.git.linaro.org; in the meantime I tried to reproduce the issue in a clean-room setup (i.e. with a direct git command). That turned out to be surprisingly hard and time-consuming, as it involves frameworks/base, which has ~1 GB of content. The fetch simply hung a couple of times on android-build.l.o, where I tried the checkout.

summary: - Data corruption occurs when using git over http with repo
+ Random "revision not found" errors using git over dumb http
Revision history for this message
Paul Sokolovsky (pfalcon) wrote :

Mail from Tyler:

I have set up two LAVA CI jobs to test both smart and dumb HTTP installations:

https://validation.linaro.org/dashboard/image-reports/lava-master-installation-git-dumb-http

https://validation.linaro.org/dashboard/image-reports/lava-master-installation-git-smart-http

The LAVA installation fails randomly during a buildout step (known issue),
however last night I was able to install it six times in a row with both
dumb and smart http without a failure. These jobs will run every hour.
Additionally, I am going to add a job to upgrade an existing installation.

Changed in linaro-infrastructure-misc:
assignee: Paul Sokolovsky (pfalcon) → Milo Casagrande (milo)
Revision history for this message
Milo Casagrande (milo) wrote :

Tried to create a new seed directly on android-build, this is the result:

error: Unable to find f83054ad0bff2e533dedd51d1ee5944bc499d693 under http://android.git.linaro.org/git-ro/device/lge/mako-kernel
Cannot obtain needed object f83054ad0bff2e533dedd51d1ee5944bc499d693
error: Fetch failed.
error: Unable to find f83054ad0bff2e533dedd51d1ee5944bc499d693 under http://android.git.linaro.org/git-ro/device/lge/mako-kernel
Cannot obtain needed object f83054ad0bff2e533dedd51d1ee5944bc499d693
error: Fetch failed.
error: Cannot fetch device/lge/mako-kernel

Looking back at android.git.l.o, the object f83054ad0bff2e533dedd51d1ee5944bc499d693 does not exist on the file system, and it doesn't seem to be in a pack file either.

Removing the cloned directory and retrying the repo process solves the problem.
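
The server-side check described above can be sketched as follows. Git stores a loose object under `objects/<first two hex digits>/<remaining 38>`, and a dumb HTTP client that cannot fetch that path falls back to the packs listed in `objects/info/packs`. The helper names are illustrative only, and the sketch assumes the repository URL maps directly onto the bare repository directory.

```python
def loose_object_path(sha):
    """Path of a loose object relative to the repository's git directory."""
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError("expected a 40-character lowercase hex object ID")
    return "objects/{}/{}".format(sha[:2], sha[2:])

def dumb_http_urls(repo_url, sha):
    """URLs a dumb HTTP client would try: the loose object, then the pack list."""
    base = repo_url.rstrip("/")
    return [
        "{}/{}".format(base, loose_object_path(sha)),
        "{}/objects/info/packs".format(base),
    ]

for url in dumb_http_urls(
    "http://android.git.linaro.org/git-ro/device/lge/mako-kernel",
    "f83054ad0bff2e533dedd51d1ee5944bc499d693",
):
    print(url)
```

If neither the loose path nor any listed pack contains the object, the client has no other way to obtain it, which would match the "Unable to find ... under ..." message.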

Revision history for this message
Milo Casagrande (milo) wrote :

android.git.l.o now has the same Apache configuration as staging.git.l.o: responses served over the dumb HTTP protocol are forced to carry no-cache, no-store, no-transform in the Cache-Control header.

The only difference is in how the git-ro path is handled: android.git.l.o is using a rewrite rule, staging.git.l.o is using an alias match.

This should solve problems related to transparent proxies along the way (looks like Amazon is doing something like that), and also possible problems we might have had with the Squid proxy hosted on android-build.
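
One way to confirm that these headers actually reach clients is to inspect the `Cache-Control` value of a response from the dumb HTTP endpoint. This is a rough Python sketch, assuming the three directives named above are the only ones required; `missing_directives` and `fetch_cache_control` are hypothetical helper names.

```python
import urllib.request

REQUIRED = {"no-cache", "no-store", "no-transform"}

def missing_directives(cache_control):
    """Return the required directives absent from a Cache-Control value."""
    present = set()
    for part in (cache_control or "").split(","):
        # Drop any "=value" argument and normalise case and whitespace.
        name = part.split("=", 1)[0].strip().lower()
        if name:
            present.add(name)
    return REQUIRED - present

def fetch_cache_control(url):
    """Fetch a URL and return its Cache-Control header (None if absent)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.headers.get("Cache-Control")

print(sorted(missing_directives("no-cache, no-store")))  # prints ['no-transform']
```

Calling `missing_directives(fetch_cache_control(url))` from the affected build host, rather than from the server itself, would also show whether something along the path is rewriting the header.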

Changed in linaro-infrastructure-misc:
status: Confirmed → In Progress
Revision history for this message
Milo Casagrande (milo) wrote :

After fixing a couple of the problems as per comment #7, I no longer saw those errors, but seed creation now blocks when downloading private git repositories. The error is: permission denied (publickey).

Revision history for this message
Milo Casagrande (milo) wrote :

I re-ran the jobs again today, and I haven't seen any errors so far.

It has been reported that this error might also be caused by bad connectivity, or by an interrupted fetch operation that corrupted the local checkout/clone.

In these cases, running the following commands in the local directory can help resolve the issue:
git fsck
git gc
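
For scripted use, the same two commands can be wrapped so that `git gc` only runs when `git fsck` succeeds. Stopping at the first failure is a choice made for this sketch, not something established in this report. `repair_commands` and `check_and_repack` are hypothetical helpers, and `git -C` assumes git 1.8.5 or newer.

```python
import subprocess

def repair_commands(repo_dir):
    """Build the fsck and gc invocations for the given checkout."""
    return [
        ["git", "-C", repo_dir, "fsck"],
        ["git", "-C", repo_dir, "gc"],
    ]

def check_and_repack(repo_dir):
    """Run the commands in order; stop at the first one that fails."""
    for command in repair_commands(repo_dir):
        if subprocess.run(command).returncode != 0:
            return False
    return True

# Only prints the commands; nothing is executed here.
print(repair_commands("platform/frameworks/base"))
```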

Since the same errors can randomly occur with the actual staging git instance, which also has a gerrit instance running, the suspicion is that gerrit merges might cause problems when pulling changes via git dumb HTTP.

Revision history for this message
Milo Casagrande (milo) wrote :

I'm closing this bug.

The problem with android.git.l.o no longer occurs. The problem with the staging.git.l.o instance, when it happens, appears to be due to the merge policy used by gerrit, which does not trigger a git update-server-info on the repository. The merge policy on staging.review.l.o was changed to "cherry pick". If the problem reappears on android.git.l.o, it might be worth looking into the gerrit instance running there as well.
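
Some background on why a missing `git update-server-info` matters: dumb HTTP clients learn which refs and packs exist from `info/refs` and `objects/info/packs`, and that command is what regenerates both files. If a pack is written without it being run, clients have no way to discover the objects inside. Below is a sketch of a staleness check, assuming the usual `P pack-<id>.pack` line format of `objects/info/packs`; `unlisted_packs` and `check_repository` are made-up helpers and the pack names in the example are placeholders.

```python
import os

def listed_packs(info_packs_text):
    """Pack file names advertised in objects/info/packs."""
    names = set()
    for line in info_packs_text.splitlines():
        if line.startswith("P "):
            names.add(line[2:].strip())
    return names

def unlisted_packs(info_packs_text, pack_dir_listing):
    """Packs present on disk that the info file does not advertise."""
    on_disk = {name for name in pack_dir_listing if name.endswith(".pack")}
    return sorted(on_disk - listed_packs(info_packs_text))

def check_repository(git_dir):
    """Apply the check to a bare repository on the local file system."""
    pack_dir = os.path.join(git_dir, "objects", "pack")
    with open(os.path.join(git_dir, "objects", "info", "packs")) as handle:
        return unlisted_packs(handle.read(), os.listdir(pack_dir))

info = "P pack-aaaa.pack\n\n"
print(unlisted_packs(info, ["pack-aaaa.pack", "pack-aaaa.idx", "pack-bbbb.pack"]))
```

A non-empty result on the server would suggest that the info files are out of date and that `git update-server-info` needs to run, for example from a post-update hook.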

Changed in linaro-infrastructure-misc:
status: In Progress → Fix Released
milestone: none → 2013.10