Intermittent failure of git clone from git.openstack.org

Bug #1229352 reported by Sean Dague
This bug affects 2 people
Affects: OpenStack Core Infrastructure
Status: Fix Released
Importance: Critical
Assigned to: Jeremy Stanley
Milestone: (none listed)

Bug Description

On review https://review.openstack.org/#/c/47075/ the gate job failed because of a failed clone from git.openstack.org:

2013-09-23 16:44:11.067 | Started by user anonymous
2013-09-23 16:44:11.073 | [EnvInject] - Loading node environment variables.
2013-09-23 16:44:11.128 | Building remotely on centos6-6 in workspace /home/jenkins/workspace/gate-nova-python26
2013-09-23 16:44:11.191 | [gate-nova-python26] $ /bin/bash -xe /tmp/hudson6221067160592390109.sh
2013-09-23 16:44:11.194 | + /usr/local/jenkins/slave_scripts/gerrit-git-prep.sh https://review.openstack.org http://zuul.openstack.org git://git.openstack.org
2013-09-23 16:44:11.203 | Triggered by: https://review.openstack.org/47075
2013-09-23 16:44:11.204 | + [[ ! -e .git ]]
2013-09-23 16:44:11.204 | + git remote set-url origin git://git.openstack.org/openstack/nova
2013-09-23 16:44:11.205 | + git remote update
2013-09-23 16:44:11.212 | Fetching origin
2013-09-23 16:45:13.839 | fatal: The remote end hung up unexpectedly
2013-09-23 16:45:13.848 | error: Could not fetch origin
2013-09-23 16:45:13.849 | + git remote update
2013-09-23 16:45:13.859 | Fetching origin
2013-09-23 16:46:16.422 | error: fetch died of signal 13
2013-09-23 16:46:16.423 | error: Could not fetch origin
2013-09-23 16:46:16.449 | Build step 'Execute shell' marked build as failure
2013-09-23 16:46:16.486 | [SCP] Connecting to static.openstack.org
2013-09-23 16:46:17.685 | [SCP] No file(s) found: **/*nose_results.html
2013-09-23 16:46:20.927 | [SCP] '**/*nose_results.html' doesn't match anything: '**' exists but not '**/*nose_results.html'
2013-09-23 16:46:21.345 | [SCP] No file(s) found: **/*testr_results.html.gz
2013-09-23 16:46:24.586 | [SCP] '**/*testr_results.html.gz' doesn't match anything: '**' exists but not '**/*testr_results.html.gz'
2013-09-23 16:46:24.661 | [SCP] Trying to create /srv/static/logs/75/47075/1/gate
2013-09-23 16:46:24.675 | [SCP] Trying to create /srv/static/logs/75/47075/1/gate/gate-nova-python26
2013-09-23 16:46:24.688 | [SCP] Trying to create /srv/static/logs/75/47075/1/gate/gate-nova-python26/18cd2fe
2013-09-23 16:46:24.697 | [SCP] uploading file: '/srv/static/logs/75/47075/1/gate/gate-nova-python26/18cd2fe/tmpetliam'
2013-09-23 16:46:25.646 | [SCP] No file(s) found: **/*subunit_log.txt.gz
2013-09-23 16:46:28.901 | [SCP] '**/*subunit_log.txt.gz' doesn't match anything: '**' exists but not '**/*subunit_log.txt.gz'
2013-09-23 16:46:28.902 | [SCP] Connecting to static.openstack.org
2013-09-23 16:46:29.726 | [SCP] Copying console log.
2013-09-23 16:46:29.736 | Finished: FAILURE

I'm not entirely sure of the cause, as these are about the only log messages that exist. However, it looks like we are not as highly available (HA) as expected on the new mirror.

Jeremy Stanley (fungi) wrote :

The last time we witnessed this, the error was due to a corrupt gate-nova-python26 workspace on the offending unit test slave.

Changed in openstack-ci:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Jeremy Stanley (fungi)
Jeremy Stanley (fungi) wrote :

So far it seems to have affected centos6-2 and centos6-6, so I have temporarily taken them both out of service in jenkins02 for further investigation.

Jeremy Stanley (fungi) wrote :

And centos6-5 was just reported exhibiting this as well. I've confirmed that the behavior is basically the same as what I saw on centos6-8 last week...

    -bash-4.1$ cp -ax workspace/gate-nova-python26.broken testing
    -bash-4.1$ cd testing/
    -bash-4.1$ git remote -v
    origin git://git.openstack.org/openstack/nova (fetch)
    origin git://git.openstack.org/openstack/nova (push)
    -bash-4.1$ git remote update
    Fetching origin
    error: fetch died of signal 13
    error: Could not fetch origin
    -bash-4.1$ git fsck
    dangling commit 7015605b3e639350a1a7203b332256e6394014f1
    dangling commit 0c47cac76a88edf97d995f01d8f2c52e1c4d8f56
    dangling commit e2a225491ccabb5722c57829f65d1951db7cf937
    -bash-4.1$ git gc
    Counting objects: 215925, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (81823/81823), done.
    Writing objects: 100% (215925/215925), done.
    Total 215925 (delta 156513), reused 183643 (delta 127638)
    -bash-4.1$ git fsck
    -bash-4.1$ git remote update
    Fetching origin
    remote: Counting objects: 656, done.
    remote: Compressing objects: 100% (409/409), done.
    remote: Total 474 (delta 327), reused 135 (delta 52)
    Receiving objects: 100% (474/474), 91.95 KiB, done.
    Resolving deltas: 100% (327/327), completed with 66 local objects.
    From git://git.openstack.org/openstack/nova
       3db4372..852d7c7 master -> origin/master
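
For reference, the number in "fetch died of signal 13" can be decoded from a shell. On Linux, signal 13 is SIGPIPE, which a process receives when it writes to a pipe or socket whose other end has already closed. The snippet below is only a convenience for reading the error; it is not part of the original session.

```shell
# Print the name of signal 13, the number reported in
# "error: fetch died of signal 13". On Linux this is PIPE (SIGPIPE).
kill -l 13
```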

The fact that we're only seeing this so far with the nova tests on CentOS 6.4 unit test slaves suggests that something about the version of the Git client used there is exposed or exacerbated by the size of nova's Git repository, or by the length of time it takes to perform a remote update for it.

As a stop-gap, we might want to make the gerrit-git-prep builder perform a prophylactic 'git gc' before beginning to remote update.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/47915

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.openstack.org/47915
Committed: http://github.com/openstack-infra/config/commit/61c375a8fcdee54f76c03a63b3ab3a3b1cccc415
Submitter: Jenkins
Branch: master

commit 61c375a8fcdee54f76c03a63b3ab3a3b1cccc415
Author: Jeremy Stanley <email address hidden>
Date: Mon Sep 23 19:57:18 2013 +0000

    Garbage collect Git repos in gerrit-git-prep

    * modules/jenkins/files/slave_scripts/gerrit-git-prep.sh: Sometimes
    Git repositories can be left in a dirty state, preventing subsequent
    operations. If a git remote update fails, garbage collect and then
    try again.

    Change-Id: I8455a3193081f9a0c9372a10f5ffdbc25fc864d9
    Fixes-Bug:1229352
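
The retry described in this commit message can be sketched roughly as follows. This is an illustration only, not the actual contents of gerrit-git-prep.sh: the function name `update_with_gc_retry` and the single-retry policy are assumptions made for the example.

```shell
# Hypothetical helper (name and retry policy are assumptions, not taken
# from gerrit-git-prep.sh): try 'git remote update' once, and if it
# fails, garbage collect the repository and try one more time.
update_with_gc_retry() {
    # First attempt: fetch from all configured remotes.
    if git remote update; then
        return 0
    fi

    # The fetch failed. Repack and prune the local repository, then
    # retry. The function's exit status is that of the second attempt.
    echo "git remote update failed; running git gc and retrying" >&2
    git gc
    git remote update
}
```

Run from inside the job's workspace, `update_with_gc_retry || exit 1` would fail the build only if the fetch still fails after garbage collection.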

Changed in openstack-ci:
status: In Progress → Fix Released