Random errors transferring artifacts to snapshots.l.o

Bug #911184 reported by Paul Sokolovsky on 2012-01-03
This bug affects 1 person
Affects: Linaro Android Infrastructure
Status: Fix Released
Importance: Critical
Assigned to: Paul Sokolovsky

Bug Description

Random errors have started to appear during the SFTP artifact upload phase of the builds:

https://android-build.linaro.org/jenkins/job/linaro-android_staging-imx53/131/
https://android-build.linaro.org/jenkins/job/linaro-android_staging-imx53/132/

The error is:

SSH: Connecting from host [ip-10-243-34-224]
SSH: Connecting with configuration [snapshots.linaro.org] ...
SSH: Disconnecting configuration [snapshots.linaro.org] ...
ERROR: Exception when publishing, exception message [No such file]
Build step 'Send files or execute commands over SSH' changed build result to UNSTABLE

Judging by the listing at http://snapshots.linaro.org/android/~linaro-android/staging-imx53/131/target/product/iMX53/ at least userdata.tar.bz2 was not transferred.

So the bad (but expected) news is that non-deterministic errors do happen. The good news is that they are detected and the build is marked as UNSTABLE, which clearly distinguishes such a build from a failed compile. So I am not treating the issue as serious for now, and will keep watching the rate of such errors.


Changed in linaro-android-infrastructure:
importance: Undecided → Medium
Changed in linaro-android-infrastructure:
status: New → Triaged
importance: Medium → Critical
milestone: none → 12.01
Paul Sokolovsky (pfalcon) wrote :

We are also now quite often getting the following errors when running the reshuffle-files script:

Moving /home/android-build-linaro/android/.tmp/linaro-android_beagle to /srv3/snapshots.linaro.org/www/android//~linaro-android/beagle... mv: inter-device move failed: `/home/android-build-linaro/android/.tmp/linaro-android_beagle/342' to `/srv3/snapshots.linaro.org/www/android//~linaro-android/beagle/342'; unable to remove target: Is a directory
Moving /home/android-build-linaro/android/.tmp/pfalcon_lava-job-info-transfer to /srv3/snapshots.linaro.org/www/android//~pfalcon/lava-job-info-transfer... done

There are 2 questions regarding that:

1. Where is the source of reshuffle-files? I never saw the final version of it, nor do I know where its master copy is.
2. It seems to want to replace the top-level directory of the destination, whereas we need it to move any new files into the existing directory structure (or create it if it doesn't exist).

The 2nd feature is essentially needed to solve the issue with lava-job-info not being pushed to s.l.o (more info in https://blueprints.launchpad.net/linaro-android-infrastructure/+spec/linaro-android-snaphosts-publish-finish).

It lives in lp:linaro-license-protection in the scripts/ directory. The version being run lives in

  mombin.canonical.com:/home/android-build-linaro/scripts/jenkins-post-sftp.sh

The goal was to clean that up to do what you describe, and to also disallow overwriting of existing files (to ensure nothing can mess up previous build results). There was not enough time to come up with a proper variant to do all that at the time.
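As an illustration only, the behaviour described above (merge new files into an existing directory structure, create missing directories, never overwrite an existing file) could be sketched as follows. This is not the deployed script: the function name `merge_move` and its return value are hypothetical, and the existence check followed by the move is not atomic, so a real version would need extra care if two runs can target the same destination.

```python
import os
import shutil


def merge_move(src_root, dst_root):
    """Move every file under src_root into dst_root, keeping relative paths.

    Hypothetical helper: existing destination directories are reused,
    missing ones are created, and existing files are never overwritten.
    Returns (moved, skipped) lists of paths relative to the roots.
    """
    moved, skipped = [], []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        # Reuse the directory if it is already published, else create it.
        os.makedirs(dst_dir, exist_ok=True)
        for name in sorted(filenames):
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            dst_path = os.path.join(dst_dir, name)
            if os.path.lexists(dst_path):
                # Leave previous build results alone; report the conflict.
                skipped.append(rel_path)
                continue
            # shutil.move falls back to copy + delete across devices.
            shutil.move(os.path.join(dirpath, name), dst_path)
            moved.append(rel_path)
    return moved, skipped
```

Unlike a plain `mv` of the top-level directory, this works file by file, so an already existing build directory on the destination is not an error. Files that would have overwritten something stay in the source tree, where they can be inspected.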

Changed in linaro-android-infrastructure:
assignee: nobody → Paul Sokolovsky (pfalcon)
status: Triaged → In Progress
Paul Sokolovsky (pfalcon) wrote :

OK, so Danilo's suspicion was that, if there are roughly concurrent transfers from finishing jobs, the reshuffle-files run for one transfer may "clean up" files being transferred by another job (because reshuffle-files cleaned up the entire upload area). We fixed this by explicitly passing the job name to reshuffle-files, so that it operates only on a specific directory. This was deployed last Thursday, and no UNSTABLE builds have popped up since, so apparently that was indeed the issue (and not random transport errors). Closing.
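For readers who want to see the idea of the fix, here is a minimal sketch of scoping the post-upload step to a single job. It is not the actual reshuffle-files code: the function name `reshuffle_job` is hypothetical, and the `owner_job` to `~owner/job` mapping is an assumption inferred from the paths in the log output quoted earlier.

```python
import os
import shutil


def reshuffle_job(upload_root, publish_root, job_name):
    """Publish and clean up only the upload directory of job_name.

    Hypothetical sketch: directories of other jobs under upload_root
    are never listed, moved or removed.
    """
    # Reject names that could escape the upload area (e.g. "../x").
    if job_name in ("", ".", "..") or "/" in job_name or os.sep in job_name:
        raise ValueError("invalid job name: %r" % job_name)
    # Assumed mapping, inferred from the log: "owner_job" -> "~owner/job".
    owner, sep, job = job_name.partition("_")
    if not sep or not owner or not job:
        raise ValueError("job name must look like owner_job: %r" % job_name)
    src = os.path.join(upload_root, job_name)
    if not os.path.isdir(src):
        return False  # nothing has been uploaded for this job
    dst = os.path.join(publish_root, "~" + owner, job)
    os.makedirs(dst, exist_ok=True)
    for entry in sorted(os.listdir(src)):
        target = os.path.join(dst, entry)
        if os.path.lexists(target):
            # Never replace an already published build.
            raise FileExistsError(target)
        shutil.move(os.path.join(src, entry), target)
    # Remove only this job's (now empty) upload directory.
    os.rmdir(src)
    return True
```

Called as `reshuffle_job(upload_root, publish_root, "linaro-android_beagle")`, this leaves a concurrent upload such as `pfalcon_lava-job-info-transfer` untouched, which is the property the fix relies on.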

Changed in linaro-android-infrastructure:
status: In Progress → Fix Released