Network failures when downloading assets from Launchpad are not notified about and can result in broken images

Bug #1896215 reported by Iain Lane on 2020-09-18
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Ubuntu CD Images | | High | Unassigned |
debian-cd | New | High | Unassigned |

Bug Description

See today's groovy daily, log here:

  https://people.canonical.com/~ubuntu-archive/cd-build-logs/ubuntu/groovy/daily-live-20200918.log

The amd64 image failed to build:

mv: cannot stat '/srv/cdimage.ubuntu.com/scratch/ubuntu/groovy/daily-live/tmp/groovy-amd64/CD1/casper/filesystem.kernel-generic': No such file or directory
make: *** [Makefile:903: /srv/cdimage.ubuntu.com/scratch/ubuntu/groovy/daily-live/tmp/groovy-amd64/bootable-stamp] Error 1
ERROR WHILE BUILDING OFFICIAL IMAGES !!

This is because some of the assets failed to download from Launchpad:

===== Downloading live filesystem images =====
Fri Sep 18 08:28:21 UTC 2020
failed: Network is unreachable.
failed: Network is unreachable.

But no notification of this failure was sent to the people subscribed to receive build failure notifications.

I think that we fail to bubble download failures up to the caller, possibly somewhere around here:

  https://bazaar.launchpad.net/~ubuntu-cdimage/ubuntu-cdimage/mainline/view/head:/lib/cdimage/livefs.py#L651

(We could also do with some backoff/retry logic, maybe in osextras.py/fetch itself.)

Or maybe, very slightly later on, we should assert that all the files we need are in place.
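
A rough sketch of what the retry and the "are all the files in place" check could look like. Everything here is hypothetical: fetch_with_retry, FetchError and missing_files are made-up names, and the fetch callable is assumed to raise OSError on a network failure, which may not match what osextras.py actually does.

```python
import os
import time


class FetchError(Exception):
    """Raised when a download still fails after the last retry."""


def fetch_with_retry(fetch, url, target, attempts=4, base_delay=1.0,
                     sleep=time.sleep):
    """Call fetch(url, target), retrying with exponential backoff.

    `fetch` is a stand-in for the real download helper (assumption: it
    raises OSError on network failure). The final failure is re-raised
    as FetchError so the caller cannot silently carry on without the file.
    """
    for attempt in range(attempts):
        try:
            return fetch(url, target)
        except OSError as exc:
            if attempt == attempts - 1:
                raise FetchError(f"{url}: {exc}") from exc
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** attempt)


def missing_files(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]
```

The caller would then treat either a FetchError or a non-empty missing_files() result as a build failure and send the usual notification, rather than continuing with whatever happened to be downloaded.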

Also, in debian-cd we have, since roughly forever, essentially ignored failing images in favour of continuing to build the other arches:

  https://bazaar.launchpad.net/~ubuntu-cdimage/debian-cd/ubuntu/view/head:/build_all.sh#L110

I wonder if we should revisit this in some way? For example, continue to build all the arches but store the failing exit code and exit with it later on. Or, if that's not desirable, get cdimage to check that the output file(s) are present and notify if they're not?
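
To illustrate the first option, here is the "keep going, but remember the failure" pattern as a minimal sketch. It is written in Python for illustration only (build_all.sh itself is shell), and build_all and build_one are hypothetical names, not anything that exists in debian-cd.

```python
def build_all(arches, build_one):
    """Build every arch and report the first failing exit status.

    `build_one(arch)` is a hypothetical callable that returns an exit
    status (0 for success). A failure does not stop the loop; the first
    non-zero status is kept so it can be used as the overall exit code.
    """
    status = 0
    failed = []
    for arch in arches:
        rc = build_one(arch)
        if rc != 0:
            failed.append(arch)
            if status == 0:
                status = rc
    return status, failed
```

The wrapper would exit with the returned status once all arches have been attempted, so the other arches still get built but the run as a whole is reported as failed, and the list of failed arches is available for the notification.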

Worst thing: such an asset download failure can actually result in a successful image build, but with missing assets on the image. As per:
https://people.canonical.com/~ubuntu-archive/cd-build-logs/ubuntu-server/groovy/daily-live-20200930.log

Iain Lane (laney) on 2020-09-18
summary: - Network failures when downloading the build do not abort the build
+ Network failures when downloading assets from Launchpad are not notified
+ about
Łukasz Zemczak (sil2100) wrote :

We just hit the same bug, but in a somewhat worse way. This time there was a networking error while building ubuntu-server images that caused the image build to SUCCEED, but with certain image bits missing because they were not pulled from LP:

https://people.canonical.com/~ubuntu-archive/cd-build-logs/ubuntu-server/groovy/daily-live-20200930.log

(in this case it was amd64.modules.squashfs-generic, so no modules on the image!)

I'm modifying the bug description to include this info. I think this should be prioritized properly as it can result in us wasting a lot of time. No feedback, no image build failure, nothing - just a broken image shipped 'successfully'.

summary: Network failures when downloading assets from Launchpad are not notified
- about
+ about and can result in broken images
Changed in ubuntu-cdimage:
importance: Undecided → High
Changed in debian-cd:
importance: Undecided → High
description: updated