snap builds: network access timeout

Bug #1885164 reported by Chris Patterson
This bug affects 4 people
Affects            Status    Importance  Assigned to  Milestone
Launchpad itself   Triaged   High        Unassigned
Rutabaga           Triaged   High        Unassigned

Bug Description

It appears that Launchpad builds are limited to 120 minutes of network connectivity, which breaks any build that needs network access after the 120-minute mark. It can be quite challenging to satisfy all network requirements within that timeframe, particularly for large builds on slower architectures.

Historically, snapcraft has run into a number of issues when trying to do all (or most) of the network access in the `pull` phase; doing so required brittle constructs that tend to break. As a result, core20 snaps tend to rely on the build systems' own conventions, which typically expect network access at any point in the build. Even for non-core20 snaps, using `after` tends to force pull steps to run later in the build process.
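To make the effect of `after` concrete, here is a minimal sketch of a deliberately simplified, hypothetical scheduler (not snapcraft's real one): parts without `after` are pulled up front, while a part with `after` is pulled only once its dependencies have been built. The `Part` type, `late_pulls` function and all durations are illustrative assumptions, not measurements.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, simplified model of a snap build; not snapcraft's scheduler.
NETWORK_WINDOW_MIN = 120  # the window reported in the bug description


@dataclass
class Part:
    name: str
    pull_min: int   # minutes spent fetching sources/packages (needs network)
    build_min: int  # minutes spent compiling (assumed to need no network)
    after: list[str] = field(default_factory=list)


def late_pulls(parts: list[Part], window: int = NETWORK_WINDOW_MIN) -> list[str]:
    """Return the names of parts whose pull finishes after the network window."""
    clock = 0
    late = []
    # Parts with no `after` dependencies can be pulled first, while the
    # network is still available.
    for part in parts:
        if not part.after:
            clock += part.pull_min
            if clock > window:
                late.append(part.name)
    built = set()
    for part in parts:
        missing = [dep for dep in part.after if dep not in built]
        if missing:
            raise ValueError(f"{part.name} is listed before its dependencies: {missing}")
        if part.after:
            # A dependent part is pulled only after its dependencies are built,
            # which pushes its network access late into the build.
            clock += part.pull_min
            if clock > window:
                late.append(part.name)
        clock += part.build_min
        built.add(part.name)
    return late
```

With an illustrative 240-minute WebKit build, a dependent part's pull lands at minute 255 and is reported as late; dropping its `after` lets it be pulled up front, and nothing is late.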

Can we relax the timeout, and perhaps allow select snaps to be configured with an extended timeframe? For example, Chromium takes a very long time to build and has to work around the time constraints.

Thanks!

Changed in launchpad:
importance: Undecided → High
status: New → Triaged
tags: added: build-infrastructure buildfarm lp-snappy
glancr team (glancr) wrote:

I'm running into this issue again with the wpe-webkit-mir-kiosk snap: WPE WebKit takes more than 4 hours to build on ARM, so the build then fails on stage-packages (not to mention a part that needs WPE WebKit built before it can start). Any update on this?

Michał Sawicz (saviq) wrote:

Tail of the log for a failed build (full log attached):

Err libavahi-glib1_0.7-4ubuntu7.1_arm64.deb

  407 Proxy Authentication Required [IP: 10.10.10.1 8222]

Fetched 0 B in 0s (0 B/s)
Package fetch error: The item '/root/.cache/snapcraft/download/libavahi-glib1_0.7-4ubuntu7.1_arm64.deb' could not be fetched: 407 Proxy Authentication Required [IP: 10.10.10.1 8222]
Build failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lpbuildd/target/build_snap.py", line 222, in run
    self.pull()
  File "/usr/lib/python3/dist-packages/lpbuildd/target/build_snap.py", line 192, in pull
    env=env)
  File "/usr/lib/python3/dist-packages/lpbuildd/target/build_snap.py", line 100, in run_build_command
    return self.backend.run(args, env=full_env, **kwargs)
  File "/usr/lib/python3/dist-packages/lpbuildd/target/lxd.py", line 537, in run
    subprocess.check_call(cmd, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['lxc', 'exec', 'lp-focal-arm64', '--env', 'LANG=C.UTF-8', '--env', 'SHELL=/bin/sh', '--env', 'http_proxy=http://10.10.10.1:8222/', '--env', 'https_proxy=http://10.10.10.1:8222/', '--env', 'GIT_PROXY_COMMAND=/usr/local/bin/lpbuildd-git-proxy', '--env', 'SNAPPY_STORE_NO_CDN=1', '--env', 'SNAPCRAFT_LOCAL_SOURCES=1', '--env', 'SNAPCRAFT_SETUP_CORE=1', '--env', 'SNAPCRAFT_BUILD_INFO=1', '--env', 'SNAPCRAFT_IMAGE_INFO={"build-request-id": "lp-67760438", "build-request-timestamp": "2021-12-03T12:19:48Z", "build_url": "https://launchpad.net/~saviq/+snap/snapcraft-wpe-webkit-mir-kiosk-e08b19bacdf39f81c2d161436f172dfd/+build/1600431"}', '--env', 'SNAPCRAFT_BUILD_ENVIRONMENT=host', '--', '/bin/sh', '-c', 'cd /build/snapcraft-wpe-webkit-mir-kiosk-e08b19bacdf39f81c2d161436f172dfd && linux64 snapcraft pull']' returned non-zero exit status 2.
Revoking proxy token...
Unable to revoke token for SNAPBUILD-1600431-1638533990: HTTP Error 401: Unauthorized
RUN: /usr/share/launchpad-buildd/bin/in-target scan-for-processes --backend=lxd --series=focal --arch=arm64 SNAPBUILD-1600431
Scanning for processes to kill in build SNAPBUILD-1600431

Colin Watson (cjwatson) wrote:

The limit is 3 hours now rather than 2, but it sounds like that still isn't enough for everything.

The timeout is an anti-abuse mechanism, so I'm not comfortable with continuing to raise it arbitrarily across the board. I think we're going to have to come up with a way to set this on a per-snap-recipe basis (which won't help "snapcraft remote-build", but it will at least allow us to configure this for persistent snap recipes).

Unfortunately this is complicated somewhat by the design of rutabaga (the component which issues authentication tokens for the builder proxy and tracks their validity): rather than setting the expiry when issuing a token, it computes it based on the issuing time when checking whether the token is still valid. This design can only cope with a single lifetime for all tokens, and will need a schema change to be more flexible. However, rutabaga's database has never had a schema change and in fact has no infrastructure for making them, so we're going to have to fix that first. (Maybe we should convert it from SQLite to PostgreSQL while we're there, since we're a lot more used to managing PostgreSQL instances; that would allow us to deploy multiple rutabaga units and so remove a single point of failure.)
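A minimal sketch of the difference described above, assuming hypothetical names throughout (this is not rutabaga's actual code or schema). In the current design, validity is derived at check time from the issue time plus one global lifetime. In the proposed design, the issuer records an expiry per token. Treating tokens without a stored expiry as having the global lifetime is an assumption about how a migration might handle pre-existing rows.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical sketch; names and structure are illustrative only.
DEFAULT_LIFETIME = timedelta(hours=3)  # the current global limit


@dataclass
class Token:
    issued_at: datetime
    expires_at: datetime | None = None  # the proposed new column


def is_valid_current(token: Token, now: datetime) -> bool:
    """Current design: expiry is computed at check time from one global lifetime."""
    return now < token.issued_at + DEFAULT_LIFETIME


def issue(now: datetime, lifetime: timedelta = DEFAULT_LIFETIME) -> Token:
    """Proposed design: the caller picks a lifetime and the expiry is stored."""
    return Token(issued_at=now, expires_at=now + lifetime)


def is_valid_proposed(token: Token, now: datetime) -> bool:
    # Assumption: tokens issued before the schema change have no stored
    # expiry, so they fall back to the global lifetime.
    if token.expires_at is not None:
        expires_at = token.expires_at
    else:
        expires_at = token.issued_at + DEFAULT_LIFETIME
    return now < expires_at
```

A token issued with a 6-hour lifetime is rejected at the 4-hour mark by the current check but accepted by the proposed one. This is why the current design can only cope with a single lifetime: the per-token information simply isn't stored.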

Once rutabaga can take an expiry time when issuing a token, it's fairly easy to add a column to the `Snap` table in Launchpad, editable by admins, to let us configure this differently for different snap recipes. But it's going to take a little while to get to that point!
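The per-recipe override could look roughly like the following minimal sketch. Both the `proxy_token_lifetime` field and `token_lifetime_for` are hypothetical names, not Launchpad's actual `Snap` model. The upper bound `MAX_PROXY_LIFETIME` is likewise an assumption, included because the timeout exists as an anti-abuse mechanism. Recipes that no admin has configured keep the default.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta

# Hypothetical sketch; field and function names are illustrative only.
DEFAULT_PROXY_LIFETIME = timedelta(hours=3)
MAX_PROXY_LIFETIME = timedelta(hours=12)  # assumed admin-enforced ceiling


@dataclass
class SnapRecipe:
    name: str
    # Admin-editable column; None means "use the default".
    proxy_token_lifetime: timedelta | None = None


def token_lifetime_for(recipe: SnapRecipe | None) -> timedelta:
    """Choose the lifetime to request from the token issuer for a build."""
    if recipe is None or recipe.proxy_token_lifetime is None:
        return DEFAULT_PROXY_LIFETIME
    # Clamp so that a misconfigured recipe cannot hold network access open indefinitely.
    return min(recipe.proxy_token_lifetime, MAX_PROXY_LIFETIME)
```

The resulting lifetime would then be passed to the token issuer when the build starts. This only works once the issuer accepts a per-token expiry.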

Changed in rutabaga:
status: New → Triaged
importance: Undecided → High
Michał Sawicz (saviq) wrote:

Hey @cjwatson, would this also work for snapcraft.io builds?

Colin Watson (cjwatson) wrote:

Yes, snapcraft.io builds use persistent snap recipes, so if we had per-recipe timeout configuration then Launchpad staff could configure that as needed.

glancr team (glancr) wrote:

Might it be possible to automate the new snapcraft offline mode (https://snapcraft.io/docs/snapcraft-offline) for this? That is, run `snapcraft pull` to fetch all required data and then build from that for as long as it takes. This would still break builds that require network access in their override-build/stage/prime scriptlets, but I guess that's less likely than a part that takes more than 3 hours to build on some architectures.

Oh, and please document that time limit somewhere: it took me a while to figure out what was wrong when I first ran into this :-) If it's a good fit for https://snapcraft.io/docs/build-options, I can propose a line through the snapcraft forums.
