Desktop netboot crashes on startup due to cloud-init schema validation handling

Bug #2062988 reported by Chris Peterson
This bug affects 22 people
Affects                     Status          Importance   Assigned to      Milestone
Release Notes for Ubuntu    New             Undecided    Unassigned
subiquity                   Fix Committed   Medium       Chris Peterson
ubuntu-desktop-provision    Invalid         Undecided    Unassigned

Bug Description

When netbooting the desktop image, Subiquity crashes due to a KeyError trying to access cloud-config data that doesn't exist.

Subiquity inspects bad keys as reported by "cloud-init schema --system" and checks if they are misplaced autoinstall keys. Sometimes the bad keys are not in the combined cloud config that subiquity picks up, so a KeyError is thrown when subiquity tries to access this data:

https://github.com/canonical/subiquity/blob/74c37fe0c2d1c11c4e3cecad1712e37877d6970f/subiquity/server/server.py#L791

Interestingly, this only happens when netbooting the desktop ISO (server is fine), and by the time the system is up, "cloud-init schema --system" reports no errors.

The offending key is "broadcast" according to the traceback and the only reference to this key is in the network-config at /var/lib/cloud/instances/nocloud/network-config.json which validates fine.
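Here is a minimal sketch of the failure mode described above. It assumes a simplified model where each bad key from the schema report is looked up directly in the combined cloud config. The names `combined_config`, `network_config`, `bad_keys` and `inspect_bad_keys` are hypothetical, and the network-config values are illustrative only. This is not the actual code at server.py#L791.

```python
# Hypothetical reproduction of the crash pattern; NOT subiquity's real code.

# Combined cloud config picked up by the installer (user-data is empty here).
combined_config: dict = {}

# Illustrative network-config: the "bad" key only exists down here, nested in
# a subnet entry, and this file is not merged into the combined cloud config.
network_config = {
    "version": 1,
    "config": [
        {"type": "physical", "name": "eth0",
         "subnets": [{"type": "dhcp", "broadcast": "10.0.0.255"}]},
    ],
}

# Keys reported as invalid by the schema check.
bad_keys = ["broadcast"]


def inspect_bad_keys(config: dict, keys: list) -> dict:
    """Naive lookup that assumes every reported key is top-level in `config`."""
    return {key: config[key] for key in keys}  # KeyError if a key is absent


try:
    inspect_bad_keys(combined_config, bad_keys)
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints: KeyError: 'broadcast'
```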

Output of "sudo cloud-init schema --system":

Found cloud-config data types: user-data, network-config

1. user-data at /var/lib/cloud/instances/nocloud/cloud-config.txt:
Empty 'cloud-config' found at /var/lib/cloud/instances/nocloud/cloud-config.txt. Nothing to validate.

2. network-config at /var/lib/cloud/instances/nocloud/network-config.json:
  Valid schema network-config

No cloud-config is passed to the system.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: subiquity (unknown)
ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
Uname: Linux 6.8.0-22-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.28.0-0ubuntu1
Architecture: amd64
CasperMD5CheckResult: pass
CasperVersion: 1.497
CloudArchitecture: x86_64
CloudID: nocloud
CloudName: unknown
CloudPlatform: nocloud
CloudSubPlatform: seed-dir (/var/lib/cloud/seed/nocloud)
Date: Sat Apr 20 19:26:47 2024
ExecutablePath: /snap/ubuntu-desktop-bootstrap/149/bin/subiquity/subiquity/cmd/server.py
InterpreterPath: /snap/ubuntu-desktop-bootstrap/149/usr/bin/python3.10
LiveMediaBuild: Ubuntu 24.04 LTS "Noble Numbat" - Beta amd64 (20240418)
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:

Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
ProcAttrCurrent: snap.hostname-desktop-bootstrap.subiquity-server (complain)
ProcCmdline: /snap/hostname-desktop-bootstrap/149/usr/bin/python3.10 -m subiquity.cmd.server --use-os-prober --storage-version=2 --postinst-hooks-dir=/snap/hostname-desktop-bootstrap/149/etc/subiquity/postinst.d
ProcEnviron:
 LANG=C.UTF-8
 LD_LIBRARY_PATH=<set>
 PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=linux iso-url=http://10.0.0.138/noble-desktop-amd64-04-19.iso ip=dhcp ---
Python3Details: /usr/bin/python3.12, Python 3.12.3, python3-minimal, 3.12.3-0ubuntu1
PythonDetails: N/A
SnapUpdated: False
SourcePackage: subiquity
Title: unknown error crashed with KeyError
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/06/2015
dmi.bios.release: 0.0
dmi.bios.vendor: EFI Development Kit II / OVMF
dmi.bios.version: 0.0.0
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-jammy
dmi.modalias: dmi:bvnEFIDevelopmentKitII/OVMF:bvr0.0.0:bd02/06/2015:br0.0:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-jammy:cvnQEMU:ct1:cvrpc-i440fx-jammy:sku:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-jammy
dmi.sys.vendor: QEMU

Revision history for this message
Chris Peterson (cpete) wrote :
summary: - Desktop netboot crashes on startup
+ Desktop netboot crashes on startup due to cloud-init schema validation
+ handling
Changed in subiquity:
status: Confirmed → Triaged
Chris Peterson (cpete)
Changed in ubuntu-desktop-provision:
status: New → Invalid
Chris Peterson (cpete)
Changed in subiquity:
importance: High → Medium
Dan Bungert (dbungert) wrote :

I mistrust having cloud-init in the snap due to mismatch problems with the host system, and this bug is another version of that problem.

An older version of cloud-init is in the snap (SRU pending, LP: #2056100) and a newer version is outside the snap. The copy of cloud-init in the snap is what is used to run `cloud-init schema --system`. I have attached a script that allows one to run arbitrary programs as if they were inside the subiquity-like snap, and running `cloud-init schema --system` that way shows the following:

+ cloud-init schema --system
Found cloud-config data types: user-data, network-config

1. user-data at /var/lib/cloud/instances/nocloud/cloud-config.txt:
Empty 'cloud-config' found at /var/lib/cloud/instances/nocloud/cloud-config.txt. Nothing to validate.

2. network-config at /var/lib/cloud/instances/nocloud/network-config.json:
  Invalid network-config /var/lib/cloud/instances/nocloud/network-config.json
  Error: Cloud config schema errors: config.0.subnets.0: Additional properties are not allowed ('broadcast' was unexpected)

Error: Invalid schema: network-config
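The error path `config.0.subnets.0` indexes into the network-config document: first the initial entry of `config`, then the initial entry of its `subnets`. This explains why the offending key never appears as a top-level key of the cloud config. The sketch below pulls the path and key out of that message. It assumes the message keeps the exact shape shown above. The regex and `parse_schema_error` are hypothetical and are not part of cloud-init.

```python
import re

# Error line copied from the `cloud-init schema --system` output above.
error_line = ("Error: Cloud config schema errors: config.0.subnets.0: "
              "Additional properties are not allowed ('broadcast' was unexpected)")

# Hypothetical pattern: "errors: <dotted path>: ... ('<key>' was unexpected)".
pattern = re.compile(
    r"errors: (?P<path>[\w.]+): .*\('(?P<key>[^']+)' was unexpected\)")


def parse_schema_error(line: str):
    """Return (path components, unexpected key), or None if nothing matches."""
    match = pattern.search(line)
    if match is None:
        return None
    # Numeric components are list indexes, the rest are mapping keys.
    path = [int(p) if p.isdigit() else p for p in match.group("path").split(".")]
    return path, match.group("key")


print(parse_schema_error(error_line))
# (['config', 0, 'subnets', 0], 'broadcast')
```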

----

The options to resolve this and fix desktop netboot appear to be:
1. rebuild snaps with cloud-init 24.1.3, either after SRU or forcing the issue now with PPA tricks
2. run `cloud-init schema --system` not from the snap, but from the host
3. adjust handling after `cloud-init schema --system` so that it is not allowed to fail the install
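Option 3 could look roughly like the sketch below. It assumes the inspection step receives the bad keys and the combined cloud config as plain Python objects. `AUTOINSTALL_KEYS` is only a partial, illustrative subset of autoinstall keys. Both functions are hypothetical and are not the fix that was actually committed.

```python
import logging

log = logging.getLogger("schema-check-sketch")

# Partial, illustrative subset of top-level autoinstall keys.
AUTOINSTALL_KEYS = {"version", "identity", "storage", "early-commands",
                    "refresh-installer"}


def find_misplaced_autoinstall_keys(bad_keys, combined_config):
    """Return bad keys that are autoinstall keys sitting at the cloud-config root.

    Keys absent from the combined config (e.g. ones that only occur in
    network-config) are skipped instead of raising KeyError.
    """
    return [key for key in bad_keys
            if key in AUTOINSTALL_KEYS and key in combined_config]


def check_schema_safely(bad_keys, combined_config):
    """Run the inspection, but never let it abort the install."""
    try:
        return find_misplaced_autoinstall_keys(bad_keys, combined_config)
    except Exception:  # a diagnostic step should not be fatal
        log.warning("schema inspection failed; continuing", exc_info=True)
        return []


print(check_schema_safely(["broadcast"], {}))  # []
print(check_schema_safely(["storage", "broadcast"],
                          {"storage": {"layout": {"name": "lvm"}}}))  # ['storage']
```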

Marian Rainer-Harbach (marianrh) wrote :

Is there any workaround that can be used right now? I tested the beta to be prepared for the new release, and netboot and autoinstall were fine there. That such a bug was introduced between the beta and the final release is really annoying, I'm sorry to say.

Marian Rainer-Harbach (marianrh) wrote :

OK, a possible workaround is to boot the system with the kernel parameter cloud-init=disabled, click through the first few installer pages, and then manually specify an autoinstall URL.

(But don't mistype the URL, otherwise the installer will crash. Clicking the button to show the error results in a blank, black page in the installer window. After this, the installer won't work again in the same session; a reboot is necessary. So these are four bugs in a row, including the bug this issue is about.

Normally I'm not the one to say such a thing, and this is not meant as a personal attack on anybody, but: Having an installer in such a state in an LTS release should be embarrassing to Canonical!)
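For PXE setups, the cloud-init=disabled workaround amounts to adding one argument to the kernel command line. The sketch below uses the ProcKernelCmdLine from this report. It assumes the usual Ubuntu convention that arguments after the `---` separator are carried over to the installed system, so the parameter is inserted before `---` and only affects the live installer session. `add_kernel_param` is a hypothetical helper. In practice you would edit the kernel line in your PXE/iPXE boot configuration.

```python
def add_kernel_param(cmdline: str, param: str) -> str:
    """Insert `param` before the '---' separator, or append it if there is none."""
    args = cmdline.split()
    if param in args:  # already present: leave the command line unchanged
        return cmdline
    if "---" in args:
        args.insert(args.index("---"), param)
    else:
        args.append(param)
    return " ".join(args)


original = ("BOOT_IMAGE=linux iso-url=http://10.0.0.138/noble-desktop-amd64-04-19.iso "
            "ip=dhcp ---")
print(add_kernel_param(original, "cloud-init=disabled"))
```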

Ozgur As (ozgur-as) wrote :

Just wanted to add that booting the last beta ISO over netboot with a proper cloud-config-url to autoinstall worked without problems. It's broken with the LTS ISO: I get a crash complaining about a missing "broadcast" key, just as described in the main report.

Yiu-Chung Lee (lee-yiu-chung) wrote :

I found another workaround:

1. Close the failed installer and get into desktop environment

2. Open a terminal, sudo to root, and enter these commands:
# rm /var/lib/cloud/instance/network-config.json
# systemctl restart snap.ubuntu-desktop-bootstrap.subiquity-server

3. Open the installer again, it should work now

Chris Peterson (cpete)
Changed in subiquity:
assignee: nobody → Chris Peterson (cpete)
Chris Peterson (cpete)
Changed in subiquity:
status: Triaged → In Progress
Ozgur As (ozgur-as) wrote :

Is there a workaround to run netboot autoinstall unattended? What's the proper way to delete network-config.json from the extracted ISO, since it's coming from a snap? Or is replacing the cloud-init snap in the ISO possible? early-commands don't work because cloud-init has already crashed before the commands are processed.

Kendall Link (kendall-link77) wrote :

Just testing this now on an air-gapped network. I can confirm that what Marian Rainer-Harbach (marianrh) wrote on 2024-04-26 is correct: by passing the cloud-init=disabled kernel parameter via PXE, the crash doesn't occur and I am able to install Ubuntu 24.04 as expected.

This at least gets me to a point where I can proceed with testing my autoinstall.yaml file, but as Ozgur As (ozgur-as) wrote on 2024-04-30, I will also be looking to implement a completely hands-off provisioning process via PXE.

Bernhard Suttner (sbernhard) wrote :

Will Ubuntu release a new version of the ISO image? AFAIK, the same procedure worked in the beta ISO images and was just broken in the release ISO image. It would be good to solve this issue and then release a new ISO image!

Bastian Schmidt (bastians) wrote :

I'm experiencing the same issue not only with the Desktop version, but also when running an Autoinstall (nocloud) installation of the 24.04 server image:

The 24.04 server image ISO uses subiquity 24.04.1, which relies on its snap-bundled cloud-init 23.4.4 (even though cloud-init 24.1.3 comes with the installer). I end up with the same "broadcast" error mentioned in #2056460.
When running "sudo cloud-init schema --system" in the paused/stopped installer, it uses the installed deb-version cloud-init 24.1.3 which does not fail when validating the network schema.

The known issues of subiquity 24.04.1 mention this error only for the Desktop installation: https://discourse.ubuntu.com/t/subiquity-24-04-1-has-been-released-to-the-stable-channel/44493#known-issues-14

Can someone verify this for the Server installation? What are the chances that a new ISO image is released soon?
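The mismatch described here comes down to comparing two cloud-init versions. The sketch below assumes plain dotted upstream versions like the ones quoted above. Real Debian package versions, with suffixes such as `-0ubuntu1`, would need a proper dpkg-style comparison. Both helper names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted release string such as '24.1.3' into (24, 1, 3)."""
    return tuple(int(part) for part in version.split("."))


def bundled_is_older(bundled: str, host: str) -> bool:
    """True if the snap-bundled cloud-init is older than the host's copy."""
    # Tuples compare element by element, so (23, 4, 4) < (24, 1, 3).
    return parse_version(bundled) < parse_version(host)


# Versions reported in this comment: snap-bundled vs. installed deb.
print(bundled_is_older("23.4.4", "24.1.3"))  # True
```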

AYUNTAMIENTO DE VERA (aytovera) wrote :

I can confirm it, @bastians: I have the same error, and the same thing happens with the latest Jammy server ISO:
https://cdimage.ubuntu.com/ubuntu-server/jammy/daily-live/current/

Franck Iaropoli (franck-iaropoli-arm) wrote :

I just started a new build and it's working with refresh-installer's update set to true and its channel set to latest/edge:

  refresh-installer:
    update: true
    channel: "latest/edge"

# snap info subiquity
name: subiquity
summary: Ubuntu installer
publisher: Canonical✓
store-url: https://snapcraft.io/subiquity
contact: https://bugs.launchpad.net/subiquity
license: unset
description: |
  The Ubuntu server installer
commands:
  - subiquity.curtin
  - subiquity.probert
  - subiquity
services:
  subiquity.subiquity-server: simple, enabled, active
  subiquity.subiquity-service: simple, enabled, active
snap-id: ba2aj8guta0zSRlT3QM5aJNAUXPlBtf9
tracking: latest/stable
refresh-date: today at 14:27 UTC
channels:
  latest/stable: 24.04.1 2024-04-25 (5741) 21MB classic
  latest/candidate: ↑
  latest/beta: 24.04.1 2024-04-17 (5741) 21MB classic
  latest/edge: 24.10-devel+git40.7a2002bc 2024-05-10 (5803) 21MB classic
installed: 24.04.1

Bastian Schmidt (bastians) wrote :

@franck-iaropoli-arm thanks for your response!
I tried several builds today but the refresh-installer option does not change any behavior for me. Even a minimal user-data config like:

  #cloud-config
  autoinstall:
    version: 1
    refresh-installer:
      update: true
      channel: "latest/edge"

It still just brings me to the "broadcast" error.

Can you maybe share your user-data file?

Franck Iaropoli (franck-iaropoli-arm) wrote :

Oh no this doesn't work anymore (I was using beta iso yesterday and final release one today).

Refreshing the subiquity snap after the error lets the install continue:

snap refresh subiquity --channel=latest/edge

I found this other bug report https://bugs.launchpad.net/ubuntu/+source/subiquity/+bug/2063813 and comment https://bugs.launchpad.net/ubuntu/+source/subiquity/+bug/2063813/comments/3

So we need a new ISO with a more recent subiquity version?

Bastian Schmidt (bastians) wrote :

@franck-iaropoli-arm I think we need a new ISO, yes.

I can confirm that updating subiquity manually lets the install continue.

Ozgur As (ozgur-as) wrote :

Even if the workaround of using the edge channel worked today, we wouldn't know which versions would be pushed to the edge channel going forward.

Is there a method for manually modifying the ISO to replace the subiquity snap with the latest version? 24.04.1 is scheduled for August, and unattended install in the LTS is broken.

Kendall Link (kendall-link77) wrote :

I'll also throw my hat into the "Hoping for an ISO" pool. All of the systems I'm trying to netboot are on an air-gapped network. For me this means that the installer can't dynamically call home and update. If there is a way to update the version of the subiquity snap on the ISO, I'd be open to giving that a try.

Either way... I'm patiently waiting for the status of this bug to change ;)

Franck Iaropoli (franck-iaropoli-arm) wrote :

Thanks for the confirmation, all. I agree we need a new ISO or a way to update subiquity on the current one.

Bastian Schmidt (bastians) wrote :

Hey everyone,

Also from me, thanks a lot for the confirmation! I did some further testing of the issue:

Oddly, the error occurs only when I'm deploying a VM on VMware vSphere. With other hypervisors, such as Proxmox or a local libvirt/KVM instance, the schema validation error does not occur.

What are you guys using to deploy Ubuntu 24.04? Can anyone verify this?

Besides that, there is a script in the subiquity repo which allows one to patch new versions of the subiquity installer snap into an ISO image:

https://github.com/canonical/subiquity?tab=readme-ov-file#build-and-inject-your-changes-into-an-iso

Following those steps, I was able to inject the latest version of the subiquity installer into the current release ISO and run an Autoinstall without further issues. I hope this helps!

Ozgur As (ozgur-as) wrote :

I was using a netboot image (netboot.xyz) to boot over LAN with iPXE, with nfsroot pointing to the extracted 24.04 LTS ISO and cloud-config-url pointing to a valid user-data file served over HTTP.

Chris Peterson (cpete)
Changed in subiquity:
status: In Progress → Fix Committed
Sebastian Grebe (swer21) wrote :

Hi

I'm trying to create an ISO with an updated subiquity. I tried the make-edge-iso.sh script, and I also manually downloaded an updated snap from the edge channel and used the method linked by bastians to create a new ISO. But in both cases the behavior doesn't change.
Can someone point out how I can download a fixed version of subiquity, or better, a fixed ISO?

Franck Iaropoli (franck-iaropoli-arm) wrote :

Hi all,

@Chris Peterson Is your fix already available in a daily build of the Noble server ISO (if one exists somewhere)? I am only seeing builds for 24.10: https://cdimage.ubuntu.com/ubuntu-server/daily-live/current/

Any plans to update the iso ?

Ozgur As (ozgur-as) wrote :

I've also injected the subiquity snap as described on GitHub, but the crash continues.

the path to the executable that throws the error in the report shows:

/snap/ubuntu-desktop-bootstrap/171/bin/subiquity/subiquity/cmd/server.py

should we be updating the ubuntu-desktop-bootstrap snap instead of subiquity or both of them?

Bastian Schmidt (bastians) wrote :

@swer21 @ozgur-as I'm using the server image only, and that one works fine for me when injecting the latest snap.

I've installed the following packages on my system:

- xorriso
- squashfs-tools
- python3-debian
- liblz4-tool
- git
- python3-venv
- python3-yaml
- python3-pip
- snapd

Then, run:

# Downloading the snap
snap download subiquity --edge

# Run the actual inject command
./inject-subiquity-snap.sh <original iso path>.iso <path to the downloaded snap>.snap patched.iso

The issue should not appear anymore when using patched.iso.

David Boman (mail-davidboman) wrote :

Hi all,

I'm trying to get this to work with desktop but have the same issue.

Tried injecting the subiquity snap without luck. Also tried the latest daily desktop image from https://cdimage.ubuntu.com/noble/daily-live/current but the bug is still present.

Anyone found a work around or solution for desktop?

Ozgur As (ozgur-as) wrote :

Apparently the desktop ISO uses the subiquity binary packed in the ubuntu-desktop-bootstrap snap, which contains the subiquity version from LTS release day. That's why injecting the subiquity edge snap into the desktop ISO doesn't work. Also, the ubuntu-desktop-bootstrap snaps in daily desktop images point to the old subiquity binaries, so even the candidate snaps for ubuntu-desktop-bootstrap don't have the fixed subiquity.

you can see my question and the answers at github:
https://github.com/canonical/ubuntu-desktop-provision/issues/753
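One way to tell which snap actually provides the crashing code is to look at the ExecutablePath in the crash report. On desktop it points into ubuntu-desktop-bootstrap, not into the subiquity snap. The sketch below assumes the standard `/snap/<name>/<revision>/...` mount layout. `snap_of` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def snap_of(executable_path: str):
    """Return (snap name, revision) for a path under /snap, else None."""
    parts = PurePosixPath(executable_path).parts
    # For a snap path, parts look like ('/', 'snap', '<name>', '<revision>', ...).
    if len(parts) >= 4 and parts[:2] == ("/", "snap"):
        return parts[2], parts[3]
    return None


# ExecutablePath quoted earlier in this thread.
path = "/snap/ubuntu-desktop-bootstrap/171/bin/subiquity/subiquity/cmd/server.py"
print(snap_of(path))  # ('ubuntu-desktop-bootstrap', '171')
```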

Sebastian Grebe (swer21) wrote :

@bastians Thanks for your instructions. I was already doing this, but now I have a working ISO.
I'm not sure if this was relevant, but in the past I only replaced the ISO in my setup. This time I also replaced the initrd and vmlinuz.

D Ledford (dledford-work) wrote :

We ran into the same issue with Subiquity erroring out on an invalid "broadcast" key.
We're using a locally hosted 'user-data' on an HTTP server with the 'nocloud' datasource.

We found that we could update the Subiquity snap to the version with the fix before it parses the file, by inserting a cloud-init 'runcmd' into our 'user-data' that is processed by the installer live image's 'cloud-init' run on boot.
We then had to disable the Subiquity installer refresh in the 'autoinstall' key or it would "upgrade" from the edge snap to the stable snap, and break again.

For example:
#cloud-config
### SECTION01: Configures user-data values for the cloud-init that launches the LiveOS installer image.
...
runcmd:
  - systemctl disable --now ssh
  - systemctl mask ssh
  - snap refresh subiquity --edge
...
### SECTION02: The autoinstall key turns into /autoinstall.yaml that is read by Subiquity on startup, the early-commands are run, and then /autoinstall.yaml is re-read.
autoinstall:
  version: 1
  refresh-installer:
    update: no
...

That fixed the issue for us. Once the 24.04 ISO is updated or the stable Subiquity snap version is updated we can update/remove that cloud-init 'runcmd' from our 'user-data' as it (hopefully) won't be needed.

Thankfully this avoided us needing to spin a custom install ISO with an updated Subiquity snap. We can just use the release ISO synced to the local mirror.
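The workaround above can also be generated programmatically. The sketch below emits only the relevant parts: the `snap refresh` runcmd and the disabled installer refresh. It leaves out the other settings elided as `...` above. The body is written as JSON. This assumes that cloud-init parses the body after the `#cloud-config` header as YAML, which accepts this JSON. `build_user_data` is a hypothetical helper.

```python
import json


def build_user_data() -> str:
    """Hypothetical generator for the workaround user-data (other keys omitted)."""
    config = {
        # Per the comment above, runcmd is executed by the live image's own
        # cloud-init on boot, before Subiquity parses the autoinstall config.
        "runcmd": ["snap refresh subiquity --edge"],
        "autoinstall": {
            "version": 1,
            # Stop the installer from "refreshing" back to the stable snap.
            "refresh-installer": {"update": False},
        },
    }
    # The body after the header is parsed as YAML; this JSON is valid YAML.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


print(build_user_data())
```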

David Boman (mail-davidboman) wrote :

This issue has now been fixed in the latest pending release! (2024-06-12)

https://cdimage.ubuntu.com/noble/daily-live/pending/

Raymond van der velden (rayhvh) wrote :

This is still not fixed for the latest Ubuntu Server 24.04. Absolutely ridiculous if you ask me. https://cdimage.ubuntu.com/ubuntu-server/noble/daily-live/current/

Kirt Runolfson (kirtr) wrote : Re: [Bug 2062988] Re: Desktop netboot crashes on startup due to cloud-init schema validation handling

Ubuntu Server's netboot autoinstall should work, Raymond. I've been using it since release.

Only the Desktop autoinstall was broken in the release, and I do have that working with a daily-live build.

On Tue, Jul 16, 2024 at 1:31 AM Raymond van der velden <email address hidden> wrote:

> this is still not fixed for the latest ubuntu server 24.04. absolutely
> ridiculous if you ask me. https://cdimage.ubuntu.com/ubuntu-server/noble/daily-live/current/
