Desktop netboot crashes on startup due to cloud-init schema validation handling

Bug #2062988 reported by Chris Peterson
This bug affects 10 people
Affects                    Status       Importance  Assigned to     Milestone
Release Notes for Ubuntu   New          Undecided   Unassigned
subiquity                  In Progress  Medium      Chris Peterson
ubuntu-desktop-provision   Invalid      Undecided   Unassigned

Bug Description

When netbooting the desktop image, Subiquity crashes due to a KeyError trying to access cloud-config data that doesn't exist.

Subiquity inspects bad keys as reported by "cloud-init schema --system" and checks if they are misplaced autoinstall keys. Sometimes the bad keys are not in the combined cloud config that subiquity picks up, so a KeyError is thrown when subiquity tries to access this data:

https://github.com/canonical/subiquity/blob/74c37fe0c2d1c11c4e3cecad1712e37877d6970f/subiquity/server/server.py#L791
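
The failure mode described above can be sketched roughly as follows. This is a minimal illustration, not Subiquity's actual code: the function names, the key subset, and the shape of the inputs are all assumptions made for the example. It only shows why indexing the combined cloud config with a key that was reported from a different source (here, network-config) raises a KeyError, and how a membership test avoids it.

```python
# Hypothetical names throughout; this is not Subiquity's actual code.
# Illustrative subset of top-level autoinstall keys.
ILLUSTRATIVE_AUTOINSTALL_KEYS = {"version", "identity", "storage"}


def misplaced_autoinstall_keys_unsafe(bad_keys, cloud_cfg):
    """Index the config directly: raises KeyError if a bad key is absent."""
    found = []
    for key in bad_keys:
        _ = cloud_cfg[key]  # KeyError when the key is not in the combined config
        if key in ILLUSTRATIVE_AUTOINSTALL_KEYS:
            found.append(key)
    return found


def misplaced_autoinstall_keys_safe(bad_keys, cloud_cfg):
    """Skip bad keys that are not present in the combined cloud config."""
    return [
        key
        for key in bad_keys
        if key in cloud_cfg and key in ILLUSTRATIVE_AUTOINSTALL_KEYS
    ]
```

With a config that has no "broadcast" entry, the first variant raises while the second simply ignores that key.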

Interestingly, this only happens when netbooting the desktop ISO (server is fine), and by the time the system is up, "cloud-init schema --system" reports no errors.

The offending key is "broadcast" according to the traceback and the only reference to this key is in the network-config at /var/lib/cloud/instances/nocloud/network-config.json which validates fine.

Output of "sudo cloud-init schema --system":

Found cloud-config data types: user-data, network-config

1. user-data at /var/lib/cloud/instances/nocloud/cloud-config.txt:
Empty 'cloud-config' found at /var/lib/cloud/instances/nocloud/cloud-config.txt.
 Nothing to validate

2. network-config at /var/lib/cloud/instances/nocloud/network-config.json:
  Valid schema network-config

No cloud-config is passed to the system.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: subiquity (unknown)
ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
Uname: Linux 6.8.0-22-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.28.0-0ubuntu1
Architecture: amd64
CasperMD5CheckResult: pass
CasperVersion: 1.497
CloudArchitecture: x86_64
CloudID: nocloud
CloudName: unknown
CloudPlatform: nocloud
CloudSubPlatform: seed-dir (/var/lib/cloud/seed/nocloud)
Date: Sat Apr 20 19:26:47 2024
ExecutablePath: /snap/ubuntu-desktop-bootstrap/149/bin/subiquity/subiquity/cmd/server.py
InterpreterPath: /snap/ubuntu-desktop-bootstrap/149/usr/bin/python3.10
LiveMediaBuild: Ubuntu 24.04 LTS "Noble Numbat" - Beta amd64 (20240418)
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:

Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
ProcAttrCurrent: snap.hostname-desktop-bootstrap.subiquity-server (complain)
ProcCmdline: /snap/hostname-desktop-bootstrap/149/usr/bin/python3.10 -m subiquity.cmd.server --use-os-prober --storage-version=2 --postinst-hooks-dir=/snap/hostname-desktop-bootstrap/149/etc/subiquity/postinst.d
ProcEnviron:
 LANG=C.UTF-8
 LD_LIBRARY_PATH=<set>
 PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=linux iso-url=http://10.0.0.138/noble-desktop-amd64-04-19.iso ip=dhcp ---
Python3Details: /usr/bin/python3.12, Python 3.12.3, python3-minimal, 3.12.3-0ubuntu1
PythonDetails: N/A
SnapUpdated: False
SourcePackage: subiquity
Title: unknown error crashed with KeyError
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/06/2015
dmi.bios.release: 0.0
dmi.bios.vendor: EFI Development Kit II / OVMF
dmi.bios.version: 0.0.0
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-jammy
dmi.modalias: dmi:bvnEFIDevelopmentKitII/OVMF:bvr0.0.0:bd02/06/2015:br0.0:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-jammy:cvnQEMU:ct1:cvrpc-i440fx-jammy:sku:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-jammy
dmi.sys.vendor: QEMU

Chris Peterson (cpete) wrote :
summary: - Desktop netboot crashes on startup
+ Desktop netboot crashes on startup due to cloud-init schema validation
+ handling
Changed in subiquity:
status: Confirmed → Triaged
Chris Peterson (cpete)
Changed in ubuntu-desktop-provision:
status: New → Invalid
Chris Peterson (cpete)
Changed in subiquity:
importance: High → Medium
Dan Bungert (dbungert) wrote :

I mistrust having cloud-init in the snap because of version mismatches with the host system, and this bug is another instance of that problem.

An older version of cloud-init is in the snap - SRU pending LP: #2056100 - and a newer version outside the snap. The copy of cloud-init in the snap is what is used to run `cloud-init schema --system`. I have attached a script that allows one to run arbitrary programs as if they were in the subiquity-like snap, and doing so with `cloud-init schema --system` shows the following:

+ cloud-init schema --system
Found cloud-config data types: user-data, network-config

1. user-data at /var/lib/cloud/instances/nocloud/cloud-config.txt:
Empty 'cloud-config' found at /var/lib/cloud/instances/nocloud/cloud-config.txt. Nothing to validate.

2. network-config at /var/lib/cloud/instances/nocloud/network-config.json:
  Invalid network-config /var/lib/cloud/instances/nocloud/network-config.json
  Error: Cloud config schema errors: config.0.subnets.0: Additional properties are not allowed ('broadcast' was unexpected)

Error: Invalid schema: network-config

----

The options to resolve this and fix desktop netboot appear to be:
1. rebuild snaps with cloud-init 24.1.3, either after SRU or forcing the issue now with PPA tricks
2. run `cloud-init schema --system` not from the snap, but from the host
3. adjust handling after `cloud-init schema --system` so that it is not allowed to fail the install
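
Option 3 could look something like the sketch below. It is only an illustration under stated assumptions: `run_schema_command`, `check_schema_non_fatal`, and the logger name are hypothetical and do not come from Subiquity, and the inspection step is passed in as a callable because its real shape is not shown in this report. The idea is that any exception raised while running or interpreting the schema check is logged and swallowed instead of aborting the installer.

```python
import logging
import subprocess

log = logging.getLogger("schema-check-sketch")  # hypothetical logger name


def run_schema_command():
    """Run the validator and return its combined output as text."""
    proc = subprocess.run(
        ["cloud-init", "schema", "--system"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is reported in the output, not raised
    )
    return proc.stdout + proc.stderr


def check_schema_non_fatal(run_check, inspect_output):
    """Return inspect_output(run_check()), or [] if either step raises."""
    try:
        return inspect_output(run_check())
    except Exception:
        # Log the traceback for later debugging, but let the install continue.
        log.exception("schema handling failed; continuing without it")
        return []
```

A caller would pass `run_schema_command` as `run_check` together with whatever function inspects the output. The trade-off is that a genuinely misplaced autoinstall key could go unreported when the inspection itself fails, so the logged traceback matters.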

Marian Rainer-Harbach (marianrh) wrote :

Is there any workaround that can be used right now? I tested the beta to be prepared for the new release, and netboot and autoinstall were fine there. That such a bug was introduced between the beta and the final release is really annoying, I'm sorry to say.

Marian Rainer-Harbach (marianrh) wrote :

OK, a possible workaround is to boot the system with the kernel parameter cloud-init=disabled, click through the first few installer pages, and then manually specify an autoinstall URL.

(But don't mistype the URL, otherwise the installer will crash. Clicking the button to show the error results in a blank, black page in the installer window. After this, the installer won't work again in the same session; a reboot is necessary. So that makes four bugs in a row, including the one this report is about.

Normally I'm not the one to say such a thing, and this is not meant as a personal attack on anybody, but: Having an installer in such a state in an LTS release should be embarrassing to Canonical!)

Ozgur As (ozgur-as) wrote :

Just wanted to add that booting the last beta ISO over netboot with a proper cloud-config-url to autoinstall worked without problems. It's broken with the LTS ISO, which crashes complaining about a missing "broadcast" key, just as described in the main report.

Yiu-Chung Lee (lee-yiu-chung) wrote :

I found another workaround:

1. Close the failed installer and get into desktop environment

2. Open a terminal, sudo to root, and enter these commands:
# rm /var/lib/cloud/instance/network-config.json
# systemctl restart snap.ubuntu-desktop-bootstrap.subiquity-server

3. Open the installer again, it should work now

Chris Peterson (cpete)
Changed in subiquity:
assignee: nobody → Chris Peterson (cpete)
Chris Peterson (cpete)
Changed in subiquity:
status: Triaged → In Progress
Ozgur As (ozgur-as) wrote :

Is there a workaround to run a netboot autoinstall unattended? What's the proper way to delete network-config.json from the extracted ISO, since it's coming from a snap? Or is it possible to replace the cloud-init snap in the ISO? early-commands don't work because cloud-init has already crashed before the commands are processed.

Kendall Link (kendall-link77) wrote :

Just testing this now on an air-gapped network. I can confirm that what Marian Rainer-Harbach (marianrh) wrote on 2024-04-26 is correct. When the cloud-init=disabled kernel parameter is passed via PXE, the crash doesn't occur and I am able to install Ubuntu 24.04 as expected.

This at least gets me to a point where I can proceed with testing my autoinstall.yaml file, but like Ozgur As (ozgur-as), who wrote on 2024-04-30, I will also be looking to implement a completely hands-off provisioning process via PXE.
