Desktop netboot crashes on startup due to cloud-init schema validation handling

Bug #2062988 reported by Chris Peterson
104
This bug affects 15 people
Affects                   Status       Importance  Assigned to     Milestone
Release Notes for Ubuntu  New          Undecided   Unassigned
subiquity                 In Progress  Medium      Chris Peterson
ubuntu-desktop-provision  Invalid      Undecided   Unassigned

Bug Description

When netbooting the desktop image, Subiquity crashes with a KeyError while trying to access cloud-config data that doesn't exist.

Subiquity inspects the bad keys reported by "cloud-init schema --system" and checks whether they are misplaced autoinstall keys. Sometimes the bad keys are not in the combined cloud-config that subiquity picks up, so a KeyError is raised when subiquity tries to access this data:

https://github.com/canonical/subiquity/blob/74c37fe0c2d1c11c4e3cecad1712e37877d6970f/subiquity/server/server.py#L791
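To illustrate the failure mode, here is a minimal sketch. It is not subiquity's actual code: `misplaced_keys_naive`, `misplaced_keys_safe`, and the `AUTOINSTALL_KEYS` subset are hypothetical. It assumes the schema check yields a list of bad top-level keys and that the combined cloud-config only holds user-data, so a key reported against network-config (here "broadcast") cannot be found in it.

```python
import logging

log = logging.getLogger(__name__)

# Illustrative subset of autoinstall keys (hypothetical, not subiquity's list).
AUTOINSTALL_KEYS = {"version", "interactive-sections", "refresh-installer", "storage"}


def misplaced_keys_naive(bad_keys, combined_config):
    """Assumes every key reported by the schema check exists in the
    combined cloud-config. A key reported against network-config
    (e.g. "broadcast") does not, so this raises KeyError."""
    found = {}
    for key in bad_keys:
        value = combined_config[key]  # KeyError for keys from other sources
        if key in AUTOINSTALL_KEYS:
            found[key] = value
    return found


def misplaced_keys_safe(bad_keys, combined_config):
    """Same check, but skips (and logs) keys that the combined
    cloud-config does not contain instead of crashing."""
    found = {}
    for key in bad_keys:
        if key not in combined_config:
            log.warning("schema reported %r, not present in combined cloud-config", key)
            continue
        if key in AUTOINSTALL_KEYS:
            found[key] = combined_config[key]
    return found


if __name__ == "__main__":
    bad_keys = ["broadcast"]  # reported from network-config, as in this bug
    combined = {}             # empty user-data, as in this report
    try:
        misplaced_keys_naive(bad_keys, combined)
    except KeyError as exc:
        print("naive lookup failed:", exc)  # naive lookup failed: 'broadcast'
    print(misplaced_keys_safe(bad_keys, combined))  # {}
```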

Notably, this only happens when netbooting the desktop ISO (server is fine), and by the time the system is up, "cloud-init schema --system" reports no errors.

According to the traceback, the offending key is "broadcast", and the only reference to this key is in the network-config at /var/lib/cloud/instances/nocloud/network-config.json, which validates fine.

Output of "sudo cloud-init schema --system":

Found cloud-config data types: user-data, network-config

1. user-data at /var/lib/cloud/instances/nocloud/cloud-config.txt:
Empty 'cloud-config' found at /var/lib/cloud/instances/nocloud/cloud-config.txt. Nothing to validate.

2. network-config at /var/lib/cloud/instances/nocloud/network-config.json:
  Valid schema network-config

No cloud-config is passed to the system.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: subiquity (unknown)
ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
Uname: Linux 6.8.0-22-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.28.0-0ubuntu1
Architecture: amd64
CasperMD5CheckResult: pass
CasperVersion: 1.497
CloudArchitecture: x86_64
CloudID: nocloud
CloudName: unknown
CloudPlatform: nocloud
CloudSubPlatform: seed-dir (/var/lib/cloud/seed/nocloud)
Date: Sat Apr 20 19:26:47 2024
ExecutablePath: /snap/ubuntu-desktop-bootstrap/149/bin/subiquity/subiquity/cmd/server.py
InterpreterPath: /snap/ubuntu-desktop-bootstrap/149/usr/bin/python3.10
LiveMediaBuild: Ubuntu 24.04 LTS "Noble Numbat" - Beta amd64 (20240418)
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:

Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
ProcAttrCurrent: snap.ubuntu-desktop-bootstrap.subiquity-server (complain)
ProcCmdline: /snap/ubuntu-desktop-bootstrap/149/usr/bin/python3.10 -m subiquity.cmd.server --use-os-prober --storage-version=2 --postinst-hooks-dir=/snap/ubuntu-desktop-bootstrap/149/etc/subiquity/postinst.d
ProcEnviron:
 LANG=C.UTF-8
 LD_LIBRARY_PATH=<set>
 PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=linux iso-url=http://10.0.0.138/noble-desktop-amd64-04-19.iso ip=dhcp ---
Python3Details: /usr/bin/python3.12, Python 3.12.3, python3-minimal, 3.12.3-0ubuntu1
PythonDetails: N/A
SnapUpdated: False
SourcePackage: subiquity
Title: unknown error crashed with KeyError
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/06/2015
dmi.bios.release: 0.0
dmi.bios.vendor: EFI Development Kit II / OVMF
dmi.bios.version: 0.0.0
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-jammy
dmi.modalias: dmi:bvnEFIDevelopmentKitII/OVMF:bvr0.0.0:bd02/06/2015:br0.0:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-jammy:cvnQEMU:ct1:cvrpc-i440fx-jammy:sku:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-jammy
dmi.sys.vendor: QEMU

Revision history for this message
Chris Peterson (cpete) wrote :
summary: - Desktop netboot crashes on startup
+ Desktop netboot crashes on startup due to cloud-init schema validation
+ handling
Changed in subiquity:
status: Confirmed → Triaged
Chris Peterson (cpete)
Changed in ubuntu-desktop-provision:
status: New → Invalid
Chris Peterson (cpete)
Changed in subiquity:
importance: High → Medium
Revision history for this message
Dan Bungert (dbungert) wrote :

I mistrust having cloud-init in the snap due to mismatch problems with the host system, and this bug is another version of that problem.

An older version of cloud-init is in the snap (SRU pending, LP: #2056100) and a newer version is outside the snap. The copy of cloud-init in the snap is what is used to run `cloud-init schema --system`. I have attached a script that allows one to run arbitrary programs as if they were in the subiquity-like snap; doing so with `cloud-init schema --system` shows the following:

+ cloud-init schema --system
Found cloud-config data types: user-data, network-config

1. user-data at /var/lib/cloud/instances/nocloud/cloud-config.txt:
Empty 'cloud-config' found at /var/lib/cloud/instances/nocloud/cloud-config.txt. Nothing to validate.

2. network-config at /var/lib/cloud/instances/nocloud/network-config.json:
  Invalid network-config /var/lib/cloud/instances/nocloud/network-config.json
  Error: Cloud config schema errors: config.0.subnets.0: Additional properties are not allowed ('broadcast' was unexpected)

Error: Invalid schema: network-config
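To see which entries trip the older schema, a sketch like the following walks a network-config document and reports subnet entries that contain `broadcast`, using the same path notation as the error message. It assumes the file is version 1 JSON with a top-level `config` list (which the `config.0.subnets.0` path suggests); `find_subnet_key` is a hypothetical helper and the sample data is made up.

```python
import json


def find_subnet_key(network_config, key="broadcast"):
    """Return schema-style paths (e.g. "config.0.subnets.0") of subnet
    entries that contain `key` in a network-config v1 document."""
    paths = []
    for i, entry in enumerate(network_config.get("config", [])):
        for j, subnet in enumerate(entry.get("subnets", [])):
            if key in subnet:
                paths.append(f"config.{i}.subnets.{j}")
    return paths


if __name__ == "__main__":
    # Made-up sample; the real file in this report is
    # /var/lib/cloud/instances/nocloud/network-config.json.
    sample = json.loads("""
    {"version": 1,
     "config": [{"type": "physical", "name": "eth0",
                 "subnets": [{"type": "static", "address": "10.0.0.50/24",
                              "broadcast": "10.0.0.255"}]}]}
    """)
    print(find_subnet_key(sample))  # ['config.0.subnets.0']
```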

----

The options to resolve this and fix desktop netboot appear to be:
1. rebuild snaps with cloud-init 24.1.3, either after SRU or forcing the issue now with PPA tricks
2. run `cloud-init schema --system` not from the snap, but from the host
3. adjust handling after `cloud-init schema --system` so that it is not allowed to fail the install
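Option 3 could look roughly like this minimal sketch: each step that processes the schema results runs inside a guard that logs the exception and returns a default instead of propagating it. `run_nonfatal` and `inspect_bad_keys` are hypothetical names, not subiquity functions.

```python
import logging

log = logging.getLogger("schema-check")


def run_nonfatal(step, *args, default=None):
    """Run one post-validation step. Any exception is logged with its
    traceback and `default` is returned, so a bug in handling the schema
    results cannot abort the install."""
    try:
        return step(*args)
    except Exception:
        log.exception("handling of schema results failed; continuing")
        return default


def inspect_bad_keys(bad_keys, combined_config):
    # Hypothetical step that, like the reported crash, assumes every
    # reported key exists in the combined cloud-config.
    return [combined_config[key] for key in bad_keys]


if __name__ == "__main__":
    result = run_nonfatal(inspect_bad_keys, ["broadcast"], {}, default=[])
    print(result)  # []
```

Catching `Exception` broadly trades strictness for robustness; logging the full traceback keeps the underlying bug visible in the installer logs rather than hiding it.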

Revision history for this message
Marian Rainer-Harbach (marianrh) wrote :

Is there any workaround that can be used right now? I tested the beta to be prepared for the new release, and netboot and autoinstall were fine there. That such a bug was introduced between the beta and the final release is really annoying, I'm sorry to say.

Revision history for this message
Marian Rainer-Harbach (marianrh) wrote :

OK, a possible workaround is to boot the system with the kernel parameter cloud-init=disabled, click through the first few installer pages, and then manually specify an autoinstall URL.

(But don't mistype the URL, otherwise the installer will crash. Clicking the button to show the error results in a blank, black page in the installer window. After this, the installer won't work again in the same session; a reboot is necessary. Including the bug this issue is about, that makes four bugs in a row.

Normally I'm not the one to say such a thing, and this is not meant as a personal attack on anybody, but: Having an installer in such a state in an LTS release should be embarrassing to Canonical!)

Revision history for this message
Ozgur As (ozgur-as) wrote :

Just wanted to add that booting the last beta ISO over netboot with a proper cloud-config-url to autoinstall worked without problems. It's broken with the LTS ISO: I get a crash with a complaint about a missing "broadcast" key, just as described in the main report.

Revision history for this message
Yiu-Chung Lee (lee-yiu-chung) wrote :

I found another workaround:

1. Close the failed installer and get into desktop environment

2. Open a terminal, sudo to root, and enter these commands:
# rm /var/lib/cloud/instance/network-config.json
# systemctl restart snap.ubuntu-desktop-bootstrap.subiquity-server

3. Open the installer again, it should work now

Chris Peterson (cpete)
Changed in subiquity:
assignee: nobody → Chris Peterson (cpete)
Chris Peterson (cpete)
Changed in subiquity:
status: Triaged → In Progress
Revision history for this message
Ozgur As (ozgur-as) wrote :

Is there a workaround to run netboot autoinstall unattended? What's the proper way to delete network-config.json from the extracted ISO, since it's coming from a snap? Or is replacing the cloud-init snap in the ISO possible? early-commands don't work because the crash happens before the commands are processed.

Revision history for this message
Kendall Link (kendall-link77) wrote :

Just testing this now on an air-gapped network. I can confirm that what Marian Rainer-Harbach (marianrh) wrote on 2024-04-26 is correct. By passing the cloud-init=disabled kernel parameter in PXE, the crash doesn't occur and I am able to actually install Ubuntu 24.04 as expected.

This at least gets me to a point where I can proceed with testing my autoinstall.yaml file, but as Ozgur As (ozgur-as) wrote on 2024-04-30, I will also be looking to implement a completely hands-off provisioning process via PXE.
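When PXE entries are generated for many machines, the parameter can be added programmatically. A small sketch, assuming the Ubuntu convention that options after `---` are copied to the installed system's kernel command line, so a live-session-only option such as `cloud-init=disabled` belongs before it. `add_live_kernel_param` is a hypothetical helper; the base command line is taken from this report's ProcKernelCmdLine.

```python
def add_live_kernel_param(cmdline: str, param: str) -> str:
    """Insert `param` before the "---" separator, if there is one, so it
    applies to the live installer session. If `param` is already present,
    the command line is returned unchanged."""
    tokens = cmdline.split()
    if param in tokens:
        return cmdline
    if "---" in tokens:
        tokens.insert(tokens.index("---"), param)
    else:
        tokens.append(param)
    return " ".join(tokens)


if __name__ == "__main__":
    # Options from ProcKernelCmdLine above; BOOT_IMAGE is omitted because
    # the bootloader adds it.
    base = "iso-url=http://10.0.0.138/noble-desktop-amd64-04-19.iso ip=dhcp ---"
    print(add_live_kernel_param(base, "cloud-init=disabled"))
    # iso-url=http://10.0.0.138/noble-desktop-amd64-04-19.iso ip=dhcp cloud-init=disabled ---
```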

Revision history for this message
Bernhard Suttner (sbernhard) wrote :

Will Ubuntu release a new version of the ISO image? AFAIK, the same procedure worked in the beta ISO images and was just broken in the release ISO image. It would be good to solve this issue and then release a new ISO image!

Revision history for this message
Bastian Schmidt (bastians) wrote :

I'm experiencing the same issue not only with the Desktop version, but also when running an Autoinstall (nocloud) installation of the 24.04 server image:

The 24.04 server image ISO uses subiquity 24.04.1, which relies on its snap-bundled cloud-init 23.4.4 (even though cloud-init 24.1.3 comes with the installer). I end up with the same "broadcast" error mentioned in #2056460.
When running "sudo cloud-init schema --system" in the paused/stopped installer, it uses the installed deb-version cloud-init 24.1.3 which does not fail when validating the network schema.

The known issues of subiquity 24.04.1 mention this error only for the Desktop installation: https://discourse.ubuntu.com/t/subiquity-24-04-1-has-been-released-to-the-stable-channel/44493#known-issues-14

Can someone verify this for the Server installation? What are the chances that a new ISO image is released soon?
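A mismatch like the one described here (snap-bundled 23.4.4 versus host 24.1.3) could be flagged up front. The sketch below is hypothetical and is not known to exist in subiquity: it compares two version strings and reports how the snap-bundled cloud-init relates to the host's. How the two version strings are obtained is left out; the values used are the ones quoted in this comment.

```python
def parse_version(text: str) -> tuple:
    """Leading numeric components of a version string, e.g.
    "24.1.3" -> (24, 1, 3); a "-suffix" (as in Debian versions) is ignored."""
    core = text.split("-", 1)[0]
    parts = []
    for piece in core.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_cloud_init_versions(snap_version: str, host_version: str) -> str:
    """Describe how the snap-bundled cloud-init relates to the host's."""
    snap, host = parse_version(snap_version), parse_version(host_version)
    if snap == host:
        return "match"
    return "snap older than host" if snap < host else "snap newer than host"


if __name__ == "__main__":
    # Versions quoted above for the 24.04 server ISO.
    print(check_cloud_init_versions("23.4.4", "24.1.3"))  # snap older than host
```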

Revision history for this message
AYUNTAMIENTO DE VERA (aytovera) wrote :

I can confirm it, @bastians: I get the same error, and the same thing happens with the latest Jammy server ISO:
https://cdimage.ubuntu.com/ubuntu-server/jammy/daily-live/current/

Revision history for this message
Franck Iaropoli (franck-iaropoli-arm) wrote :

I just started a new build and it works with refresh-installer update set to true and channel set to latest/edge:

  refresh-installer:
    update: true
    channel: "latest/edge"

# snap info subiquity
name: subiquity
summary: Ubuntu installer
publisher: Canonical✓
store-url: https://snapcraft.io/subiquity
contact: https://bugs.launchpad.net/subiquity
license: unset
description: |
  The Ubuntu server installer
commands:
  - subiquity.curtin
  - subiquity.probert
  - subiquity
services:
  subiquity.subiquity-server: simple, enabled, active
  subiquity.subiquity-service: simple, enabled, active
snap-id: ba2aj8guta0zSRlT3QM5aJNAUXPlBtf9
tracking: latest/stable
refresh-date: today at 14:27 UTC
channels:
  latest/stable: 24.04.1 2024-04-25 (5741) 21MB classic
  latest/candidate: ↑
  latest/beta: 24.04.1 2024-04-17 (5741) 21MB classic
  latest/edge: 24.10-devel+git40.7a2002bc 2024-05-10 (5803) 21MB classic
installed: 24.04.1

Revision history for this message
Bastian Schmidt (bastians) wrote :

@franck-iaropoli-arm thanks for your response!
I tried several builds today but the refresh-installer option does not change any behavior for me. Even a minimal user-data config like:

  #cloud-config
  autoinstall:
    version: 1
    refresh-installer:
      update: true
      channel: "latest/edge"

Still just brings me to the "broadcast" error.

Can you maybe share your user-data file?
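YAML needs a space after the colon in `key: value`; without it, a line like `update:true` is not parsed as a key/value pair. One way to rule out such slips when templating user-data is to generate it from a data structure. This minimal sketch assumes the installer's YAML loader accepts JSON flow syntax (JSON is essentially a subset of YAML); `render_user_data` is a hypothetical helper. It does not address the crash itself.

```python
import json


def render_user_data(autoinstall: dict) -> str:
    """Render cloud-config user-data with an `autoinstall` section.
    json.dumps emits valid YAML flow syntax, so colon/whitespace
    mistakes cannot creep in."""
    return "#cloud-config\n" + json.dumps({"autoinstall": autoinstall}, indent=2) + "\n"


if __name__ == "__main__":
    print(render_user_data({
        "version": 1,
        "refresh-installer": {"update": True, "channel": "latest/edge"},
    }))
```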

Revision history for this message
Franck Iaropoli (franck-iaropoli-arm) wrote :

Oh no, this doesn't work anymore (I was using the beta ISO yesterday and the final release one today).

Refreshing the subiquity snap after the error lets the install continue:

snap refresh subiquity --channel=latest/edge

I found this other bug report https://bugs.launchpad.net/ubuntu/+source/subiquity/+bug/2063813 and comment https://bugs.launchpad.net/ubuntu/+source/subiquity/+bug/2063813/comments/3

So we need a new ISO with a more recent subiquity version?

Revision history for this message
Bastian Schmidt (bastians) wrote :

@franck-iaropoli-arm I think we need a new ISO, yes.

I can confirm that updating subiquity manually lets the install continue.

Revision history for this message
Ozgur As (ozgur-as) wrote :

Even if the workaround of using the edge channel worked today, we can't know which versions will be pushed to the edge channel going forward.

Is there a method for manually modifying the ISO to replace the subiquity snap with the latest version? 24.04.1 is scheduled for August, and unattended install in the LTS is broken.

Revision history for this message
Kendall Link (kendall-link77) wrote :

I'll also throw my hat into the "Hoping for an ISO" pool. All of the systems I'm trying to netboot are on an air-gapped network. For me this means that the installer can't dynamically call home and update. If there is a way to update the version of the subiquity snap on the ISO, I'd be open to giving that a try.

Either way... I'm patiently waiting for the status of this bug to change ;)

Revision history for this message
Franck Iaropoli (franck-iaropoli-arm) wrote :

Thanks for the confirmation, all. I agree we need a new ISO or a way to update subiquity on the current one.
