Interrupting the build later than the load_gadget_yaml step creates broken images

Bug #2055152 reported by Oliver Grawert
This bug affects 2 people
Affects: Ubuntu Image
Status: Fix Released
Importance: High
Assigned to: Paul Mars

Bug Description

When ubuntu-image is paused with the -u option (e.g. to inspect the populated workdir) and the image build is then resumed, the result is always a broken filesystem ...

To reproduce, run an image build that stops via the -u option and sets a workdir via the -w option, like:

`ubuntu-image snap --debug -w workdir -u generate_disk_info ubuntu-core-22-arm64+raspi.model-assertion`

Then simply resume the build using the -r option:

`ubuntu-image snap --debug -w workdir -r ubuntu-core-22-arm64+raspi.model-assertion`

This results in the following:

```
$ LC_ALL=C sudo partx -av pi.img
partition: none, disk: pi.img, lower: 0, upper: 0
Trying to use '/dev/loop180' for the loop device
/dev/loop180: partition table type 'dos' detected
range recount: max partno=1, lower=0, upper=0
/dev/loop180: partition #1 added
$ LC_ALL=C sudo mount /dev/loop180p1 mnt/
mount: /home/ogra/datengrab/uc22-test/mnt: wrong fs type, bad option, bad superblock on /dev/loop180p1, missing codepage or helper program, or other error
$
```

This is 100% reproducible on amd64 pc builds and arm64 pi builds. It seems to make no difference whether the model assertion is "signed" or "dangerous".

This is slightly fatal, since we have customers who need to inject certificate files to automatically onboard onto their enterprise network on first boot (and they cannot ship that cert in a snap, for obvious reasons). If you want to add this file to the "systems/$DATE" directory, you cannot interrupt at the "load_gadget_yaml" step; you need to stop at "populate_rootfs_contents" at the earliest, which then results in an unbootable image ...

dmesg shows the following (but nothing more):
```
[25292407.714934] loop180: detected capacity change from 0 to 7168000
[25292445.112522] FAT-fs (loop180p1): bogus number of reserved sectors
[25292445.112526] FAT-fs (loop180p1): Can't find a valid FAT filesystem
```
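
The "bogus number of reserved sectors" message would be consistent with a boot sector whose reserved-sector count reads as zero (that is an assumption about the kernel's check, not something this report states). A minimal sketch for reading that field directly, assuming the standard FAT layout where the count is a 16-bit little-endian value at byte offset 14; `fat_reserved_sectors` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the reserved-sector count stored in a FAT boot sector.
# Assumption: standard BPB layout, 16-bit little-endian value at byte offset 14.
# fat_reserved_sectors is a hypothetical helper, not part of ubuntu-image.
fat_reserved_sectors() {
    # Read the two bytes as unsigned decimals so the result does not
    # depend on host endianness; a file that is too short yields 0.
    set -- $(od -An -tu1 -j14 -N2 "$1")
    echo $(( ${1:-0} + ${2:-0} * 256 ))
}
```

Pointing it at a partition image or at a loop partition such as `/dev/loop180p1` (the latter needs root) prints the count; a fully zeroed partition prints 0.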

Oliver Grawert (ogra)
description: updated
Paul Mars (upils) wrote :

Hey Oliver,

This may not make sense for your use case, but were you able to reproduce this after stopping at a step other than generate_disk_info? We already made some improvements to the resume feature in the past months, but we may have missed something.

Oliver Grawert (ogra) wrote :

It works fine up to the load_gadget_yaml step; anything after that seems to produce a garbage filesystem ...

Oh, and I forgot: the two test machines I ran it on were both 22.04. In desperation I updated one of them to latest/edge of ubuntu-image, but with no different result ...

I then also tried it in a jammy lxd container, which resulted in https://bugs.launchpad.net/ubuntu-image/+bug/1970150 (which I thought had been fixed a year ago?)

Isaac True (itrue) wrote :

I can reproduce this on my system with ubuntu-image 3.2, using this model: https://github.com/snapcore/models/blob/master/ubuntu-core-22-amd64.model

$ ubuntu-image snap --debug -w workdir -u generate_disk_info ubuntu-core-22-amd64.model
[0] make_temporary_directories
duration: 230.332µs
[1] determine_output_directory
duration: 141ns
[2] prepare_image
WARNING: proceeding to download snaps ignoring validations, this default will change in the future. For now use --validation=enforce for validations to be taken into account, pass instead --validation=ignore to preserve current behavior going forward
Fetching snapd (20671)
Fetching pc-kernel (1646)
Fetching core22 (1122)
Fetching pc (146)
duration: 2m44.20546767s
[3] load_gadget_yaml
duration: 483.381µs
[4] set_artifact_names
duration: 1.252µs
[5] populate_rootfs_contents
duration: 265.69µs
$ ubuntu-image snap --debug -w workdir -r ubuntu-core-22-amd64.model
[6] generate_disk_info
duration: 571ns
[7] calculate_rootfs_size
duration: 3.419536ms
[8] populate_bootfs_contents
duration: 143.716005ms
[9] populate_prepare_partitions
duration: 693.805841ms
[10] make_disk
duration: 23.436820413s
[11] generate_manifest
duration: 70.399µs
[12] finish
duration: 281ns

So far, so good: ubuntu-image doesn't fail. However, the partition written to the disk image seems to consist entirely of zeroes:

$ sudo losetup -f pc.img
$ sudo partprobe /dev/loop127
$ sudo hexdump /dev/loop127p2
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
4b000000
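
The same all-zero check can be done on the image file itself, without root or a loop device. A minimal sketch, assuming 512-byte sectors and that the partition's start sector and sector count are read from the partition table (for instance from `fdisk -l pc.img`); `region_is_zero` is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed (exit 0) if COUNT 512-byte sectors of FILE, starting at sector
# START, contain only zero bytes.
# region_is_zero is a hypothetical helper, not part of ubuntu-image.
# Note: a region that lies past the end of the file is empty and therefore
# also counts as "all zero".
region_is_zero() {
    file=$1 start=$2 count=$3
    # Drop every NUL byte and count what is left.
    nonzero=$(dd if="$file" bs=512 skip="$start" count="$count" 2>/dev/null \
        | tr -d '\000' | wc -c)
    [ "$((nonzero))" -eq 0 ]
}
```

For example, `region_is_zero pc.img "$start" "$sectors" && echo "partition is empty"`, with `$start` and `$sectors` taken from the partition table.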

The file in the working directory is correct and mountable:

$ file workdir/volumes/pc/part2.img
workdir/volumes/pc/part2.img: DOS/MBR boot sector, code offset 0x58+2, OEM-ID "mkfs.fat", Media descriptor 0xf8, sectors/track 63, heads 64, sectors 2457567 (volumes > 32 MB), FAT (32 bit), sectors/FAT 18905, serial number 0xde7299c8, label: "ubuntu-seed"
$ sudo losetup -d /dev/loop127
$ sudo losetup -f workdir/volumes/pc/part2.img
$ sudo fsck /dev/loop127
fsck from util-linux 2.39.1
fsck.fat 4.2 (2021-01-31)
/dev/loop127: 22 files, 930783/2419725 clusters
$ sudo mount /dev/loop127 /mnt/tmp
$ ls /mnt/tmp
EFI/ snaps/ systems/

The issue seems to be that the ubuntu-seed partition in the disk image is not actually being written to.
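
One way to confirm this is to compare the partition file in the working directory against the corresponding region of the final disk image. A minimal sketch, assuming 512-byte sectors and a start sector read from the partition table; `partition_matches` is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed (exit 0) if the bytes of DISK beginning at sector START are
# identical to the entire contents of PART.
# partition_matches is a hypothetical helper, not part of ubuntu-image.
partition_matches() {
    disk=$1 start=$2 part=$3
    size=$(wc -c < "$part")
    # Stream the disk image from the partition start, cut it to the size of
    # the partition file, and compare the stream (stdin) with that file.
    dd if="$disk" bs=512 skip="$start" 2>/dev/null \
        | head -c "$((size))" \
        | cmp -s - "$part"
}
```

For the case above this would be called as `partition_matches pc.img "$start" workdir/volumes/pc/part2.img`; a non-zero exit status means the partition contents never reached the image.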

Paul Mars (upils)
Changed in ubuntu-image:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Paul Mars (upils)
tags: added: foundations-todo
Paul Mars (upils)
Changed in ubuntu-image:
status: Confirmed → In Progress
Paul Mars (upils) wrote :

I found the root cause. It looks like the resume feature never worked with snap images. I have a fix but would like to be sure it will be reliable.

See https://github.com/canonical/ubuntu-image/pull/196. It also contains other small improvements around the resume feature to make it more reliable.

Paul Mars (upils)
Changed in ubuntu-image:
status: In Progress → Fix Committed
Paul Mars (upils) wrote :

The fix is now merged and available to test in latest/edge.

Paul Mars (upils)
Changed in ubuntu-image:
status: Fix Committed → Fix Released