kernel refresh fail with non-gpt partitioning

Bug #2017297 reported by Paul Larson
This bug affects 1 person
Affects  Status         Importance  Assigned to            Milestone
snapd    Fix Committed  High        Alfonso Sanchez-Beato

Bug Description

We have a device where the kernel refresh is now failing. I thought this might be a kernel issue at first, but I was alerted to a recent change in snapd that might be the cause.

core 16-2.59.1 15151 latest/beta canonical✓ core,ignore-validation
I reverted to the core in stable and it also seems to be affected by this:
core 16-2.57.1 13745 latest/stable canonical✓ core
$ snap --version
snap 2.57.1
snapd 2.57.1
series 16
kernel 4.4.0-1054-cascade

$ sudo snap refresh
error: cannot perform the following tasks:
Update assets from kernel "XXXXX-kernel" (124) (cannot read current gadget snap details: invalid volume "u-boot-XXXXX": invalid structure: GPT header or GPT partition table overlapped with structure "u-boot"

In this case, it's an older UC16-based device that does not use GPT partitioning. Here's a section of the gadget.yaml:

...
    structure:
      - name: u-boot
        type: bare
        size: 512000
        offset: 0
        content:
...
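
To illustrate why a bare structure at offset 0 trips this check, here is a minimal sketch in Python. It is not snapd's implementation: the function names and the `schema` argument are hypothetical, and it assumes 512-byte sectors and the default GPT layout (primary header at LBA 1, followed by 128 partition entries of 128 bytes each, ending just before LBA 34). It also assumes the check is only meaningful when the volume actually uses GPT.

```python
SECTOR_SIZE = 512                      # assumption: 512-byte logical sectors
GPT_START = 1 * SECTOR_SIZE            # primary GPT header sits at LBA 1
GPT_END = 34 * SECTOR_SIZE             # header + 128 entries * 128 bytes end before LBA 34


def overlaps_gpt(offset, size):
    """Return True if the byte range [offset, offset + size) touches the GPT area."""
    return offset < GPT_END and offset + size > GPT_START


def check_structure(schema, name, offset, size):
    """Hypothetical validator: only complain about GPT overlap on GPT volumes."""
    if schema == "gpt" and overlaps_gpt(offset, size):
        return f'GPT header or GPT partition table overlapped with structure "{name}"'
    return None


# The u-boot structure from the gadget.yaml excerpt: offset 0, size 512000.
print(check_structure("gpt", "u-boot", 0, 512000))
print(check_structure("mbr", "u-boot", 0, 512000))
```

With these assumptions, the u-boot structure (bytes 0 to 512000) covers the whole GPT area, so applying the overlap check regardless of the partitioning scheme rejects a layout that is valid on a non-GPT device.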

Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :
Changed in snapd:
status: New → Fix Committed
importance: Undecided → Critical
importance: Critical → High
assignee: nobody → Alfonso Sanchez-Beato (alfonsosanchezbeato)
Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

Released in 2.59.3

Revision history for this message
Paul Larson (pwlars) wrote :

I tried refreshing this kernel on the same system after installing core 16-2.59.4 from beta, but it still fails, although with a different error. After the system reboots, I'm unable to connect for about 15 minutes; when I can finally connect, it looks like it has rolled back to the previous kernel:
2023-05-24T19:15:29Z ERROR cannot finish cascade-kernel installation, there was a rollback across reboot

I don't see any other indications about what went wrong, except that it looks like the first step that had an error was connecting plugs and slots:
Status  Spawn               Ready               Summary
Done    today at 19:02 UTC  today at 19:16 UTC  Ensure prerequisites for "cascade-kernel" are available
Undone  today at 19:02 UTC  today at 19:16 UTC  Download snap "cascade-kernel" (148) from channel "latest/beta"
Done    today at 19:02 UTC  today at 19:15 UTC  Fetch and check assertions for snap "cascade-kernel" (148)
Undone  today at 19:02 UTC  today at 19:15 UTC  Mount snap "cascade-kernel" (148)
Undone  today at 19:02 UTC  today at 19:15 UTC  Run pre-refresh hook of "cascade-kernel" snap if present
Undone  today at 19:02 UTC  today at 19:15 UTC  Stop snap "cascade-kernel" services
Undone  today at 19:02 UTC  today at 19:15 UTC  Remove aliases for snap "cascade-kernel"
Undone  today at 19:02 UTC  today at 19:15 UTC  Make current revision for snap "cascade-kernel" unavailable
Done    today at 19:02 UTC  today at 19:15 UTC  Update assets from kernel "cascade-kernel" (148)
Undone  today at 19:02 UTC  today at 19:15 UTC  Copy snap "cascade-kernel" data
Undone  today at 19:02 UTC  today at 19:15 UTC  Setup snap "cascade-kernel" (148) security profiles
Undone  today at 19:02 UTC  today at 19:15 UTC  Make snap "cascade-kernel" (148) available to the system
Error   today at 19:02 UTC  today at 19:15 UTC  Automatically connect eligible plugs and slots of snap "cascade-kernel"
Hold    today at 19:02 UTC  today at 19:15 UTC  Set automatic aliases for snap "cascade-kernel"
Hold    today at 19:02 UTC  today at 19:15 UTC  Setup snap "cascade-kernel" aliases
Hold    today at 19:02 UTC  today at 19:15 UTC  Run post-refresh hook of "cascade-kernel" snap if present
Hold    today at 19:02 UTC  today at 19:15 UTC  Start snap "cascade-kernel" (148) services
Hold    today at 19:02 UTC  today at 19:15 UTC  Remove data for snap "cascade-kernel" (132)
Hold    today at 19:02 UTC  today at 19:15 UTC  Remove snap "cascade-kernel" (132) from the system
Hold    today at 19:02 UTC  today at 19:15 UTC  Clean up "cascade-kernel" (148) install
Hold    today at 19:02 UTC  today at 19:15 UTC  Run configure hook of "cascade-kernel" snap if present
Hold    today at 19:02 UTC  today at 19:15 UTC  Run health check of "cascade-kernel" snap
Done    today at 19:02 UTC  today at 19:16 UTC  Handling re-refresh of "cascade-kernel" as needed
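
For long task lists like this one, a small script can pick out the row marked Error. The following is a rough sketch in Python that only handles the "today at HH:MM UTC" timestamp form pasted in this report; the helper names are hypothetical and it makes no claim about other output formats of snap change. Note that the Error row shows where the change was aborted, which is not necessarily the root cause.

```python
import re

# Matches rows of the form: <Status> today at HH:MM UTC today at HH:MM UTC <Summary>
TASK_RE = re.compile(
    r"^(?P<status>\w+)\s+"
    r"(?P<spawn>today at \d{2}:\d{2} UTC)\s+"
    r"(?P<ready>today at \d{2}:\d{2} UTC)\s+"
    r"(?P<summary>.+)$"
)


def parse_tasks(text):
    """Parse task rows into dicts; lines that do not match (e.g. the header) are skipped."""
    tasks = []
    for line in text.splitlines():
        match = TASK_RE.match(line.strip())
        if match:
            tasks.append(match.groupdict())
    return tasks


def first_with_status(tasks, status):
    """Return the first task with the given status, or None if there is none."""
    return next((t for t in tasks if t["status"] == status), None)


SAMPLE = """\
Status Spawn Ready Summary
Done today at 19:02 UTC today at 19:15 UTC Update assets from kernel "cascade-kernel" (148)
Undone today at 19:02 UTC today at 19:15 UTC Make snap "cascade-kernel" (148) available to the system
Error today at 19:02 UTC today at 19:15 UTC Automatically connect eligible plugs and slots of snap "cascade-kernel"
Hold today at 19:02 UTC today at 19:15 UTC Set automatic aliases for snap "cascade-kernel"
"""

tasks = parse_tasks(SAMPLE)
failed = first_with_status(tasks, "Error")
print(failed["summary"] if failed else "no task in Error state")
```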

Revision history for this message
Aristo Chen (aristochen) wrote :

I tested this on my device and was able to refresh the kernel snap to revision 148 without issue.

The steps I followed:
1. Refresh the core snap to latest/beta; the system reboots automatically.
2. Refresh the kernel snap to latest/beta; the system reboots automatically.
3. Log in to the system and check the "snap list" command output.
$ snap list
Name            Version        Rev    Tracking     Publisher   Notes
...
cascade-kernel  4.4.0-1055.60  148    latest/beta  canonical✓  kernel
core            16-2.59.4      15424  latest/beta  canonical✓  core
...

Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

I don't think this failure is related to the original problem. Unfortunately, the error shown by snap change can be confusing: the failure actually happens when the system reboots to try the new kernel, so what you are seeing is a rollback across reboots.

Initially I thought the problem was that the kernel snap on the system was revision 139, which cannot be reverted unless a special step is performed, but then I noticed this is an update from 132 to 148.

I compared the initramfs of these two revisions and they differ only in the kernel modules, so the problem is probably unrelated to the initramfs.

I suggest capturing serial output to see what happens during the reboot.

Revision history for this message
Paul Larson (pwlars) wrote :

fwiw, I tried refreshing to 132 in stable and it also failed. I'll see if someone in the lab can get serial output from it.

Revision history for this message
Kevin Yeh (kevinyeh) wrote :

This is the serial output after refreshing the kernel to beta.
https://pastebin.canonical.com/p/8Qx2cTwc3G/

Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

@Kevin could you please share the output of snap list on the device?

Revision history for this message
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote :

In any case, the issue is not related to the original bug, so I have opened LP:#2020884
