Mantic hangs during install with continuous 'Job snapd.seeded.service/start running' messages

Bug #2028862 reported by Frank Heimes
This bug affects 3 people
Affects                      Status        Importance  Assigned to
Ubuntu on IBM z Systems      Fix Released  High        Snappy Developers
snapd                        Invalid       Undecided   Unassigned
subiquity                    Invalid       Undecided   Michael Hudson-Doyle
livecd-rootfs (Ubuntu)       Fix Released  Undecided   Michael Hudson-Doyle
  Jammy                      Fix Released  Undecided   Dan Bungert

Bug Description

[ Impact ]

 * The subiquity systemd units and cloud-init 23.3 have an
   incompatibility that results in a stuck boot when cloud-init
   23.3 is present on an unpatched install ISO.
   The symptom looks like the following:
   "A start job is waiting for Wait until snapd is fully seeded"
   This will never complete.

 * This is a necessary prerequisite to allow for Jammy dailies /
   eventually 22.04.4 with updated cloud-init.

[ Test Plan ]

 * Live-server daily ISOs built with the fixed version of
   livecd-rootfs should boot correctly to the Subiquity TUI,
   regardless of the cloud-init version present in that ISO.

 * If desired, a detailed test plan (~16 steps) can be provided
   that modifies ISOs to simulate both the bug and the fix.
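Since the test plan calls for ISOs both with and without cloud-init 23.3, it can help to classify a build by its cloud-init version first. The sketch below is a hypothetical helper (`cloud_init_at_least_23_3` is not an existing tool); it assumes `sort -V` ordering is close enough to Debian version ordering for these strings, whereas `dpkg --compare-versions` is the authoritative check on an Ubuntu system.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given cloud-init version is 23.3 or newer.
# Uses `sort -V` as an approximation of Debian version ordering.
cloud_init_at_least_23_3() {
    lowest=$(printf '%s\n' "23.3" "$1" | sort -V | head -n 1)
    [ "$lowest" = "23.3" ]
}

# In a booted live session, the installed version can be read with:
#   dpkg-query -W -f='${Version}\n' cloud-init
for v in "23.2.2-0ubuntu0~22.04.1" "23.3-0ubuntu1~22.04.1"; do
    if cloud_init_at_least_23_3 "$v"; then
        echo "$v: 23.3 or newer"
    else
        echo "$v: older than 23.3"
    fi
done
```

Version strings containing `~` directly after `23.3` (pre-releases) are exactly where `sort -V` and dpkg may disagree, so prefer dpkg for those.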

[ Where problems could occur ]

 * Changes to livecd-rootfs frequently cause regressions in other
   build projects. The risk here is relatively localized to
   live-server builds, as we are only modifying drop-in systemd
   files used by live-server.

 * If this change is done incorrectly, when cloud-init 23.3 is
   SRUed and allowed to migrate, the boot of the live-server
   dailies may show the same problem as the original bug.

 * "The livecd-rootfs package is a frequent target of SRUs as part
   of development of changes to image builds for the target
   series, and is not intended for general installation on
   end-user systems. The risk of user-affecting regression is
   lower as a result, because the impact of changes to this
   package to end users is mediated by way of image builds."
https://wiki.ubuntu.com/StableReleaseUpdates?action=show&redirect=StableReleaseUpdate#livecd-rootfs

[ Other Info ]

 * Using the kernel command line parameter
   "systemd.mask=snapd.seeded.service" allows affected ISOs to
   boot successfully.
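Because the workaround is a kernel command-line parameter, one way to confirm it actually reached the booted system is to look for it in `/proc/cmdline`. The helper below is a hypothetical sketch (`masked_units` is an illustrative name) that prints every unit passed via `systemd.mask=`; on the booted system, `systemctl status snapd.seeded.service` should also report the unit as masked.

```shell
#!/bin/sh
# Hypothetical helper: print each unit named by a systemd.mask= parameter
# in a kernel command line string, one unit per line.
# The body runs in a subshell so `set -f` does not leak to the caller.
masked_units() (
    set -f  # disable globbing while splitting the command line into words
    for arg in $1; do
        case $arg in
            systemd.mask=*) printf '%s\n' "${arg#systemd.mask=}" ;;
        esac
    done
)

# On a running system, read the real command line instead:
#   masked_units "$(cat /proc/cmdline)"
masked_units "quiet systemd.mask=snapd.seeded.service ---"
```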

---
previous description:

While trying to install the latest mantic ISO image (tried both current and pending) on s390x (though it's probably not limited to a particular architecture), the installation hangs with never-ending messages like this:

[** ] Job snapd.seeded.service/start running (16min 23s / no limit)
[* ] Job snapd.seeded.service/start running (16min 23s / no limit)
[** ] Job snapd.seeded.service/start running (16min 24s / no limit)
[*** ] Job snapd.seeded.service/start running (16min 24s / no limit)
[ *** ] Job snapd.seeded.service/start running (16min 25s / no limit)
[ *** ] Job snapd.seeded.service/start running (16min 25s / no limit)
[ ***] Job snapd.seeded.service/start running (16min 26s / no limit)
[ **] Job snapd.seeded.service/start running (16min 26s / no limit)
[ *] Job snapd.seeded.service/start running (16min 27s / no limit)
[ **] Job snapd.seeded.service/start running (16min 27s / no limit)
[ ***] Job snapd.seeded.service/start running (16min 28s / no limit)
[ *** ] Job snapd.seeded.service/start running (16min 28s / no limit)
[ *** ] Job snapd.seeded.service/start running (16min 29s / no limit)
[*** ] Job snapd.seeded.service/start running (16min 29s / no limit)
[** ] Job snapd.seeded.service/start running (16min 30s / no limit)
[* ] Job snapd.seeded.service/start running (16min 30s / no limit)
[** ] Job snapd.seeded.service/start running (16min 31s / no limit)
[*** ] Job snapd.seeded.service/start running (16min 31s / no

(sorry for the 'special' characters, but it's because I copied the content from the HMC console)
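Console captures like this can be cleaned before pasting into a bug by removing carriage returns and ANSI CSI escape sequences (colour codes, "erase line" and similar). This is a minimal sketch with a hypothetical name (`strip_ansi`); the HMC console may emit other control sequences that it does not cover.

```shell
#!/bin/sh
# Hypothetical filter: drop carriage returns and ANSI CSI sequences
# of the form ESC [ <digits/semicolons> <letter>.
strip_ansi() {
    esc=$(printf '\033')  # literal ESC character, portable across sed variants
    tr -d '\r' | sed "s/${esc}\[[0-9;]*[A-Za-z]//g"
}

# Example: one spinner line as systemd draws it on a terminal.
printf '\r\033[K[\033[0;1;31m*\033[0m\033[0;31m* \033[0m] Job snapd.seeded.service/start running\n' | strip_ansi
```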

I was told that using the kernel arg "systemd.mask=snapd.seeded.service" should work around this; I still need to try it out.


Frank Heimes (fheimes)
summary: - Mantic hands during install with continuous 'Job
+ Mantic hangs during install with continuous 'Job
snapd.seeded.service/start running' messages
description: updated
Dan Bungert (dbungert)
Changed in subiquity:
status: New → Invalid
Frank Heimes (fheimes) wrote :

I can confirm that the kernel option "systemd.mask=snapd.seeded.service" is a workaround (but things break again later, which is unrelated to this and will be reported separately).

By the way, the ISO image I used is from Jul 23.

Frank Heimes (fheimes) wrote :

Output of "snap tasks --last=seed" after having booted using the workaround:

$ snap tasks --last=seed
Status Spawn Ready Summary
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Ensure prerequisites for "snapd" are available
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Prepare snap "/var/lib/snapd/seed/snaps/snapd_19460.snap" (19460)
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Mount snap "snapd" (19460)
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Copy snap "snapd" data
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Setup snap "snapd" (19460) security profiles
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Make snap "snapd" (19460) available to the system
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Automatically connect eligible plugs and slots of snap "snapd"
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Set automatic aliases for snap "snapd"
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Setup snap "snapd" aliases
Done 10 days ago, at 06:52 UTC today at 08:48 UTC Run install hook of "snapd" snap if present
Done 10 days ago, at 06:52 UTC today at 08:48 UTC Start snap "snapd" (19460) services
Done 10 days ago, at 06:52 UTC today at 08:48 UTC Run configure hook of "core" snap if present
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Ensure prerequisites for "core22" are available
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Prepare snap "/var/lib/snapd/seed/snaps/core22_820.snap" (820)
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Mount snap "core22" (820)
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Copy snap "core22" data
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Setup snap "core22" (820) security profiles
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Make snap "core22" (820) available to the system
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Automatically connect eligible plugs and slots of snap "core22"
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Set automatic aliases for snap "core22"
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Setup snap "core22" aliases
Done 10 days ago, at 06:52 UTC today at 08:48 UTC Run install hook of "core22" snap if present
Done 10 days ago, at 06:52 UTC today at 08:48 UTC Start snap "core22" (820) services
Done 10 days ago, at 06:52 UTC today at 08:48 UTC Run health check of "core22" snap
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Ensure prerequisites for "lxd" are available
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Prepare snap "/var/lib/snapd/seed/snaps/lxd_25113.snap" (25113)
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Mount snap "lxd" (25113)
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Copy snap "lxd" data
Done 10 days ago, at 06:52 UTC 10 days ago, at 06:52 UTC Setup snap "lxd" (25113) security profiles
Done 10 days ago...


Frank Heimes (fheimes) wrote :

$ snap changes
ID Status Spawn Ready Summary
1 Done 10 days ago, at 06:52 UTC today at 08:48 UTC Initialize system state
2 Done today at 08:48 UTC today at 08:48 UTC Initialize device
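The `snap changes` listing is a quick way to see whether seeding finished: the change summarised as "Initialize system state" reaches `Done` once the seed has been applied. For scripted checks, `snap wait system seed.loaded` blocks until seeding completes, and wrapping it in `timeout` avoids an endless wait. The parser below is a hypothetical helper that assumes the column layout shown above (ID, Status, Spawn, Ready, Summary).

```shell
#!/bin/sh
# Hypothetical helper: read `snap changes` output on stdin and print the
# status (2nd column) of the "Initialize system state" change, if present.
seed_change_status() {
    awk '/Initialize system state/ { print $2 }'
}

# Live usage on a system with snapd:
#   snap changes | seed_change_status
#   timeout 600 snap wait system seed.loaded && echo "seeded"
printf '%s\n' \
    'ID Status Spawn Ready Summary' \
    '1 Done 10 days ago, at 06:52 UTC today at 08:48 UTC Initialize system state' \
    '2 Done today at 08:48 UTC today at 08:48 UTC Initialize device' |
    seed_change_status
```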

Frank Heimes (fheimes) wrote :

$ journalctl -u snapd
Aug 02 08:48:43 hwe0004 systemd[1]: Starting snapd.service - Snap Daemon...
Aug 02 08:48:43 hwe0004 snapd[801]: overlord.go:272: Acquiring state lock file
Aug 02 08:48:43 hwe0004 snapd[801]: overlord.go:277: Acquired state lock file
Aug 02 08:48:43 hwe0004 snapd[801]: patch.go:64: Patching system state level 6 >
Aug 02 08:48:43 hwe0004 snapd[801]: patch.go:64: Patching system state level 6 >
Aug 02 08:48:43 hwe0004 snapd[801]: patch.go:64: Patching system state level 6 >
Aug 02 08:48:43 hwe0004 snapd[801]: daemon.go:247: started snapd/2.60.1+23.10 (>
Aug 02 08:48:43 hwe0004 snapd[801]: daemon.go:340: adjusting startup timeout by>
Aug 02 08:48:43 hwe0004 snapd[801]: backends.go:58: AppArmor status: apparmor i>
Aug 02 08:48:44 hwe0004 systemd[1]: Started snapd.service - Snap Daemon.

Changed in ubuntu-z-systems:
importance: Undecided → High
assignee: nobody → Snappy Developers (snappy-dev)
Sergio Cazzolato (sergio-j-cazzolato) wrote :

Hi Frank, thanks for raising this issue. I have some questions.

Is it a desktop image?

Are you using real hardware or QEMU? If QEMU, could you please share the command line you are using?

Do you have steps to reproduce it, or are you just installing with default parameters?

Thanks

Frank Heimes (fheimes) wrote (last edit):

Hi Sergio,
This happens to me on s390x (though it's probably not limited to it), so it's a server image.

And it's actually "real" hardware (as real as it gets on this platform).
That means it happens if I install on a logical partition (LPAR, which is as close to real hardware as it gets),
but it also happens if I try to install a z/VM guest (an instance on IBM's commercial hypervisor, z/VM).

But Dan B. mentioned that he also saw that on an amd64 system.

I am (personally) just doing a default installation (like I did dozens of times with lunar, so in this regard I would even call it a regression).
And it's also reproducible, happens reliably every time.

Dan Bungert (dbungert) wrote :

Hi Sergio,

One reproducer is to use a VM on amd64. snapd.seeded.service on the mantic live-server will run seemingly forever, long after the jammy equivalent has finished startup.

A sample command line to show this is:
kvm -no-reboot -m 8G -bios /usr/share/qemu/OVMF.fd -cdrom mantic-live-server-amd64.iso

I'm testing today the 20230802 build of mantic-live-server-amd64 but this isn't specific to today's build.

Sergio Cazzolato (sergio-j-cazzolato) wrote :

Something interesting I just noticed: I can only reproduce the error when running with 1 processor on the s390x QEMU emulator. When I assign 2 processors to the VM, seeding completes without errors and snapd.service does not fail to start.

I'll continue digging on the issue.
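To compare the one- and two-processor cases, the amd64 `kvm` reproducer from the earlier comment can be parameterised with QEMU's `-smp` option; whether the processor-count effect seen on the s390x emulator also appears on amd64 is not established in this report. This sketch only prints the command lines (remove `echo` to launch them); the ISO name and OVMF path come from that comment and may differ on other hosts, and `repro_cmd` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the reproducer command for a given CPU count.
# Paths are taken from the amd64 reproducer in this bug; adjust for your host.
repro_cmd() {
    cpus=$1
    iso=$2
    echo kvm -no-reboot -m 8G -smp "$cpus" -bios /usr/share/qemu/OVMF.fd -cdrom "$iso"
}

for n in 1 2; do
    repro_cmd "$n" mantic-live-server-amd64.iso
done
```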

Dan Bungert (dbungert) wrote :

Also reported on RISC-V, so I think this is not architecture-specific.
Interesting that adjusting the number of processors affects the outcome like that!

Heinrich Schuchardt (xypron) wrote :
Frank Heimes (fheimes) wrote (last edit):

Just FYI: I ran into this on 'real' s390x systems (meaning not emulated), which all have multiple processors.
(If it would help, I could reconfigure one of them to have only one processor.)

Given that it affects amd64, RISC-V and s390x, I would say it's proven that it is not architecture-specific.

Dan Bungert (dbungert)
Changed in subiquity:
status: Invalid → New
tags: added: foundations-todo
Changed in livecd-rootfs (Ubuntu):
assignee: nobody → Michael Hudson-Doyle (mwhudson)
Changed in livecd-rootfs (Ubuntu):
status: New → In Progress
Changed in subiquity:
status: New → In Progress
assignee: nobody → Michael Hudson-Doyle (mwhudson)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 23.10.12

---------------
livecd-rootfs (23.10.12) mantic; urgency=medium

  * live-build/auto/build: Avoid purging packages for ubuntu-cpc.
    With the switch to the ubuntu-cloud-minimal seed, we
    don't really need to purge anything now. On the contrary,
    the purging of packages if not installed, fails with the
    exit code of 100.

 -- Utkarsh Gupta <email address hidden> Tue, 08 Aug 2023 15:44:42 +0530

Changed in livecd-rootfs (Ubuntu):
status: In Progress → Fix Released
Dan Bungert (dbungert) wrote :

@Frank, please retest when you get a moment; it should be working now.
Marking invalid for snapd: aside from extra logging or similar, I'm not sure what snapd could do.
Marking invalid for subiquity, as the code change for this is outside the subiquity codebase.

Changed in snapd:
status: New → Invalid
Changed in subiquity:
status: In Progress → Invalid
Heinrich Schuchardt (xypron) wrote :

With today's mantic-live-server-riscv64.img.gz the problem cannot be observed anymore.

Patricia Domingues (patriciasd) wrote :

OK, I don't see the issue anymore. Testing on an s390x LPAR, the installation completed using image `20230817` (but the installed system could not complete its post-install boot due to LP#2029388).

Frank Heimes (fheimes) wrote :

I also tried on z/VM, and this problem didn't happen to me anymore either (but later I ran into LP#2029479 again).

So with that I will mark the affected 'ubuntu-z-systems' project entry as Fix Released.

Many thx!

Changed in ubuntu-z-systems:
status: In Progress → Fix Released
Dan Bungert (dbungert)
Changed in livecd-rootfs (Ubuntu):
milestone: none → ubuntu-22.04.4
milestone: ubuntu-22.04.4 → none
Changed in livecd-rootfs (Ubuntu Jammy):
assignee: nobody → Dan Bungert (dbungert)
Dan Bungert (dbungert)
description: updated
Dan Bungert (dbungert)
description: updated
Dan Bungert (dbungert)
Changed in livecd-rootfs (Ubuntu Jammy):
status: New → In Progress
Steve Langasek (vorlon) wrote : Please test proposed package

Hello Frank, or anyone else affected,

Accepted livecd-rootfs into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/livecd-rootfs/2.765.25 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in livecd-rootfs (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Dan Bungert (dbungert) wrote :

Verification steps looked a little different than normal, as I needed to run a livefs build using the updated livecd-rootfs. I did so from a PPA, since it was not clear how to persuade the livefs builder to pull from -proposed otherwise.

https://launchpadlibrarian.net/685576720/buildlog_ubuntu_jammy_amd64_test_BUILDING.txt.gz

The desired changes are present in the resulting build files, and replacing an existing ISO with those files produces a working result. Two test builds were done, with and without cloud-init 23.3 present, and those are also both fine.

I feel comfortable calling the above verified.

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Dan Bungert (dbungert) wrote :

Is anything further needed for the Jammy SRU?

Andreas Hasenack (ahasenack) wrote :

I'm processing the SRUs today, and I see this bug in the list[1]. I'll get to it; right now I'm on the OEM ones a few lines above, for jammy.

I process the list from top to bottom, starting with the most recent Ubuntu release (lunar in this case), so that I get to the oldest SRUs first.

1. https://ubuntu-archive-team.ubuntu.com/pending-sru.html

Andreas Hasenack (ahasenack) wrote :

Dan,

thanks for testing with both cloud-init versions (upcoming 23.3, and an older one).

But the test really needs to be done with the package from jammy-proposed, I'm afraid :/. There should be a way to use the package from proposed. I checked some previous bugs in this package to see if this was done before, and came across https://bugs.launchpad.net/ubuntu/+source/livecd-rootfs/+bug/2016022, which mentions a PROPOSED=1 setting. Would that be it, perhaps?

Andreas Hasenack (ahasenack) wrote :

Or how about this comment[1]? It even has a command line:

bin/start-livefs-build --livefs=~lool/+livefs/ubuntu/jammy/test --arch=arm64 --metadata subarch='"tegra"' --pocket Proposed

1. https://bugs.launchpad.net/ubuntu/+source/livecd-rootfs/+bug/2015644/comments/11

Dan Bungert (dbungert) wrote :

Thank you for the testing suggestions. I have retested as suggested with a `--pocket Proposed` build and without, and the fix still looks correct.

https://launchpad.net/~dbungert/+livefs/ubuntu/jammy/test/+build/500745
https://launchpad.net/~dbungert/+livefs/ubuntu/jammy/test/+build/500746

Steve's build also looks correct:

https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/jammy/ubuntu-server-live/+build/500751

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.765.25

---------------
livecd-rootfs (2.765.25) jammy; urgency=medium

  [ Michael Hudson-Doyle ]
  * Remove additional dependencies from subiquity units as they are now
    interfering with the boot process. (LP: #2028862)

 -- Dan Bungert <email address hidden> Mon, 28 Aug 2023 14:13:58 -0600
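When checking a live-server ISO for leftover dependencies of the kind this change removes, `systemctl cat <unit>` prints a unit together with all of its drop-in files (each preceded by a `# /path` header), and the ordering and requirement lines can then be filtered out. The filter below is a hypothetical sketch: the directive list and the sample drop-in text are illustrative only and are not the actual contents of the livecd-rootfs change.

```shell
#!/bin/sh
# Hypothetical filter: from unit-file text on stdin (e.g. `systemctl cat`
# output), print the file headers and any dependency/ordering directives.
dep_lines() {
    grep -E '^(# /|After=|Before=|Wants=|Requires=|BindsTo=|PartOf=)'
}

# Live usage on a booted ISO (the unit name is a placeholder):
#   systemctl cat some-subiquity-unit.service | dep_lines
dep_lines <<'EOF'
# /etc/systemd/system/example.service.d/override.conf
[Unit]
Description=Illustrative drop-in, not taken from livecd-rootfs
After=example-dependency.service
Wants=example-dependency.service
EOF
```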

Changed in livecd-rootfs (Ubuntu Jammy):
status: Fix Committed → Fix Released
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for livecd-rootfs has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Benjamin Drung (bdrung)
tags: removed: foundations-todo