Live session, snapd fails to start or behaves strangely on slow systems

Bug #1970477 reported by Aaron Rainbolt
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
snapd
New
Undecided
Unassigned

Bug Description

It has been reported on the Lubuntu Discourse forum that, when booting into a live session using the Lubuntu 22.04 ISO, there are strange Snap errors that occur on certain hardware. This usually manifests itself as a "Failed to start Snap daemon" error, and since Firefox has been distributed as a snap in Ubuntu 22.04, this results in an inability to use Firefox within the live session. (The full thread is here: https://discourse.lubuntu.me/t/bug-failed-snap-daemon-in-lubuntu-22-04-lts-no-firefox-help-please/3246)

From my own testing, I have found that booting both Lubuntu 22.04 and Ubuntu 22.04 ISOs in very slow VMs (QEMU with no KVM) usually results in snapd failures (and Firefox doesn't load), while booting the same ISOs in fast VMs (virt-manager, QEMU/KVM) reveals no problems - no snapd errors, Firefox launches just fine.

Steps to reproduce:

1: Ensure that qemu-system-x86_64 is installed, and that you have a Lubuntu 22.04 ISO available.
2: In a terminal, cd to the directory containing the Lubuntu 22.04 ISO.
3: Run the following command: "qemu-system-x86_64 -cpu kvm64 -m 4G ./lubuntu-22.04-desktop-amd64.iso"
4: Select "Try or Install Lubuntu", and wait for the VM to boot. It will take a long time, since KVM is purposefully left off.
5: Once the live session has booted, press Ctrl+Alt+T to bring up a terminal.
6: Type "firefox" and press Enter. It is very likely that the command will fail, though the particular error you get may vary.
7: Type "nano /var/log/syslog" and press Enter. Search for "Failed to start Snap". You will almost certainly find the "Failed to start Snap daemon" error.

Expected result: Firefox should launch in step 6, no Snap errors should be present in step 7.

Actual result: Firefox fails to launch in step 6, Snap errors are found in step 7.

Extra info:

This bug appears to be hardware-dependent. This is my setup:

    HP Z220 SFF Workstation
    Intel Xeon processor E3-1225v2
    16 GB of A-TECH RAM (DDR3, 1600MHz, no ECC)
    Lubuntu 22.04 ISO is located on a 12TB WD Easystore USB drive. ISO checks as good using sha256sum.
    Host OS is Ubuntu Studio 21.10.

When booting the Lubuntu 22.04 ISO in virt-manager (QEMU/KVM), everything works as expected - Firefox launches, no Snap errors are found in the log. **This bug is not reproducible for me on a fast VM.**

This problem is **not** unique to Lubuntu - I have also booted Ubuntu 22.04 in pure QEMU (no KVM), and found similar errors. In the Ubuntu 22.04 test, the window that allows me to select "Try Ubuntu" never appears - instead, an error message showed up saying that Ubuntu had experienced an internal error. I was able to switch to a TTY, and check /var/log/syslog - there were three "Failed to start Snap daemon" errors in the log.

Booting up both Lubuntu 22.04 and Ubuntu 22.04 images in virt-manager (QEMU/KVM) with slow graphics (VGA instead of QXL) still allowed Firefox to launch without issues.

The particular error you get in step 6 may vary. In one test, I was told that Firefox was not installed. In another test, I received this error message:

    error: error running snapctl: cannot invoke snapctl operation commands (here "is-connected") from outside of a snap
    ERROR: not connected to the gnome-3-38-2004 content interface.

Both of these tests were performed using Lubuntu 22.04 in pure QEMU (no KVM).

I have not yet tested either Lubuntu 22.04 or Ubuntu 22.04 on bare metal, but on the Lubuntu Discourse thread linked above, multiple users experienced problems on bare metal similar to the ones I'm experiencing, while other users were not able to reproduce the issues on bare metal. I intend to test these images on a relatively slow Chromebook (Dell Chromebook 3120) to see if I can reproduce the issue there.

I have the QEMU instance from one of my failed tests still open and paused, so if you need info from the logs from a known-bad boot, I have them. My Internet connection is metered, and I gave the failed test VM 4 GB of RAM, so I don't want to snapshot the VM and upload the snapshot file, but I could probably reproduce the issue in a VM with 1 GB of RAM, snapshot that, and then upload it.

Tags: jammy snapd
tags: added: jammy snapd
description: updated
description: updated
summary: - snapd fails to start or behaves strangely on slow systems
+ Live session, snapd fails to start or behaves strangely on slow systems
description: updated
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

Thank you for reporting this. I was able to partially reproduce the problem in a VM. I think there are at least 2 issues that interact in a bad way.

First, snapd uses a systemd watchdog to monitor itself. We extend the watchdog wait for startup to complete by some generous estimate, but that estimate proves to be too low for a very slow VM. When inspecting `journalctl -u snapd` you would see a log similar to this:

Apr 27 07:38:10 lubuntu snapd[858]: daemon.go:340: adjusting startup timeout by 1m0s (pessimistic estimate of 30s plus 5s per snap)
Apr 27 07:38:13 lubuntu snapd[858]: backend.go:146: snapd enabled root filesystem on overlay support, additional upperdir permissions granted
Apr 27 07:39:11 lubuntu systemd[1]: snapd.service: start operation timed out. Terminating.
Apr 27 07:40:41 lubuntu systemd[1]: snapd.service: State 'stop-sigterm' timed out. Killing.
Apr 27 07:40:41 lubuntu systemd[1]: snapd.service: Killing process 858 (snapd) with signal SIGKILL.
Apr 27 07:40:41 lubuntu systemd[1]: snapd.service: Main process exited, code=killed, status=9/KILL

Then, firefox is one of the seeded snaps. So it will be installed in the system, but running the firefox's configure hook fails, because it takes way longer than expected to even start it. In this case, the problem appears to be in snap-confine, when it launches a program to adjust the snap's mount namespace according to the layout it requires, there is a 30s timeout, which is hit before the setup is completed:

Apr 27 07:43:01 lubuntu snapd[1156]: taskrunner.go:271: [change 1 "Run configure hook of \"firefox\" snap if present" task] failed: run hook "configure":
Apr 27 07:43:01 lubuntu snapd[1156]: -----
Apr 27 07:43:01 lubuntu snapd[1156]: sanity timeout expired: Interrupted system call
Apr 27 07:43:01 lubuntu snapd[1156]: update.go:85: cannot change mount namespace according to change mount (/var/lib/snapd/hostfs/usr/share/libreoffice/help /usr/share/libreoffice/help none bind,ro 0 0): cannot create directory "/usr/share/libreoffice/help": permission denied
Apr 27 07:43:01 lubuntu snapd[1156]: cannot send command 1 to helper process: Broken pipe
Apr 27 07:43:01 lubuntu snapd[1156]: -----

Since the hook failed, the whole process gets undone, and hence firefox is unavailable. Now, snapd has support for preseeding, but I'm not sure if it was used in this scenario, though the timinigs data suggest that plug connections took only couple ms, so I'm guessing pre-generated profiles were used after all.

Some logs I colleted from the VM:
- full log here: https://pastebin.ubuntu.com/p/x9Yk7xPFR9/
- snap change 1: https://pastebin.ubuntu.com/p/qV8WqGw4dM/
- snap debug seeding: https://pastebin.ubuntu.com/p/zJc9Z9fmRJ/
- timinigs of seeding: https://pastebin.ubuntu.com/p/8yzcpqNXmN/

The question is whether the problem happens only in a painfully slow VM (TBH I can't even run the 'Try Ubuntu' in such VM), or does it also happen on real, but slow machines?

Revision history for this message
Aaron Rainbolt (arraybolt3) wrote :

Thanks for your help!

On the Lubuntu forum, it *is* happening on real but slow machines. The problem surfaced first when a user was attempting to use the Lubuntu 22.04 live session from a burned DVD. The live session takes ~20 minutes to boot in this scenario, and the original poster noticed that Firefox was gone once the system got booted up. Various different members on the forum have tried to replicate the issue with varying degrees of success (one person with 2008-2010 hardware was able to replicate it to some degree on bare metal with both USB keys and DVDs, while other people with 8th gen Intel i7s weren't able to reproduce it even with DVDs).

I finally managed to do a bare metal test on a Dell Chromebook 3120 (a old, slow, expired Chromebook) but was unable to reproduce the issue there.

What I find interesting is that, even with the very slow boot time induced by a DVD, otherwise fast hardware doesn't show up this problem, while even with the faster boot times provided by a USB key, slow hardware may exhibit this issue.

Another interesting point is that the issue can show up in different ways across different boots on the painfully slow VM. My first boot of Lubuntu in pure QEMU resulted in Firefox vanishing from the system menu entirely. Attempting to start it from a terminal ended up with it telling me it wasn't installed at all. A second boot *from the same ISO* resulted in Firefox appearing. A third boot from the same ISO also ended up with Firefox appearing, but being unable to launch.

The fact that the particular error can vary depending on the boot suggests to me that this might be a race condition, though with the way I'm booting the VM, the boot time might be able to vary widely, which might cause the same problem.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.