huge and slow image 20181002 due to seeded lxd snap

Bug #1796137 reported by Scott Moser
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
cloud-images
Fix Released
High
Unassigned
snapd (Ubuntu)
Triaged
High
Paweł Stołowski

Bug Description

The current squashfs/lxd image (20181002) is almost twice the size:
  170M -> 323M
as the previous image (20180928) and boots dramatically slower:
  10s -> 38s

At first, just looking at manifests you'd blame systemd change, but I
think the real problem is seeded snaps.

$ lxc image list ubuntu-daily: | grep 18.10 | grep x86_64
| c (7 more) | 6cad1409aaac | yes | ubuntu 18.10 amd64 (daily) (20181002) | x86_64 | 323.85MB | Oct 2, 2018 at 12:00am (UTC) |
| | 21f94ea409ac | yes | ubuntu 18.10 amd64 (daily) (20180928) | x86_64 | 170.25MB | Sep 28, 2018 at 12:00am (UTC) |
| | 5011b07c621c | yes | ubuntu 18.10 amd64 (daily) (20180926) | x86_64 | 170.25MB | Sep 26, 2018 at 12:00am (UTC) |

## The old image
$ lxc launch ubuntu-daily:21f94ea409ac cosmic-test
$ lxc exec cosmic-test -- sh -xc 'cat /etc/cloud/build.info; dpkg-query --show | pastebinit; systemd-analyze; systemd-analyze critical-chain'
+ cat /etc/cloud/build.info
build_name: server
serial: 20180928
+ dpkg-query --show
+ pastebinit
http://paste.ubuntu.com/p/gyrNWkdSRG/
+ systemd-analyze
Startup finished in 11.822s (userspace)
graphical.target reached after 10.139s in userspace
+ systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @10.139s
└─multi-user.target @10.139s
  └─snapd.seeded.service @6.041s +4.096s
    └─snapd.service @5.770s +269ms
      └─basic.target @5.698s
        └─sockets.target @5.698s
          └─snapd.socket @5.694s +3ms
            └─sysinit.target @5.682s
              └─cloud-init.service @4.676s +1.003s
                └─systemd-networkd-wait-online.service @1.659s +3.015s
                  └─systemd-networkd.service @1.608s +49ms
                    └─network-pre.target @1.605s
                      └─cloud-init-local.service @198ms +1.406s
                        └─systemd-journald.socket @187ms
                          └─-.mount @184ms
                            └─system.slice @184ms
                              └─-.slice @184ms

## Try the current one.
$ lxc launch ubuntu-daily:cosmic c2
$ lxc exec c2 -- sh -xc 'cat /etc/cloud/build.info; dpkg-query --show | pastebinit; systemd-analyze; systemd-analyze critical-chain'
+ cat /etc/cloud/build.info
build_name: server
serial: 20181002
+ dpkg-query --show
+ pastebinit
http://paste.ubuntu.com/p/MCkrygb9sM/
+ systemd-analyze
Startup finished in 38.639s (userspace)
graphical.target reached after 36.990s in userspace
+ systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @36.990s
└─multi-user.target @36.989s
  └─snapd.seeded.service @5.149s +31.839s
    └─snapd.service @12.863s +67ms
      └─snap-lxd-8959.mount @15.227s +15ms
        └─snap.mount @11.543s +14ms
          └─local-fs-pre.target @363ms
            └─keyboard-setup.service @164ms +198ms
              └─systemd-journald.socket @162ms
                └─-.mount @159ms
                  └─system.slice @159ms
                    └─-.slice @159ms

Heres a diff of their manifests:
$ diff -u /tmp/20180928.manifest /tmp/20181002.manifest | grep "^[-+]"
--- /tmp/20180928.manifest 2018-10-04 12:16:28.593773558 -0400
+++ /tmp/20181002.manifest 2018-10-04 12:16:57.945328943 -0400
-apparmor 2.12-4ubuntu7
+apparmor 2.12-4ubuntu8
-libapparmor1:amd64 2.12-4ubuntu7
+libapparmor1:amd64 2.12-4ubuntu8
-libnss-systemd:amd64 239-7ubuntu8
+libnss-systemd:amd64 239-7ubuntu9
-libpam-systemd:amd64 239-7ubuntu8
+libpam-systemd:amd64 239-7ubuntu9
-libsystemd0:amd64 239-7ubuntu8
+libsystemd0:amd64 239-7ubuntu9
-libudev1:amd64 239-7ubuntu8
+libudev1:amd64 239-7ubuntu9
-libzstd1:amd64 1.3.3+dfsg-2ubuntu1
+libzstd1:amd64 1.3.5+dfsg-1ubuntu1
-networkd-dispatcher 1.7-0ubuntu6
+networkd-dispatcher 1.7-0ubuntu8
-python3-configobj 5.0.6-2
+python3-configobj 5.0.6-3
-shared-mime-info 1.9-2
+shared-mime-info 1.10-1
-systemd 239-7ubuntu8
-systemd-sysv 239-7ubuntu8
+systemd 239-7ubuntu9
+systemd-sysv 239-7ubuntu9
-udev 239-7ubuntu8
+udev 239-7ubuntu9

Scott Moser (smoser)
description: updated
description: updated
Revision history for this message
Scott Moser (smoser) wrote :

# ls -lh /var/lib/snapd/seed/snaps/
total 154M
-rw------- 1 root root 88M Oct 3 03:55 core_5548.snap
-rw------- 1 root root 67M Oct 3 03:55 lxd_8959.snap

summary: - huge and slow image 20181002
+ huge and slow image 20181002 due to seeded lxd snap
Scott Moser (smoser)
Changed in cloud-images:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Dan Watkins (oddbloke) wrote :

Our testing suggests that we're now seeing ~22s of first boot slowdown across all the cloud platforms. We would expect to see some slowdown in when introducing a new pre-seeded snap (as there is more work for the system to do on first-boot), but that is excessive. Furthermore, we are _also_ seeing slowdown on subsequent boots (when the cost of an extra snap should be negligible).

I'm pretty reliably seeing this 14s wait in the execution of the lxd.activate unit, both on first boot and subsequent boots, which accounts for a large part of it:

-- Logs begin at Thu 2018-10-04 20:59:47 UTC, end at Thu 2018-10-04 21:05:31 UTC. --
Oct 04 21:00:09 cosmic-test systemd[1]: Starting Service for snap application lxd.activate...
Oct 04 21:00:09 cosmic-test lxd.activate[1844]: => Starting LXD activation
Oct 04 21:00:09 cosmic-test lxd.activate[1844]: ==> Creating missing snap configuration
Oct 04 21:00:09 cosmic-test lxd.activate[1844]: ==> Loading snap configuration
Oct 04 21:00:09 cosmic-test lxd.activate[1844]: ==> Checking for socket activation support
Oct 04 21:00:09 cosmic-test lxd.activate[1844]: ==> Creating "lxd" user
Oct 04 21:00:23 cosmic-test lxd.activate[1844]: useradd: /var/lib/extrausers/passwd.1911: No such file or directory
Oct 04 21:00:23 cosmic-test lxd.activate[1844]: useradd: cannot lock /var/lib/extrausers/passwd; try again later.
Oct 04 21:00:23 cosmic-test lxd.activate[1844]: ==> Setting LXD socket ownership
Oct 04 21:00:23 cosmic-test lxd.activate[1844]: ==> Checking if LXD needs to be activated
Oct 04 21:00:24 cosmic-test systemd[1]: Started Service for snap application lxd.activate.
-- Reboot --
Oct 04 21:05:13 cosmic-test systemd[1]: Starting Service for snap application lxd.activate...
Oct 04 21:05:15 cosmic-test lxd.activate[810]: => Starting LXD activation
Oct 04 21:05:15 cosmic-test lxd.activate[810]: ==> Loading snap configuration
Oct 04 21:05:15 cosmic-test lxd.activate[810]: ==> Checking for socket activation support
Oct 04 21:05:15 cosmic-test lxd.activate[810]: ==> Creating "lxd" user
Oct 04 21:05:29 cosmic-test lxd.activate[810]: useradd: /var/lib/extrausers/passwd.911: No such file or directory
Oct 04 21:05:29 cosmic-test lxd.activate[810]: useradd: cannot lock /var/lib/extrausers/passwd; try again later.
Oct 04 21:05:29 cosmic-test lxd.activate[810]: ==> Setting LXD socket ownership
Oct 04 21:05:29 cosmic-test lxd.activate[810]: ==> Checking if LXD needs to be activated
Oct 04 21:05:30 cosmic-test systemd[1]: Started Service for snap application lxd.activate.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

is that blocking on NSS user lookup on gaining network connectivity? Also on a classic system i would have expected for /var/lib/extrausers/passwd to not be used, as the user should be either systemd's DynamicUser or system pre-allocated one.

Revision history for this message
Dan Watkins (oddbloke) wrote :

I was able to SSH in before the service completed, so I don't think it's just a lack of network connectivity.

http://paste.ubuntu.com/p/pSrJJVD8zt/ is the full journalctl output from the (GCE) instance.

(I'm not 100% sure if this is relevant to the second part of your comment, Dimitri, but this is a snap service, so this script is running in a snap context (i.e. not fully "in" a classic system).)

Revision history for this message
Robert C Jennings (rcj) wrote :

Attaching the 'journalctl --boot' output after first boot of an AWS AMI from today (ubuntu/images-testing/hvm-ssd/ubuntu-cosmic-daily-amd64-server-20181002 - ami-03a19e7ba42cf1f55 in us-west-2) on an m5.large instance

Revision history for this message
Robert C Jennings (rcj) wrote :

No output was returned for:

$ snap logs lxd.activate
$ snap logs lxd.daemon

Here's the rest of the info on the instance/image used to capture the journalctl output:

ubuntu@ip-172-31-35-137:~$ cat /etc/cloud/build.info
build_name: server
serial: 20181002

ubuntu@ip-172-31-35-137:~$ snap list
Name Version Rev Tracking Publisher Notes
amazon-ssm-agent 2.3.68.0 734 stable/… aws✓ classic
core 16-2.35.2 5548 stable canonical✓ core
lxd 3.5 8959 stable/… canonical✓ -

Revision history for this message
Stéphane Graber (stgraber) wrote :

Can you paste the content of /etc/nsswitch.conf from an affected instanced?

It looks like you may have nss-extrausers installed, causing a mis-trigger of part of our activate logic. Hopefully that's the source of the issue and I can solved it easy enough in our snap.

Revision history for this message
Scott Moser (smoser) wrote :

repeat boots could I guess be slow as a result of bug 1786165.
The snaps that are seeded are never removed. perhaps snapd doesn't realize (or has to re-check) to verify that they are not new.

I'd suggest debugging and re-create in lxd itself.
Doing so allows you to jump into the container before networking is available and push/pull files easily.

$ lxc launch ubuntu-daily:cosmic my-test
$ lxc exec my-test /bin/bash
...

Revision history for this message
Scott Moser (smoser) wrote :

@Stgraber,
I'm not sure why you wouldnt' have just done this yourself, so maybe I'm
missing your request.

$ lxc init ubuntu-daily:cosmic my-test
Creating my-test
$ lxc file pull my-test/etc/nsswitch.conf - | pastebinit
http://paste.ubuntu.com/p/7rpZ2N4grn/

$ lxc start my-test

$ lxc exec my-test -- cloud-init status --wait
......................................................................................................
status: done

$ lxc file pull my-test/etc/nsswitch.conf -
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd: files systemd
group: files systemd
shadow: files
gshadow: files

hosts: files dns
networks: files

protocols: db files
services: db files
ethers: db files
rpc: db files

netgroup: nis

Revision history for this message
Stéphane Graber (stgraber) wrote :

Because I suspect the GCE image may have a different looking one.

The one in the base image that you posted wouldn't trip the code that uses extrausers.

Revision history for this message
Stéphane Graber (stgraber) wrote :

To get the extrausers in the log above, you need this "grep -q "^passwd.*extrausers" /etc/nsswitch.conf" to be true, so I'm wondering what else may be different in this image's nsswitch and also how we ended up with extrausers in nsswitch but no extrausers file on disk.

I suspect the solution will be to change our check to be a bit more specific in what we're looking for (which is effectively an Ubuntu Core system where /etc/passwd isn't writable and extrausers is the way to get extra users/groups).

Revision history for this message
Robert C Jennings (rcj) wrote :

@stgraber:

# Grab the latest cosmic daily (20181002 is what I get)
#lxc launch ubuntu-daily:c recreate

$ lxc exec recreate -- grep "^passwd.*extrausers" /snap/core/current/etc/nsswitch.conf
passwd: compat extrausers

$ lxc exec recreate -- snap list
Name Version Rev Tracking Publisher Notes
core 16-2.35.2 5548 stable canonical✓ core
lxd 3.5 8959 stable/… canonical✓ -

Revision history for this message
Robert C Jennings (rcj) wrote :

As Scott showed in comment #9, /etc/nsswitch.conf in the images do not contain extrausers, but the core snap does.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Oh, I see what's going on, this piece of code used to be running AFTER we setup our custom /etc for the snap (which includes the real host nsswitch.conf) but now that it's moved over when we setup socket activation, this isn't true anymore.

I'll push a fix to the snap tomorrow, that should fix that part of the issue at least.

Revision history for this message
Stéphane Graber (stgraber) wrote :

I've updated the LXD snap, note that there's also a snapd issue which may be in play here as it's been known to needlessly delay boot with snaps using socket activation.

I've accepted the new snapd into cosmic now, so I think we'll want for both the lxd snap to be in stable (should be in the next couple hours) AND the new snapd to hit the release pocket, then re-test to see if we got this issue fixed.

Revision history for this message
Scott Moser (smoser) wrote :

Bah. my comment 8 above is wrong.
I meant to reference bug 1796165 (seeded snaps never cleaned up.)
That could affect boot time of subsequent boots, if snapd is re-reading those preseed snaps.

Revision history for this message
Stéphane Graber (stgraber) wrote :

The updated LXD snap should be available now in the stable channel.
The updated snapd is in cosmic-proposed and will hopefully migrate over the weekend.

Assuming that does happen, it'd be good to re-check cloud image on Monday.

Revision history for this message
Stéphane Graber (stgraber) wrote :

snapd has now migrated to the release pocket

Revision history for this message
Scott Moser (smoser) wrote :

Just as a point of reference, a single container booted on the same system I
showed the 38 second boot time before, seems to be down to 25 or so now.
Much better, but still not 10 seconds as it was.

Note that I don't call this a scientific test. I've made no real effort to
control the environment that I ran this in.

$ lxc launch ubuntu-daily:cosmic c2
Creating c2
Starting c2
$ lxc exec c2 /bin/bash
root@c2:~# systemd-analyze
Startup finished in 27.010s (userspace)
graphical.target reached after 25.342s in userspace
root@c2:~# systemd-analyze
Startup finished in 27.010s (userspace)
graphical.target reached after 25.342s in userspace

root@c2:~# dpkg-query --show snapd
snapd 2.35.4+18.10
root@c2:~# systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @25.342s
└─multi-user.target @25.341s
  └─snapd.seeded.service @5.268s +20.073s
    └─snapd.service @13.190s +65ms
      └─snap-lxd-9010.mount @18.054s +19ms
        └─snap.mount @11.631s +9ms
          └─local-fs-pre.target @437ms
            └─keyboard-setup.service @212ms +223ms
              └─systemd-journald.socket @204ms
                └─-.mount @199ms
                  └─system.slice @199ms
                    └─-.slice @199ms

Revision history for this message
Stéphane Graber (stgraber) wrote :

I've heard that even cloud images that were shipping snaps before the LXD inclusion are seeing a ~10s slowdown. If that's indeed the case, can we get a before/after for those with:

 - systemd-analyze critical-chain
 - journalctl --boot 0
 - ps fauxww

The LXD snap at boot time should be spending less than a second running "snap.lxd.activate" (equivalent to "lxd-containers" pre-snap) at which point it should decide not to wake the daemon and move on with startup without spending any more time doing LXD related things.

So it could be that something's wrong in "lxd.activate" similar to what I fixed on Friday or snapd is somehow spending a lot of time mounting stuff at boot time or we've sufficiently reshuffled the systemd boot sequence to cause a massive delay somewhere?

I think we should be focusing on this right now, before we look into images that didn't use to have any snaps inside them before.

Revision history for this message
Dan Watkins (oddbloke) wrote :

This is from a cosmic GCE instance _after_ the lxd snap has been seeded:

$ systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @23.527s
└─multi-user.target @23.527s
  └─chrony.service @24.515s +57ms
    └─basic.target @7.698s
      └─sockets.target @7.687s
        └─snap.lxd.daemon.unix.socket @21.088s
          └─snap-lxd-9010.mount @15.903s +28ms
            └─local-fs-pre.target @886ms
              └─systemd-tmpfiles-setup-dev.service @851ms +21ms
                └─systemd-sysusers.service @773ms +64ms
                  └─systemd-remount-fs.service @523ms +159ms
                    └─systemd-journald.socket @442ms
                      └─system.slice @428ms
                        └─-.slice @428ms

$ journalctl --boot 0 | pastebinit
http://paste.ubuntu.com/p/RYd5mDB76q/

$ ps fauxww | pastebinit
http://paste.ubuntu.com/p/yxKtVTTm9D/

(Before data coming up in a minute.)

Revision history for this message
Dan Watkins (oddbloke) wrote :

This is from a cosmic GCE instance _before_ the lxd snap was seeded (i.e. with just the google-cloud-sdk snap seeded):

$ systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

graphical.target @15.614s
└─multi-user.target @15.614s
  └─chrony.service @16.650s +56ms
    └─basic.target @7.478s
      └─sockets.target @7.466s
        └─snapd.socket @7.076s +305ms
          └─sysinit.target @6.806s
            └─cloud-init.service @4.715s +2.013s
              └─systemd-networkd-wait-online.service @3.136s +1.575s
                └─systemd-networkd.service @3.007s +109ms
                  └─network-pre.target @2.994s
                    └─cloud-init-local.service @802ms +2.181s
                      └─systemd-remount-fs.service @634ms +152ms
                        └─systemd-journald.socket @454ms
                          └─-.mount @425ms
                            └─system.slice @425ms
                              └─-.slice @425ms

$ journalctl --boot 0 | pastebinit
http://paste.ubuntu.com/p/Xj9wzRhQrb/

$ ps fauxww | pastebinit
http://paste.ubuntu.com/p/TgK2FRW3KN/

Revision history for this message
Dan Watkins (oddbloke) wrote :

As requested on IRC:

$ systemctl cat snap.lxd.daemon.unix.socket
# /etc/systemd/system/snap.lxd.daemon.unix.socket
[Unit]
# Auto-generated, DO NOT EDIT
Description=Socket unix for snap application lxd.daemon
Requires=snap-lxd-9010.mount
After=snap-lxd-9010.mount
X-Snappy=yes

[Socket]
Service=snap.lxd.daemon.service
FileDescriptorName=unix
ListenStream=/var/snap/lxd/common/lxd/unix.socket
SocketMode=0660

[Install]
WantedBy=sockets.target

Revision history for this message
Dan Watkins (oddbloke) wrote :

http://people.canonical.com/~dwatkins/1796137.svg is a `systemd-analyze plot` run on the instance with lxd.

Revision history for this message
Dan Watkins (oddbloke) wrote :
Download full text (6.2 KiB)

Some more details about what's happening inside snapd:

$ snap changes --abs-time
ID Status Spawn Ready Summary
1 Done 2018-10-09T20:35:36Z 2018-10-09T20:35:47Z Initialize system state
2 Done 2018-10-09T20:35:36Z 2018-10-09T20:35:38Z Initialize device

$ snap change --abs-time 1
Status Spawn Ready Summary
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:36Z Ensure prerequisites for "core" are available
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:36Z Prepare snap "/var/lib/snapd/seed/snaps/core_5548.snap" (5548)
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:37Z Mount snap "core" (5548)
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:37Z Copy snap "core" data
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:38Z Setup snap "core" (5548) security profiles
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:38Z Make snap "core" (5548) available to the system
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Automatically connect eligible plugs and slots of snap "core"
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Set automatic aliases for snap "core"
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Setup snap "core" aliases
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Run install hook of "core" snap if present
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Start snap "core" (5548) services
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Run configure hook of "core" snap if present
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Ensure prerequisites for "lxd" are available
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Prepare snap "/var/lib/snapd/seed/snaps/lxd_9010.snap" (9010)
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Mount snap "lxd" (9010)
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:39Z Copy snap "lxd" data
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:40Z Setup snap "lxd" (9010) security profiles
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:40Z Make snap "lxd" (9010) available to the system
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:41Z Automatically connect eligible plugs and slots of snap "lxd"
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:44Z Set automatic aliases for snap "lxd"
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:44Z Setup snap "lxd" aliases
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:44Z Run install hook of "lxd" snap if present
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:45Z Start snap "lxd" (9010) services
Done 2018-10-09T20:35:35Z 2018-10-09T20:35:45Z Run configure hook of "lxd" snap if present
Done 2018-10-09T20:35:36Z 2018-10-09T20:35:45Z Ensure prerequisites for "google-cloud-sdk" are available
Done 2018-10-09T20:35:36Z 2018-10-09T20:35:45Z Prepare snap "/var/lib/snapd/seed/snaps/google-cloud-sdk_55.snap" (55)
Done 2018-10-09T20:35:36Z 2018-10-09T20:35:46Z Mount snap "google-cloud-sdk" (55)
Done 2018-10-09T20:35:36Z 2018-10-09T20:35:46Z Copy snap "google-cloud-sdk" data
Done 2018-10-09T20:35:36Z 2018-10-09T20:35:46Z Setup snap "google-cloud-sdk" (55) security profiles
Done 2018-10-09T20:35:36Z 2018-10-09T20:35:46Z Make snap "google-cl...

Read more...

Revision history for this message
Michael Vogt (mvo) wrote :

@Dan could we get the same data (plot + output of snap changes/tasks) for the image *before* lxd was seeded as well? This way we can more easily compare.

Revision history for this message
Dan Watkins (oddbloke) wrote :

From a pre-LXD-seeding instance, the plot is at http://people.canonical.com/~dwatkins/1796137-before.svg and the `snap change` output is:

$ snap change --abs-time 1
Status Spawn Ready Summary
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:33Z Ensure prerequisites for "core" are available
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:33Z Prepare snap "/var/lib/snapd/seed/snaps/core_5328.snap" (5328)
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:33Z Mount snap "core" (5328)
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:33Z Copy snap "core" data
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:34Z Setup snap "core" (5328) security profiles
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:34Z Make snap "core" (5328) available to the system
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:34Z Automatically connect eligible plugs and slots of snap "core"
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:34Z Set automatic aliases for snap "core"
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:34Z Setup snap "core" aliases
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:34Z Run install hook of "core" snap if present
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:34Z Start snap "core" (5328) services
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:35Z Run configure hook of "core" snap if present
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:35Z Ensure prerequisites for "google-cloud-sdk" are available
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:35Z Prepare snap "/var/lib/snapd/seed/snaps/google-cloud-sdk_54.snap" (54)
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:35Z Mount snap "google-cloud-sdk" (54)
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:35Z Copy snap "google-cloud-sdk" data
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:35Z Setup snap "google-cloud-sdk" (54) security profiles
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:35Z Make snap "google-cloud-sdk" (54) available to the system
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:35Z Automatically connect eligible plugs and slots of snap "google-cloud-sdk"
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:36Z Set automatic aliases for snap "google-cloud-sdk"
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:36Z Setup snap "google-cloud-sdk" aliases
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:36Z Run install hook of "google-cloud-sdk" snap if present
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:36Z Start snap "google-cloud-sdk" (54) services
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:36Z Run configure hook of "google-cloud-sdk" snap if present
Done 2018-10-09T20:41:30Z 2018-10-09T20:41:36Z Mark system seeded

......................................................................
Make snap "core" (5328) available to the system

2018-10-09T20:41:34Z INFO Requested daemon restart.

$ snap change --abs-time 2
Status Spawn Ready Summary
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:33Z Generate device key
Done 2018-10-09T20:41:32Z 2018-10-09T20:41:33Z Request device serial

Revision history for this message
Michael Vogt (mvo) wrote :

Thanks Dan! Thats really helpful. So from the log you provided it looks like the follow happend:

Boot time went from up by 8s (from 15s to 23s):
 - "graphical.target reached after 15.614s in userspace" (before)
 - "graphical.target reached after 23.527s in userspace" (after)

Seeding time went up by 7s:
 - "snapd.seeded.service (7.244s)" (before)
 - "snapd.seeded.service (14.183s)" (after)

Installing lxd via seeding takes takes around 10s (some in parallel with the installing of the cloud-sdk though):
 - ~3-4s of the install time goes into connecting the plugs which means apparmor operations

I think we should look into making connections faster, iirc this is a pain point for CE on slow (arm) hardware as well.

Revision history for this message
Michael Vogt (mvo) wrote :

Some work has been done to make connections after in https://github.com/snapcore/snapd/pull/5469 - this will be part of the upcoming 2.36 release which will most likely not be ready before cosmic though :/

Revision history for this message
Stéphane Graber (stgraber) wrote :

Subscribed the LXD team directly as we are obviously interested in this issue.

Then moved the bug from lxd over to snapd since as far as LXD's concerned, we've fixed the bug on our side (activate hanging for 10s), with this fixed, LXD spends less than a second on boot to check whether it's expected to startup or not. I'm actually testing a change to this logic which may shorten the process slightly but we're talking miliseconds at this point.

affects: lxd (Ubuntu) → snapd (Ubuntu)
Revision history for this message
Ian Johnson (anonymouse67) wrote :

I'm assign Pawel here as he has been working on some first-boot speed improvements for cloud images specifically. Also, since this was reported we now have `snap debug timings` command, which would be really useful to look at for slow first boot. For a first-boot, the first snap change is always seeding, so the output of `snap debug timings 1` would be useful in understanding what's still slow here.

Changed in snapd (Ubuntu):
status: New → Triaged
assignee: nobody → Paweł Stołowski (stolowski)
importance: Undecided → High
Revision history for this message
Stéphane Graber (stgraber) wrote :

I believe work in snapd was done to improve seeding so that this wouldn't be such an issue anymore.

Has this been done and are the cloud images making use of it now?
If so, this can probably be closed.

Revision history for this message
Scott Moser (smoser) wrote :

@stgraber,
The bug subject says "Huge and slow". Shame on me to make 1 bug for 2 issues.

But wrt "is this fixed", groovy images are still dramatically bigger *and*
slower to boot than bionic images.

a.) Huge. Still looks much bigger, 'Huge' can obviously be argued.
346M for current groovy daily versus 179MB for bionic.

  $ lxc image list ubuntu-daily: | grep 20.10 | grep x86_64
  | g (7 more) | 7a5b11633006 | yes | ubuntu 20.10 amd64 (daily) (20200601) | x86_64 | CONTAINER | 346.31MB | Jun
 1, 2020 at 12:00am (UTC) |

b.) Slow.
groovy is much faster than originally reported, but still >1.5x longer
than bionic. bionic booted here in 8.2s and groovy in 15.5s.

https://paste.ubuntu.com/p/pjJD6t4YCQ/

Revision history for this message
Soren Hansen (soren) wrote :

I just used the focal cloud images on systems with no internet connectivity and snapd is adding at least 20 seconds of delay:

Jun 18 14:04:59 ubuntu systemd[1]: Finished Service for snap application lxd.activate.
Jun 18 14:05:21 ubuntu snapd[1013]: stateengine.go:150: state ensure error: persistent network error: Get https://api.snapcraft.io/api/v1/snaps/sections: dial tcp: lookup api.snapcraft.io: Temporary failure in name resolution
Jun 18 14:05:22 ubuntu systemd[1]: Finished Wait until snapd is fully seeded.

This is pretty frustrating. I don't need neither snapd nor lxd on these systems.

John Chittum (jchittum)
Changed in cloud-images:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.