snapd.seeded.service never finishes on non-amd64

Bug #1944004 reported by Sergio Durigan Junior
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
Undecided
Unassigned
cloud-images
Fix Released
Critical
John Chittum
snapd
Fix Released
Undecided
Unassigned
docker.io (Ubuntu)
Invalid
Undecided
Unassigned
livecd-rootfs (Ubuntu)
Fix Released
Undecided
John Chittum
snapd (Ubuntu)
Fix Released
Critical
Ian Johnson

Bug Description

When launching an lxd container on non-amd64 architectures, snapd.seeded.service hangs during "boot" time and never finishes, which ends up impacting other systemd units.

This bug was found while investigating what I thought was a cloud-init problem, because after launching a container I'd see:

# cloud-init status
status: running

and the status would never transition to "done". James Falcon (cloud-init dev) helped with the investigation and found that snapd was actually the culprit here. One can check it by doing:

# systemd-analyze blame
Bootup is not yet finished (org.freedesktop.systemd1.Manager.FinishTimestampMonotonic=0).
Please try again later.
Hint: Use 'systemctl list-jobs' to see active jobs
# systemctl list-jobs
JOB UNIT TYPE STATE
102 cloud-init.target start waiting
110 cloud-final.service start waiting
2 multi-user.target start waiting
131 snapd.autoimport.service start waiting
103 cloud-config.service start waiting
1 graphical.target start waiting
111 snapd.seeded.service start running
141 systemd-update-utmp-runlevel.service start waiting

8 jobs listed.
# systemctl status snapd.seeded.service
● snapd.seeded.service - Wait until snapd is fully seeded
     Loaded: loaded (/lib/systemd/system/snapd.seeded.service; enabled; vendor preset: enabled)
     Active: activating (start) since Fri 2021-09-17 19:46:40 UTC; 24min ago
   Main PID: 336 (snap)
      Tasks: 8 (limit: 4777)
     Memory: 35.6M
     CGroup: /system.slice/snapd.seeded.service
             └─336 /usr/bin/snap wait system seed.loaded

Sep 17 19:46:40 docker systemd[1]: Starting Wait until snapd is fully seeded...

Inspecting snapd.service's logs, I see some strange messages (for example, a segmentation fault on taskrunner.go) that could be related to this problem. I'm attaching the logs to this bug.

This issue is blocking docker.io on impish-proposed, which in turn is impacting the build of our OCI images for Ubuntu 21.10.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Adding a docker.io to reflect that this bug is blocking the package on impish-proposed.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

A few details/clarifications:

1) I can confirm that stopping the snapd.{service,socket} unblocks the other services and allows the boot to finish (cloud-init finally reports "done").

2) The problem happens when one launches an lxd container from a non-amd64 architecture. For example, I have an s390x host machine and cloud-init has successfully finished in it, but when I launch an ubuntu-daily:impish/s390x container, I can reproduce the problem inside it.

Revision history for this message
Ian Johnson (anonymouse67) wrote :

To be clear you see this behavior with normal containers and not with nested containers? Snapd doesn't support nested containers unfortunately

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Correct, this happens with normal containers. FWIW, I can provide access to the s390x machine I'm using (from canonistack) if needed. Thanks.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hi CPC folks,

It looks like there's something wrong with the non-amd64 LXD images for Ubuntu Impish. stgraber kindly helped me diagnose the issue and noticed that the problem is likely in the images themselves. Please refer to this issue:

https://github.com/lxc/lxd/issues/9263

or, more specifically, to his comment here:

https://github.com/lxc/lxd/issues/9263#issuecomment-924451756

It seems there's some problem snapd's initialization during the image build, which eventually makes snapd.seeded.service block the container boot and causes other troubles (described in this bug's description).

This is currently blocking the docker.io package from migration from impish-proposed.

Revision history for this message
John Chittum (jchittum) wrote :

ACK. I've started looking into it. initial things I can see:

1. there have been no code changes to how the image is being made. code is

https://git.launchpad.net/livecd-rootfs/tree/live-build/ubuntu-cpc/hooks.d/base/create-root-dir.binary

and

https://git.launchpad.net/livecd-rootfs/tree/live-build/ubuntu-cpc/hooks.d/base/root-xz.binary

2. from a date when our tests were 100% green (20210904) to our first failing tests (20210907) there was not an update to snapd

3. The failing case with `lxc launch ubuntu-daily:impish test-container && lxc shell test-container` should be pulling our last successfully tested rootfs, 202010904. We need to check to see what rootfs / container is being pulled on that command, and correlate to any possible changes.

there's more triage to be done on this issue.

Revision history for this message
John Chittum (jchittum) wrote :

Manifest Diff from 20210903 to 20210904 is now attached

20210903 rootfs is working fine. I did the following:

1. started an ARM image on Oracle
2. downloaded http://cloud-images.ubuntu.com/impish/20210903/impish-server-cloudimg-arm64-root.tar.xz and http://cloud-images.ubuntu.com/impish/20210903/impish-server-cloudimg-arm64-lxd.tar.xz
3. lxc image import impish-server-cloudimg-arm64-lxd.tar.xz impish-server-cloudimg-arm64-root.tar.xz --alias impish-20210903
4. lxc launch impish-20210903
5. lxc shell <image_name>
6. snap changes <done 20 days ago. done today>
7. cloud-init status <done>

repeat for 20210904 and see see the reproduction of the error (snap initialize system state ERROR)

The diff shows what packages changed between the working and failing versions. There is no snapd difference between working and failing versions.

Next steps for me are to dig into what changes may have occurred in livecd-rootfs that could change build behaviour, and to dig into the package diffs themselves and see if there is a specific package causing the issue in state.

Changed in cloud-images:
importance: Undecided → Critical
Revision history for this message
John Chittum (jchittum) wrote :

livecd-rootfs, used for building this, had a release on 20210830 and on 20210914. This feels like it may be a dead end, as I should have been able to reproduce the error on the 20210903 image built with the 20210830 release of livecd-rootfs if there was an issue in the base build scripts.

for cpc extras, there were changes, but not to any paths touched by a rootfs build (they were to hyperv and oracle builds).

At this point, I'm leaning towards a package change rather than a build change.

Revision history for this message
John Chittum (jchittum) wrote :

Apologies, i had some file confusion locally in creating the diff of packages, which led to confusion. Attached is the proper diff. I'll also attached the manifest used to generate this diff

From the list of package changes here, I'm not seeing a smoking gun. `cloud-init` upreved, as well as a bunch of gcc

Revision history for this message
John Chittum (jchittum) wrote :
Revision history for this message
John Chittum (jchittum) wrote :
Revision history for this message
John Chittum (jchittum) wrote :

We have reproducers on arm64 and s390x now. I'm seeing the same things as the snapd.log in

https://bugs.launchpad.net/cloud-images/+bug/1944004/comments/1

any attempts to restart the service do the same thing. So something has set up snapd with incorrect permissions. At this point, I'm going to have to defer to someone on the snapd team for further debugging help.

Revision history for this message
Ian Johnson (anonymouse67) wrote :
Download full text (3.2 KiB)

So some thoughts:

1. The root failure of snapd failing to finish seeding is that it is trying to run `udevadm trigger --subsystem-nomatch=input` which ends up dying on every write like so:

openat(AT_FDCWD, "/sys/devices/virtual/workqueue/writeback/uevent", O_WRONLY|O_NOCTTY|O_NOFOLLOW|O_CLOEXEC) = -1 EACCES (Permission denied)
writeback: Failed to write 'change' to '/sys/devices/virtual/workqueue/writeback/uevent': Permission denied

which is peculiar to me because that command fails on both the 03 and the 04 image, but only causes snapd to fail to seed on the 04 image for some reason. This is triply peculiar to me because that same command run in a 21.04 container (amd64 and arm64) works fine without issues. Quadruply peculiar is that the same command in a 21.10 daily container on amd64 actually also fails, but again does not cause snapd to fail seeding on the amd64 image. My guess is that maybe snapd doesn't think it needs to run this command on the other 21.10 images and so it doesn't die on this command, but for some reason on the new arm64 images it thinks it has to run this command. Quintuply peculiar is the failure around EACCES.

2. If you diff the manifest between the image you see this change:

-libc-bin 2.33-0ubuntu9
-libc6:arm64 2.33-0ubuntu9
+libc-bin 2.34-0ubuntu2
+libc6:arm64 2.34-0ubuntu2

I have no reason to believe that libc6 is the root of this problem, other than the fact that it has been at the root of many other problems that present themselves in weird ways especially when containers are involved.

3. The specific task that snapd is running when the failure happens seems to be either setup-profiles for the snapd snap OR running the install hook for lxd, although I don't think it is related at all to the specific install hook for the lxd snap since that doesn't seem to do anything real and it fails in different ways, sometimes it's a segmentation fault and sometimes it just says the hook exited 1.

4. If I modify the seed.yaml to not seed the lxd snap and then obliterate state.json and restart snapd, it now proceeds to be able to seed properly, so I think there is something about the lxd snap + the state of the system which implies to snapd that it needs to setup the udev backend which runs the failing command, most likely the fact that the lxd snap uses interfaces (even though it's interfaces are rather pointless in that they don't really have any policy). Indeed I can also successfully seed a snap like jq in the seed.yaml which has no interfaces used and it doesn't trigger snapd to run udev things. So part of the story is definitely that snapd thinks it needs to generate udev backend things for the lxd snap while seeding and that causes it to run that udevadm command which fails. Why this doesn't affect amd64 though is unclear to me.

5. The two rootfs' provided above are both preseeded with the same set of snaps with the same assertions and same seed.yaml file, so it doesn't appear to be a snapd regression or some trigger from the snaps themselves.

Finally, I also note that this is also reproducible on a Raspberry Pi too if anyone wants to debug on their own, just run on a Pi:

lxc launch ubuntu-daily:...

Read more...

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So one thing that I think is going on here is that the udev noise is caused by trying to _undo_ the "Setup snap "snapd" (12886) security profiles" task. In the state.json of the image, that task is done:

    "7": {
      "id": "7",
      "kind": "setup-profiles",
      "summary": "Setup snap \"snapd\" (12886) security profiles",
      "status": 4,
      ...
    },

(4 is DoneStatus in overlord/state/change.go in snapd). So that's a red herring, or at least, not the underlying problem. The underlying problem is that "Run install hook of "lxd" snap if present" is failing (this task is not done in the state.json of either the 20210903 or 20210904 -- in fact those two state.json file differ only in timestamps and cookies afaict). So the next question: why is the install hook failing? snap changes 1 only has this to say about this task:

Run install hook of "lxd" snap if present

2021-09-24T01:53:16Z ERROR run hook "install":
-----
cmd_run.go:410: DEBUG: SELinux not enabled
tracking.go:44: DEBUG: creating transient scope snap.lxd.hook.install
tracking.go:305: DEBUG: created transient scope as object: /org/freedesktop/systemd1/job/339
-----

which is not very helpful. This output isn't present for the working image but maybe output for successful tasks isn't saved? I don't know how this works.

Revision history for this message
Ian Johnson (anonymouse67) wrote :

@mwhudson I'm still not convinced that the udev noise is not the problem, but another data point I just went ahead and got to prove this is that when trying to seed another snap which uses interfaces and has an install hook and is available on arm64, the docker snap, it fails the "same" way with udev things in the setup-profiles task:

https://pastebin.ubuntu.com/p/ZGQSxpFHm4/

but in this state setup the install hook is never even attempted to be run. I also am attaching the state.json that results: https://pastebin.ubuntu.com/p/j5MbT8XjVb/

You can see that all the udev output is from the setup-profiles task for snapd from the state.json as well.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

A few more observations:

1) The output shown for the hook is just snapd debugging goo. It's not meaningful.
2) It is libc6 :(

You can demonstrate 2) by: starting a container with a known good image, removing lxd, upgrading libc6 & co and then running "snap install /var/lib/snapd/seed/snaps/lxd_21400.snap".

I tried to strace snapd while the installing the lxd snap but that fails even with the old libc (probably because apparmor gets upset about something being ptraced?). snap install --devmode /var/lib/snapd/seed/snaps/lxd_21400.snap also fails, I don't know if that's surprising.

If I hack the install hook out of the lxd snap, then it fails with a segfault in snap.lxd.activate.service.

If I start the good image, let lxd seed properly, and then upgrade libc6, things are pretty broken:

root@mwhudson-test-03:~# lxc
2021/09/24 03:34:20.355414 tool_linux.go:204: DEBUG: restarting into "/snap/snapd/current/usr/bin/snap"
2021/09/24 03:34:20.372561 cmd_run.go:410: DEBUG: SELinux not enabled
2021/09/24 03:34:20.372797 tracking.go:44: DEBUG: creating transient scope snap.lxd.lxc
2021/09/24 03:34:20.373178 tracking.go:176: DEBUG: session bus is not available: cannot find session bus
2021/09/24 03:34:20.373373 tracking.go:178: DEBUG: falling back to system bus
2021/09/24 03:34:20.374487 tracking.go:183: DEBUG: using system bus now, session bus was not available
2021/09/24 03:34:20.379914 tracking.go:305: DEBUG: created transient scope as object: /org/freedesktop/systemd1/job/1461
2021/09/24 03:34:20.380008 tracking.go:135: DEBUG: systemd could not associate process 2970 with transient scope snap.lxd.lxc.8787a98f-da32-4466-8105-764dc71b092b.scope
2021/09/24 03:34:20.380020 cmd_run.go:1170: DEBUG: snapd cannot track the started application
2021/09/24 03:34:20.380027 cmd_run.go:1171: DEBUG: snap refreshes will not be postponed by this process
Segmentation fault

But annoyingly the debugging aids all fail to work, e.g.:

root@mwhudson-test-03:~# snap run --strace lxd.lxc
2021/09/24 03:35:26.821544 tool_linux.go:204: DEBUG: restarting into "/snap/snapd/current/usr/bin/snap"
2021/09/24 03:35:26.848119 cmd_run.go:410: DEBUG: SELinux not enabled
2021/09/24 03:35:26.848487 tracking.go:44: DEBUG: creating transient scope snap.lxd.lxc
2021/09/24 03:35:26.848719 tracking.go:176: DEBUG: session bus is not available: cannot find session bus
2021/09/24 03:35:26.848805 tracking.go:178: DEBUG: falling back to system bus
2021/09/24 03:35:26.849923 tracking.go:183: DEBUG: using system bus now, session bus was not available
2021/09/24 03:35:26.851652 tracking.go:305: DEBUG: created transient scope as object: /org/freedesktop/systemd1/job/1469
2021/09/24 03:35:26.851734 tracking.go:135: DEBUG: systemd could not associate process 3003 with transient scope snap.lxd.lxc.5d763d6a-a730-41bf-b6e3-6cd519cd1919.scope
2021/09/24 03:35:26.851742 cmd_run.go:1170: DEBUG: snapd cannot track the started application
2021/09/24 03:35:26.851748 cmd_run.go:1171: DEBUG: snap refreshes will not be postponed by this process
error: exit status 1

Is it possible that snapd's own seccomp policies need updating? I guess I can try building a glibc with clone3 disabled and see what happens.

Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

Adding SNAP_CONFINE_DEBUG=1 in the env will produce more debugging output.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

SNAP_CONFINE_DEBUG=1 doesn't seem to make any difference. also disabling clone3 doesn't seem to help :(

Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

Thanks to @mwhudson for arranging access to the test host. Per his comments I launched both 20210903 and 20210904 snapshots of impish. Debugging, I noticed that there was an apparmor denial logged when snap (the snap binary from snapd) was transitioning to snap-confine. While snap-confine runs under an apparmor profile that's coming from the package, or the core snap, or the snapd snap.

The denial is:

audit: type=1400 audit(1632477434.031:8902): apparmor="DENIED" operation="file_mmap"
    namespace="root//lxd-happy-impish_<var-snap-lxd-common-lxd>"
    profile="/snap/snapd/12886/usr/lib/snapd/snap-confine"
    name="/usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1"
    pid=1101743 comm="snap-confine" requested_mask="m" denied_mask="m"
    fsuid=1000000 ouid=1000000

I've switched the container to privileged and disabled the confinement at lxd level via `lxc config set broken-impish raw.lxc "lxc.apparmor.profile=unconfined"`, so I was only left with the apparmor profile shipped by snapd.

Looking at what we have in the profile for snap-confine in the snapd source tree:

    /{,usr/}lib{,32,64,x32}/{,@{multiarch}/{,atomics/}}ld-*.so mrix,

So /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1 does not match the expected pattern and things fail in a weird and funny way.

I've updated the profile to /{,usr/}lib{,32,64,x32}/{,@{multiarch}/{,atomics/}}ld-*.so* mrix, and things are working again.

I was told that this is an upstream change, so I'll open a PR for snapd to update the profile.

Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

The PR is up: https://github.com/snapcore/snapd/pull/10839 also needs +1 from security

Changed in snapd (Ubuntu):
status: New → In Progress
Michael Vogt (mvo)
Changed in snapd (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Michael Vogt (mvo) wrote :

This is now uploaded to impish as snapd_2.53~pre1.git19b68f708_source.changes

John Chittum (jchittum)
tags: added: rls-ii-incoming
John Chittum (jchittum)
Changed in cloud-images:
status: New → Confirmed
Changed in docker.io (Ubuntu):
status: New → Invalid
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package snapd - 2.53~pre1.git19b68f708

---------------
snapd (2.53~pre1.git19b68f708) impish; urgency=medium

  * New git snapshot with fix for snapd.seeded.service in lxd
    (LP: #1944004)
  * New git snapshot that supports cgroups v2 (LP: #1850667)

 -- Michael Vogt <email address hidden> Fri, 24 Sep 2021 16:26:55 +0200

Changed in snapd (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

This is great, thanks for the fix.

@jchittum, do you think you guys at CPC could respin the Ubuntu Impish images with this new snapd? Thanks!

Revision history for this message
John Chittum (jchittum) wrote :

@sergiodj : Build happened early this AM. I see in the image manifest

snapd 2.53~pre1.git19b68f708

I'll do manual verification this morning, and push an image if it passes manual testing.

Revision history for this message
John Chittum (jchittum) wrote :

i've tested manually using the following:

1. created an arm64 server instance on AWS
2. uploaded the most recent rootfs from CPC's internal build system
3. created lxd metadata tool using CPC's metadata script (sorry, this and above aren't public)
4. uploaded lxd metadata tar.xz
5. lxd init on the arm instance
6. lxc image import impish-server-cloudimg-arm64-lxd.tar.xz impish-server-cloudimg-arm64-root.tar.xz --alias impish-20210928
7. lxc launch impish-20210928 test-i-20210928
8. lxc shell test-i-20210928
9. test:

root@test-i-20210928:~# cloud-init status
status: done
root@test-i-20210928:~# snap changes
ID Status Spawn Ready Summary
1 Done today at 06:47 UTC today at 13:35 UTC Initialize system state
2 Done today at 13:35 UTC today at 13:35 UTC Initialize device

Based on the above, the snapd change looks good. I'll coordinate with the rest of the CPC team for next steps on the Impish daily.

Revision history for this message
John Chittum (jchittum) wrote :

after the upload of `snapd (2.53~pre1.git19b68f708)`, I see that snap pre-seed optimizations aren't completing properly. the `seed-restart-system-key` is now a call seen in `snap debug seeding`

'seed-restart-system-key': {'apparmor-features': ['caps', 'dbus', 'domain', 'file', 'mount', 'namespaces', 'network', 'network_v8', 'policy', 'ptrace', 'query', 'rlimit', 'signal'], 'apparmor-parser-features': ['qipcrtr-socket', 'unsafe'], 'apparmor-parser-mtime': 1628490219, 'build-id': '62fa24aa378519ac08950d814427f03cc6ef1ee5', 'cgroup-version': '2', 'nfs-home': False, 'overlay-root': '', 'seccomp-compiler-version': '59f5ac9a4838ef078f5b3f464f7669f76fddeab0 2.5.1 80ce90419d281437e5b8b03baec604016043649f7c00b7a578f5fc8afef1d29c bpf-actlog', 'seccomp-features': ['allow', 'errno', 'kill_process', 'kill_thread', 'log', 'trace', 'trap', 'user_notif'], 'version': 10}

the helpful message in our tests state:

```
snap debug seeding' output (on stdout) contained
'seed-restart-system-key'. This indicates a kernel feature mismatch between
the running kernel and the feature set hard-coded in livecd-rootfs
for this image. Boot will be slowed by ~200ms until this is resolved
in livecd-rootfs.

Check the debug logs for /sys/kernel/apparmor/features/* and
/proc/sys/kernel/seccomp/actions_avail to update livecd-rootfs.

This failure can be ignored when publishing images with critical fixes.
```

I know the upload changed some seccomp profiles, so it may be that something has changed in a way that's incompatible with livecd-rootfs.

Revision history for this message
John Chittum (jchittum) wrote :

Full output from `snap debug seeding`
{'image-preseeding': '7.949s',
 'preseed-system-key': {'apparmor-features': ['caps', 'dbus', 'domain', 'file',
                                               'mount', 'namespaces', 'network',
                                               'network_v8', 'policy', 'ptrace',
                                               'query', 'rlimit', 'signal'],
                         'apparmor-parser-features': ['qipcrtr-socket',
                                                      'unsafe'],
                         'apparmor-parser-mtime': 1628490219,
                         'build-id': '62fa24aa378519ac08950d814427f03cc6ef1ee5',
                         'cgroup-version': '1',
                         'nfs-home': False,
                         'overlay-root': '',
                         'seccomp-compiler-version': '59f5ac9a4838ef078f5b3f464f7669f76fddeab0 '
                                                     '2.5.1 '
                                                   '80ce90419d281437e5b8b03baec604016043649f7c00b7a578f5fc8afef1d29c '
                                                     'bpf-actlog',
                         'seccomp-features': ['allow', 'errno', 'kill_process',
                                              'kill_thread', 'log', 'trace',
                                              'trap', 'user_notif'],
                         'version': 10},
  'preseeded': True,
  'seed-completion': '4.464s',
  'seed-restart-system-key': {'apparmor-features': ['caps', 'dbus', 'domain',
                                                    'file', 'mount',
                                                    'namespaces', 'network',
                                                    'network_v8', 'policy',
                                                    'ptrace', 'query', 'rlimit',
                                                    'signal'],
                              'apparmor-parser-features': ['qipcrtr-socket',
                                                           'unsafe'],
                              'apparmor-parser-mtime': 1628490219,
                              'build-id': '62fa24aa378519ac08950d814427f03cc6ef1ee5',
                              'cgroup-version': '2',
                              'nfs-home': False,
                              'overlay-root': '',
                              'seccomp-compiler-version': '59f5ac9a4838ef078f5b3f464f7669f76fddeab0 '
                                                          '2.5.1 '
                                                          '80ce90419d281437e5b8b03baec604016043649f7c00b7a578f5fc8afef1d29c '
                                                          'bpf-actlog',
                              'seccomp-features': ['allow', 'errno',
                                                   'kill_process', 'kill_thread',
                                                   'log', 'trace', 'trap',
                                                   'user_notif'],
                              'version': 10},
  'seeded': True}

Revision history for this message
John Chittum (jchittum) wrote :

I've added `livecd-rootfs` as affected after discussing the issue with the snapd team. I'll be introducing a change to `livecd-rootfs` to address how snapd expects cgroup2 with a mount at /sys/fs/cgroup

John Chittum (jchittum)
Changed in livecd-rootfs (Ubuntu):
assignee: nobody → John Chittum (jchittum)
Changed in cloud-images:
assignee: nobody → John Chittum (jchittum)
John Chittum (jchittum)
Changed in livecd-rootfs (Ubuntu):
status: New → Fix Released
Revision history for this message
Steve Langasek (vorlon) wrote :

this is not released, the package is waiting in the -unapproved queue and has definitely not reached the release pocket.

Changed in livecd-rootfs (Ubuntu):
status: Fix Released → In Progress
Revision history for this message
Steve Langasek (vorlon) wrote :

This is now released but for some reason the bug task didn't get autoclosed. https://launchpad.net/ubuntu/+source/livecd-rootfs/2.741 is in impish release:

livecd-rootfs (2.741) impish; urgency=medium

  [ John Chittum ]
  * mount cgroup2 type for snapd LP:#1944004

 -- Steve Langasek <email address hidden> Wed, 29 Sep 2021 11:25:24 -0700

Changed in livecd-rootfs (Ubuntu):
status: In Progress → Fix Released
John Chittum (jchittum)
Changed in cloud-images:
status: Confirmed → In Progress
Revision history for this message
John Chittum (jchittum) wrote (last edit ):

latest daily was built yesterday. Testing, however, is inconclusive. Even though I tested livecd-rootfs in the Launchpad environment, it seems like something still isn't jiving, as the system restart key is still present. Test case below:

1. downloaded arm64 daily lxc rootfs from Launchpad
2. downloaded generated arm64 lxc metadata generated by build system
3. started an arm64 Focal server on AWS (c6g.large)
4. rsynced downloaded files
5. ssh'd to instance
6. lxd init (took all defaults)
7. reboot
8. lxc image import METADATA.tar.xz ROOTFS.tar.xz --alias impish-20211001
9. lxc launch impish-20211001 test-snapd
10. lxc shell test-snapd
11. snap changes:

root@test-snapd:~# snap changes
ID Status Spawn Ready Summary
1 Done yesterday at 17:46 UTC today at 17:57 UTC Initialize system state
2 Done today at 17:57 UTC today at 17:57 UTC Initialize device

12: snap debug seeding:

snap debug seeding
seeded: true
preseeded: true
image-preseeding: 7.621s
seed-completion: 2.732s
preseed-system-key: {
  "apparmor-features": [
    "caps",
    "dbus",
    "domain",
    "file",
    "mount",
    "namespaces",
    "network",
    "network_v8",
    "policy",
    "ptrace",
    "query",
    "rlimit",
    "signal"
  ],
  "apparmor-parser-features": [
    "qipcrtr-socket",
    "unsafe"
  ],
  "apparmor-parser-mtime": 1628490219,
  "build-id": "b1511de00d634afe3e6972d273aa6c27f23bdd84",
  "cgroup-version": "2",
  "nfs-home": false,
  "overlay-root": "",
  "seccomp-compiler-version": "eeeb2b6ddab2f13c1e5a54e79439562523c7e583 2.5.1 80ce90419d281437e5b8b03baec604016043649f7c00b7a578f5fc8afef1d29c bpf-actlog",
  "seccomp-features": [
    "allow",
    "errno",
    "kill_process",
    "kill_thread",
    "log",
    "trace",
    "trap",
    "user_notif"
  ],
  "version": 10
}
seed-restart-system-key: {
  "apparmor-features": [
    "caps",
    "dbus",
    "domain",
    "file",
    "mount",
    "namespaces",
    "network",
    "network_v8",
    "policy",
    "ptrace",
    "query",
    "rlimit",
    "signal"
  ],
  "apparmor-parser-features": [
    "qipcrtr-socket",
    "unsafe"
  ],
  "apparmor-parser-mtime": 1628490219,
  "build-id": "b1511de00d634afe3e6972d273aa6c27f23bdd84",
  "cgroup-version": "1",
  "nfs-home": false,
  "overlay-root": "",
  "seccomp-compiler-version": "eeeb2b6ddab2f13c1e5a54e79439562523c7e583 2.5.1 80ce90419d281437e5b8b03baec604016043649f7c00b7a578f5fc8afef1d29c bpf-actlog",
  "seccomp-features": [
    "allow",
    "errno",
    "kill_process",
    "kill_thread",
    "log",
    "trace",
    "trap",
    "user_notif"
  ],
  "version": 10
}

Still have the system restart key. _however_, snapd is done seeding and cloud-init finished. At this point, I think it's safe to do a daily release of the lxc rootfs, to hopefully minimally unblock the docker transition.

however, I don't believe the changes have closed out the bug entirely. Once a new daily gets published, I can get more public debugging help on the system restart key

Revision history for this message
Ian Johnson (anonymouse67) wrote :

John, the difference between those two that I can see is that now the image was creating with cgroupsv2 present when preseeding (which was the expected change), but the runtime image does not use cgroupsv2 and thus things need to be regenerated.

This is sort of expected since the runtime kernel is focal, which is not configured by default to use cgroupsv2.

In general, I'm not sure we can solve this problem if we expect to produce one image that when launched everywhere will always be preseeded, since as shown by this output we would need to somehow simultaneously both the runtime system using cgroupsv1 and the runtime system using cgroupsv2.

The change you landed is correct in that it will ensure a system using cgroupsv2 (i.e. impish server) will get the best speed, but a system using cgroupsv1 (i.e. focal server) will not get the best possible speed. I'm not sure which case CPC wants to optimize for.

Revision history for this message
John Chittum (jchittum) wrote :

TY anonymouse67, that was my assumption as well, and acceptable from my POV for the image. I do think it's worth putting in our release notes, so I shall :)

I'm still working on getting something out. We're getting held up by riscv64 builders just not wanting to play nicely. and those builds take 3+ hours, and like to fail around hour 3.

Revision history for this message
John Chittum (jchittum) wrote :

a new lxd rootfs and lxd metadata tar have been published to cloud-images.ubuntu.com/impish/20211004/ I confirmed with a local check of

lxc launch ubuntu-daily:impish

that lxd is picking up the daily produced on 20211004 from streams. Anyone affected, if you could test the latest impish daily to ensure the following:

1. cloud-init finished
2. snap changes shows Done
3. snap debug seeding outputs

test repro steps (for non-x84/amd64 based arches only)

1. lxc launch ubuntu-daily:impish test
2. lxc shell test
3. cloud-init status
4. snap changes
5. snap debug seeding

I'll have more test results from multiple arches soon.

Revision history for this message
John Chittum (jchittum) wrote :

images from 20211006 have been posted. These can be pulled with

lxc launch ubuntu-daily:impish

The build date can be verified at

/etc/cloud/build.info

On our nightly test of lxd with ubuntu-daily:impish I'm still seeing powerpcel64 and s390x fail. I'll attempt to get VMs up for these for debugging today.

With release one week away, I'm growing concerned.

Revision history for this message
John Chittum (jchittum) wrote :

debugging info from a s390x VM

test:

1. lxd init (all defaults)
2. lxc launch ubuntu-daily:impish test-i
3. lxc shell test-i
4. cat /etc/cloud/build.info
build_name: server
serial: 20211006

confirms latest

5. snap debug changes (See attachment)

I'm seeing a ton of "permission denied" all over for s390x. it looks like the change that worked for arm64 failed for s390x.

working on getting a ppcel64 machine to check as well

Revision history for this message
John Chittum (jchittum) wrote :

lxc container snap changes

root@test-i:~# snap changes
ID Status Spawn Ready Summary
1 Error yesterday at 19:26 UTC today at 13:12 UTC Initialize system state
2 Done today at 13:12 UTC today at 13:12 UTC Initialize device
3 Error today at 13:17 UTC today at 13:17 UTC Initialize system state
4 Error today at 13:22 UTC today at 13:22 UTC Initialize system state
5 Error today at 13:27 UTC today at 13:27 UTC Initialize system state

Attached is host machine journalctl --no-pager

Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

There don't appear to be any AppArmor denials in the logs provided. From snap debug seeding output, looks like the command `udevadm trigger --subsystem-nomatch=input` failed when invoked during installation of the snapd, which was done outside of confinement.

Revision history for this message
John Chittum (jchittum) wrote :
Download full text (3.8 KiB)

ppcle64 outputs:

lxc ubuntu-daily:impish

ls -lah /usr/lib/ld64.so.1
ls: cannot access '/usr/lib/ld64.so.1': No such file or directory

ldd /usr/lib/snapd/snap-confine
    linux-vdso64.so.1 (0x00007fd860050000)
    libudev.so.1 => /lib/powerpc64le-linux-gnu/libudev.so.1 (0x00007fd85ff70000)
    libc.so.6 => /lib/powerpc64le-linux-gnu/libc.so.6 (0x00007fd85fcf0000)
    /lib64/ld64.so.2 (0x00007fd860060000)

snap debug seeding
seeded: false
seed-error: |
  cannot perform the following tasks:
  - Run install hook of "lxd" snap if present (run hook "install": signal: segmentation fault)
preseeded: true
image-preseeding: 4.51s
seed-completion: –
preseed-system-key: {
  "apparmor-features": [
    "caps",
    "dbus",
    "domain",
    "file",
    "mount",
    "namespaces",
    "network",
    "network_v8",
    "policy",
    "ptrace",
    "query",
    "rlimit",
    "signal"
  ],
  "apparmor-parser-features": [
    "qipcrtr-socket",
    "unsafe"
  ],
  "apparmor-parser-mtime": 1628490219,
  "build-id": "e09cf7be4df8b799f8e788659c66fe1bb79404cc",
  "cgroup-version": "2",
  "nfs-home": false,
  "overlay-root": "",
  "seccomp-compiler-version": "bb391d0fd29f0bc97f0a62a4dcdd58abcdd57620 2.5.1 80ce90419d281437e5b8b03baec604016043649f7c00b7a578f5fc8afef1d29c bpf-actlog",
  "seccomp-features": [
    "allow",
    "errno",
    "kill_process",
    "kill_thread",
    "log",
    "trace",
    "trap",
    "user_notif"
  ],
  "version": 10
}
seed-restart-system-key: {
  "apparmor-features": [
    "caps",
    "dbus",
    "domain",
    "file",
    "mount",
    "namespaces",
    "network",
    "network_v8",
    "policy",
    "ptrace",
    "query",
    "rlimit",
    "signal"
  ],
  "apparmor-parser-features": [
    "qipcrtr-socket",
    "unsafe"
  ],
  "apparmor-parser-mtime": 1628490219,
  "build-id": "e09cf7be4df8b799f8e788659c66fe1bb79404cc",
  "cgroup-version": "1",
  "nfs-home": false,
  "overlay-root": "",
  "seccomp-compiler-version": "bb391d0fd29f0bc97f0a62a4dcdd58abcdd57620 2.5.1 80ce90419d281437e5b8b03baec604016043649f7c00b7a578f5fc8afef1d29c bpf-actlog",
  "seccomp-features": [
    "allow",
    "errno",
    "kill_process",
    "kill_thread",
    "log",
    "trace",
    "trap",
    "user_notif"
  ],
  "version": 10
}

snap changes
ID Status Spawn Ready Summary
1 Error yesterday at 18:32 UTC today at 15:16 UTC Initialize system state
2 Done today at 15:16 UTC today at 15:16 UTC Initialize device
3 Error today at 15:21 UTC today at 15:21 UTC Initialize system state
4 Error today at 15:26 UTC today at 15:27 UTC Initialize system state
5 Error today at 15:32 UTC today at 15:32 UTC Initialize system state
6 Error today at 15:37 UTC today at 15:37 UTC Initialize system state
7 Error today at 15:42 UTC today at 15:43 UTC Initialize system state
8 Error today at 15:47 UTC today at 15:48 UTC Initialize system state
9 Error today at 15:53 UTC today at 15:53 UTC Initialize system state

On Hirsute:

ls -lah /usr/lib/ld64.so.1
ls: cannot access '/usr/lib/ld64.so.1': No such file or director

ldd /usr/lib/snapd/snap-confine
 linux-v...

Read more...

Revision history for this message
John Chittum (jchittum) wrote :

ppcle64 impish lxd journalctl

Revision history for this message
John Chittum (jchittum) wrote :

ppcle64 hirsute lxd journalctl

Revision history for this message
John Chittum (jchittum) wrote :

ls -lah /lib64/ld64.so.2 on ppcle64 impish, hirsute

On Impish
ls -lah /lib64/ld64.so.2
lrwxrwxrwx 1 root root 36 Sep 2 21:26 /lib64/ld64.so.2 -> /lib/powerpc64le-linux-gnu/ld64.so.2

Hirsute:
ls -lah /lib64/ld64.so.2
lrwxrwxrwx 1 root root 37 Mar 31 2021 /lib64/ld64.so.2 -> /lib/powerpc64le-linux-gnu/ld-2.33.so

Revision history for this message
Ian Johnson (anonymouse67) wrote :

Okay, so after too much time trying to test this, I confirmed that this change will fix it for both powerpc and for s390x: https://github.com/snapcore/snapd/pull/10898

That PR should be merged to snapd master, then cherry-picked to release/2.53 branch, then we can upload a 2.53.1 deb to impish-proposed and try to get that released to impish proper which will fix the issue. If we can do it in time for the release even better.

Changed in snapd (Ubuntu):
status: Fix Released → In Progress
assignee: nobody → Ian Johnson (anonymouse67)
Revision history for this message
Michael Vogt (mvo) wrote :

I distro patched the impish version of snapd with the fix from Ian and uploaded a new version:
https://launchpad.net/ubuntu/impish/+queue?queue_state=1&queue_text=snapd

with the fix as 2.53+21.10ubuntu1.

tags: added: openstack-ibm s390x uosci
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → In Progress
Changed in snapd (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
John Chittum (jchittum) wrote :

I've tested the proposed upload on a pple64 and s390x systems. I used the following procedure

1. lxc launch ubuntu-daily:impish test-i-update
2. lxc shell test-i-update
3. enabled proposed in /etc/apt/sources.list
4. apt update
5. apt install snapd

Get:1 http://ports.ubuntu.com/ubuntu-ports impish-proposed/main s390x snapd s390x 2.53+21.10ubuntu1

Get:1 http://ports.ubuntu.com/ubuntu-ports impish-proposed/main ppc64el snapd ppc64el 2.53+21.10ubuntu1

6. exit && lxc restart test-i-update
7. lxc shell test-i-update
8. snap changes
observed Done state for Initialize system state instead of Error
9. snap debug seeding
observed seeding completed

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package snapd - 2.53+21.10ubuntu1

---------------
snapd (2.53+21.10ubuntu1) impish; urgency=medium

  * Cherry pick https://github.com/snapcore/snapd/pull/10898 to
    fix apparmor profile for snap-confine on s390x,ppc64el,risvc64
    This fixes the snapd.seeded.service failures reported in
    LP: #1944004

 -- Michael Vogt <email address hidden> Fri, 08 Oct 2021 09:29:37 +0200

Changed in snapd (Ubuntu):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Released
Revision history for this message
John Chittum (jchittum) wrote :

The latest lxd rootfs have been tested and daily images have been released.

Changed in cloud-images:
status: In Progress → Fix Released
Changed in snapd:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.