snap confine fails on top of overlayroot

Bug #1797218 reported by Scott Moser
This bug affects 1 person
Affects: snapd (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Zygmunt Krynicki

Bug Description

# Summary

snap confinement fails on top of overlayroot

# Steps to reproduce:

These steps are from comment #11. Log in as root with the password passw0rd:

$ url=http://cloud-images.ubuntu.com/cosmic/current/cosmic-server-cloudimg-amd64.img
$ img=${url##*/}
$ wget "${url}" -O "$img"
$ sudo mount-image-callback "$img" -- \
   mchroot sh -xc 'echo overlayroot=tmpfs > $1 && echo root:$2 | chpasswd' \
   setup-image /etc/overlayroot.local.conf passw0rd
$ qemu-system-x86_64 -enable-kvm -m 2048 -curses -hda cosmic-server-cloudimg-amd64.img

# See previous bugs

https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1729867
https://bugs.launchpad.net/apparmor/+bug/1703674
https://forum.snapcraft.io/t/confined-snaps-dont-work-on-live-images-due-to-apparmor-path-mapping/3767/5

# Additional logs and information

$ cat /etc/cloud/build.info
build_name: server
serial: 20181010

$ dpkg-query --show | pastebinit
http://paste.ubuntu.com/p/KpwSzBdZkw/

The above system hangs on boot. This is seen in curtin's vmtest.
The last success was with image 20180926.

The console log ends up repeating messages like:
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for core, revision 5548...
[ OK ] Mounted Mount unit for core, revision 5548.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for lxd, revision 9010...
[ OK ] Mounted Mount unit for lxd, revision 9010.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for core, revision 5548...
[ OK ] Mounted Mount unit for core, revision 5548.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for lxd, revision 9010...
[ OK ] Mounted Mount unit for lxd, revision 9010.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for core, revision 5548...
[ OK ] Mounted Mount unit for core, revision 5548.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for lxd, revision 9010...
[ OK ] Mounted Mount unit for lxd, revision 9010.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for core, revision 5548...
[ OK ] Mounted Mount unit for core, revision 5548.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for lxd, revision 9010...
[ OK ] Mounted Mount unit for lxd, revision 9010.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for core, revision 5548...
[ OK ] Mounted Mount unit for core, revision 5548.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
         Mounting Mount unit for lxd, revision 9010...
[ OK ] Mounted Mount unit for lxd, revision 9010.
[ OK ] Stopped Snappy daemon.
         Starting Snappy daemon...
[ OK ] Started Snappy daemon.
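
To put a number on the restart loop, one option is to count the "Stopped Snappy daemon" lines in a saved copy of the console log. This is only a sketch: the function name and the log path are placeholders, not part of this report.

```shell
# Count how many times systemd stopped snapd in a saved console log.
# The log file path passed as $1 is a placeholder (assumption).
count_snapd_stops() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'Stopped Snappy daemon' "$1"
}
```

For example, `count_snapd_stops console.log` on a capture of the output above would report one count per restart cycle.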

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: snapd 2.35.4+18.10
ProcVersionSignature: Ubuntu 4.18.0-8.9-generic 4.18.7
Uname: Linux 4.18.0-8-generic x86_64
ApportVersion: 2.20.10-0ubuntu11
Architecture: amd64
Date: Wed Oct 10 19:29:43 2018
ProcEnviron:
 TERM=screen-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: snapd
UpgradeStatus: No upgrade log present (probably fresh install)

Related bugs:
 bug 1786438: [SRU] 2.35

Scott Moser (smoser) wrote :

$ systemctl status snapd --no-pager --full
● snapd.service - Snappy daemon
   Loaded: loaded (/lib/systemd/system/snapd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-10-10 19:47:19 UTC; 44s ago
 Main PID: 1960 (snapd)
    Tasks: 11 (limit: 1105)
   Memory: 27.4M
   CGroup: /system.slice/snapd.service
           └─1960 /usr/lib/snapd/snapd

Oct 10 19:47:18 ubuntu systemd[1]: snapd.service: Scheduled restart job, restart counter is at 2.
Oct 10 19:47:18 ubuntu systemd[1]: Stopped Snappy daemon.
Oct 10 19:47:18 ubuntu systemd[1]: Starting Snappy daemon...
Oct 10 19:47:19 ubuntu snapd[1960]: AppArmor status: apparmor is enabled and all features are available
Oct 10 19:47:19 ubuntu snapd[1960]: backend.go:125: snapd enabled root filesystem on overlay support, additional upperdir permissions granted
Oct 10 19:47:19 ubuntu snapd[1960]: helpers.go:249: removed stale connections: lxd:lxd-support core:lxd-support, lxd:network core:network, lxd:network-bind core:network-bind, lxd:system-observe core:system-observe
Oct 10 19:47:19 ubuntu snapd[1960]: daemon.go:344: started snapd/2.35.4+18.10 (series 16; classic) ubuntu/18.10 (amd64) linux/4.18.0-8-generic.
Oct 10 19:47:19 ubuntu systemd[1]: Started Snappy daemon.
Oct 10 19:47:21 ubuntu snapd[1960]: handlers.go:389: Reported install problem for "core" as 4d395dae-ccc5-11e8-90ca-fa163e102db1 OOPSID

Scott Moser (smoser) wrote :

$ for service in $(cd /lib/systemd/system && ls snap*); do echo ==== $service =====; systemctl status --no-pager --full $service; done 2>&1 | pastebinit
http://paste.ubuntu.com/p/57W9QnXwfQ/

Changed in snapd (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Scott Moser (smoser) wrote :

Jenkins jobs that run this are archived at
 https://jenkins.ubuntu.com/server/job/curtin-vmtest-devel-amd64/

Scott Moser (smoser) wrote :

$ snap changes
ID Status Spawn Ready Summary
1 Error today at 19:46 UTC today at 19:47 UTC Initialize system state
2 Done today at 19:46 UTC today at 19:46 UTC Initialize device
3 Error today at 19:52 UTC today at 19:52 UTC Initialize system state

# for i in 1 2 3; do echo ==== $i ====; snap change $i; done 2>&1 | pastebinit
http://paste.ubuntu.com/p/FrFCyT3GH9/
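
Only changes 1 and 3 above are in the Error state, so a variant of that loop could skip the ones that completed. A rough sketch, assuming the `snap changes` column layout shown above (ID first, Status second); the function names are made up for illustration.

```shell
# Print the IDs of changes whose Status column is "Error".
# Reads `snap changes` output on stdin; assumes ID is column 1 and
# Status is column 2, as in the listing above.
failed_change_ids() {
    awk 'NR > 1 && $2 == "Error" { print $1 }'
}

# Show the details of each failed change. Nothing runs until this is called.
show_failed_changes() {
    for id in $(snap changes | failed_change_ids); do
        echo "==== $id ===="
        snap change "$id"
    done
}
```

Running `show_failed_changes 2>&1 | pastebinit` would then collect only the failing changes.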

Ryan Harper (raharper) wrote :

The error I see is:

Run install hook of "lxd" snap if present

2018-10-10T20:06:30Z ERROR run hook "install": cannot create lock directory /run/snapd/lock: Permission denied

$ ls -al /run | grep snapd
srw-rw-rw- 1 root root 0 Oct 10 20:06 snapd-snap.socket
srw-rw-rw- 1 root root 0 Oct 10 20:06 snapd.socket

Ryan Harper (raharper) wrote :

Looking at this bug:

https://bugs.launchpad.net/snappy/+bug/1665808

We're in the same MAAS ephemeral environment, though we get a slightly different error. However, the workaround included there:

sudo apt-get install -yu apparmor-utils
sudo aa-complain /usr/lib/snapd/snap-confine

does allow the lxd snap to install.

However, once that completes, the system quickly goes dead: 100% CPU usage on the guest vCPUs prevents any input (ssh or serial console).
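
Since putting snap-confine into complain mode lets the install proceed, the denial presumably comes from its AppArmor profile. A small sketch for pulling matching denial records out of the kernel log follows; the function name is made up, and it assumes denials are logged with the usual apparmor="DENIED" marker.

```shell
# Keep only AppArmor denial records that mention snap-confine.
# Reads kernel or audit log text on stdin.
snap_confine_denials() {
    grep 'apparmor="DENIED"' | grep 'snap-confine'
}
```

Typical use would be `sudo dmesg | snap_confine_denials`. After testing, `sudo aa-enforce /usr/lib/snapd/snap-confine` (also from apparmor-utils) puts the profile back into enforce mode.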

Scott Moser (smoser) wrote :

I noticed that the seeding of snaps in the cloud image also caused a regression
in the open-iscsi test [1], as seen in [2].

If anyone wants to debug this, the open-iscsi test case documents
how to run it at [3], and I have a gist on it at [4].

Alternatively you can use uvt-kvm or multipass or any openstack or cloud
instance to reproduce fairly easily.
Here's how:

## Start a multipass guest.

$ name=cosmic1
$ multipass launch daily:cosmic "--name=$name"

## write overlayroot.local.conf to enable overlayroot=tmpfs
$ multipass exec "$name" -- sudo mv /etc/overlayroot.local.conf /etc/overlayroot.local.conf.dist
$ multipass exec "$name" -- sudo sh -c 'echo overlayroot=tmpfs > /etc/overlayroot.local.conf'

## reboot
$ multipass exec $name sudo reboot
..
$ multipass shell $name

## To switch back to non-overlayroot, do
$ multipass exec $name -- sudo overlayroot-chroot rm -f /etc/overlayroot.local.conf
$ multipass exec $name sudo reboot
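
To confirm from inside the guest that the reboot really came up on overlayroot, one can look at the filesystem type mounted on "/". This is a sketch with a made-up function name; it takes the last "/" entry in the mounts table because later mounts shadow earlier ones.

```shell
# Print the filesystem type of the last mount on "/" in a mounts table.
# Defaults to /proc/self/mounts; another file can be passed for testing.
root_fstype() {
    awk '$2 == "/" { fstype = $3 } END { if (fstype != "") print fstype }' \
        "${1:-/proc/self/mounts}"
}
```

With overlayroot=tmpfs active, `root_fstype` is expected to print an overlay type rather than the image's normal root filesystem type.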

--
[1] https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-cosmic/cosmic/amd64/o/open-iscsi/20181010_213729_f6165@/log.gz
[2] http://autopkgtest.ubuntu.com/packages/o/open-iscsi/cosmic/amd64
[3] https://git.launchpad.net/~usd-import-team/ubuntu/+source/open-iscsi/tree/debian/tests/README-boot-test.md
[4] https://gist.github.com/smoser/ffb519d00bafd2105abbc180c3410e76

Scott Moser (smoser) wrote :

I am attempting to verify if installation of 18.10 is similarly broken.

Scott Moser (smoser) wrote :

(installation via maas)

Scott Moser (smoser) wrote :

Here's an even easier way to reproduce it:

$ url=http://cloud-images.ubuntu.com/cosmic/current/cosmic-server-cloudimg-amd64.img
$ img=${url##*/}
$ wget "${url}" -O "$img"

$ sudo mount-image-callback "$img" -- \
   mchroot sh -xc 'echo overlayroot=tmpfs > $1 && echo root:$2 | chpasswd' \
   setup-image /etc/overlayroot.local.conf passw0rd

## Launch qemu with whatever parameters you want.
## I suggest using (or learning) '-nographic' or '-snapshot'.
## The simplest thing is:
$ qemu-system-x86_64 -enable-kvm -hda cosmic-server-cloudimg-amd64.img -m 2048
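
A variant that follows the '-snapshot' and '-nographic' suggestion is sketched below. The function name is made up; the URL, memory size, and image name are the ones used above. It assumes the image's console is available on the serial port.

```shell
url=http://cloud-images.ubuntu.com/cosmic/current/cosmic-server-cloudimg-amd64.img
# Strip everything up to the last "/" to get the local file name.
img=${url##*/}

# Boot the prepared image without writing guest changes back to it.
# -snapshot:  guest writes go to a temporary file, not to the image.
# -nographic: no graphical window; the serial console is attached
#             to the current terminal.
boot_image() {
    qemu-system-x86_64 -enable-kvm -m 2048 -snapshot -nographic -hda "$img"
}
```

With '-snapshot' the same prepared image can be booted repeatedly, since nothing the guest writes is kept.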

Scott Moser (smoser) wrote :

After reproducing as above, you can log in on the console as 'root' with 'passw0rd'.

You'll then see that the system is still 'starting' (per systemctl status).

Scott Moser (smoser) wrote :

In another reproduction scenario, you can see this fail on bionic:
a.) launch a bionic image somewhere
b.) sudo sh -xc 'echo overlayroot=tmpfs > /etc/overlayroot.local.conf && reboot'
c.) $ sudo snap install lxd
2018-10-11T17:44:22Z INFO Waiting for restart...
error: cannot perform the following tasks:
- Run install hook of "lxd" snap if present (run hook "install": cannot create lock directory /run/snapd/lock: Permission denied)

Scott Moser (smoser) wrote :

'c' above can also be: 'sudo snap install hello'
There, the install won't complain or fail, but running 'hello' will.

$ sudo snap install hello
2018-10-11T17:54:21Z INFO Waiting for restart...
hello 2.10 from 'canonical' installed
$ echo $?
0

$ hello
cannot create lock directory /run/snapd/lock: Permission denied

Scott Moser (smoser)
summary: - boot hangs in curtin vmtest
+ snap confine fails on top of overlayroot
Ryan Harper (raharper) wrote :

I suspect this reloading of profiles from the core snap has something to do with the failure.

Oct 11 21:46:42 ubuntu systemd[1]: Mounting Mount unit for core, revision 5548...
Oct 11 21:46:42 ubuntu systemd[1]: Mounted Mount unit for core, revision 5548.
Oct 11 21:46:43 ubuntu snapd[745]: backend.go:303: cannot create host snap-confine apparmor configuration: cannot synchronize snap-confine apparmor profile: open /var/lib/snapd/apparmor/profiles/snap-confine.core.5548.7GMky4P363H7~: no such file or directory
Oct 11 21:46:43 ubuntu audit[1095]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap-update-ns.core" pid=1095 comm="apparmor_parser"
Oct 11 21:46:43 ubuntu kernel: audit: type=1400 audit(1539294403.064:14): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap-update-ns.core" pid=1095 comm="apparmor_parser"
Oct 11 21:46:43 ubuntu audit[1097]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.core.hook.configure" pid=1097 comm="apparmor_parser"
Oct 11 21:46:43 ubuntu kernel: audit: type=1400 audit(1539294403.148:15): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.core.hook.configure" pid=1097 comm="apparmor_parser"
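
The "no such file or directory" error on a temporary file makes a missing parent directory one possibility worth ruling out. A quick check along those lines follows; the function name is made up, and the paths in the usage note are the ones from the log messages in this report.

```shell
# Report whether each given path exists as a directory.
report_dirs() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            echo "ok      $d"
        else
            echo "missing $d"
        fi
    done
}
```

For instance: `report_dirs /var/lib/snapd/apparmor/profiles /run/snapd/lock`.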

Joshua Powers (powersj)
description: updated
Zygmunt Krynicki (zyga)
Changed in snapd (Ubuntu):
assignee: nobody → Zygmunt Krynicki (zyga)
Zygmunt Krynicki (zyga) wrote :

I reproduced the issue and came up with a workaround:

https://github.com/snapcore/snapd/pull/5974

The workaround may need an additional fix for an unexpected situation (where we need to reload snap-confine's apparmor profile after generating the overlayfs snippet). I'm investigating that now.

Zygmunt Krynicki (zyga) wrote :

I believe that the pull request referenced above is sufficient to fix this issue. Nothing else has to change.

Scott Moser (smoser) wrote :

For the record: a change to overlayroot like the one below brings the /proc/<pid>/mountinfo table in line with what snapd expects. I've verified that snaps appear to work with such an initramfs.

--- /usr/share/initramfs-tools/scripts/init-bottom/overlayroot.dist 2018-10-12 15:40:11.248597288 +0000
+++ /usr/share/initramfs-tools/scripts/init-bottom/overlayroot 2018-10-12 15:43:42.593948397 +0000
@@ -686,7 +686,9 @@
    workdir="$_RET"
    mount_opts="${mount_opts},workdir=$workdir"
   fi
- clean_path "${mount_opts} overlayroot ${ROOTMNT}"
+ # root_rw is /media/root-rw, and its value here is passed as
+ # the source (fs_spec) to 'mount'.
+ clean_path "${mount_opts} $root_rw ${ROOTMNT}"
   mount_opts="$_RET"
   ;;
  aufs)
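
To see what the patch changes from the running system's point of view, the filesystem type and mount source for "/" can be read from mountinfo. The sketch below uses a made-up function name and relies on the documented mountinfo layout, where the fields after the "-" separator are the filesystem type followed by the mount source.

```shell
# Print "<fstype> <source>" for the last mount on "/" in a mountinfo table.
# Defaults to /proc/self/mountinfo; another file can be passed for testing.
root_mount_source() {
    awk '$5 == "/" {
        # Optional fields start at column 7 and end at a lone "-".
        for (i = 7; i <= NF; i++) {
            if ($i == "-") { fstype = $(i + 1); source = $(i + 2); break }
        }
    }
    END { if (fstype != "") print fstype, source }' \
        "${1:-/proc/self/mountinfo}"
}
```

Going by the diff, the source would read "overlayroot" with the stock script and "/media/root-rw" with the patched one.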

Scott Moser (smoser) wrote :

This is reported fixed in snapd 2.35.5+18.10 which was uploaded under SRU bug 1786438.

Changed in snapd (Ubuntu):
status: Confirmed → Fix Committed
description: updated
Zygmunt Krynicki (zyga)
Changed in snapd (Ubuntu):
status: Fix Committed → Fix Released