Running snaps inside a Focal LXD container on an impish host fails

Bug #1953563 reported by Wouter van Bommel
Affects  Status        Importance  Assigned to        Milestone
Linux    New           Undecided   Christian Brauner  -
lxd      Fix Released  Unknown     -                  -

Bug Description

For reference, see what has been discussed with LXD so far: https://github.com/lxc/lxd/issues/9642

What it comes down to is that when a snap is installed inside a Focal LXD container running on an impish host, the snap does not work.
When running the same snap inside a Focal LXD container on a Focal host, it does work.

From what I got from the LXD case, this has to do with cgroups v1 vs v2 and snapd's support for them.
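
(For reference, a quick way to check which cgroup hierarchy a host runs, since impish defaults to the unified v2 hierarchy while Focal defaults to the v1/hybrid layout:)

$ stat -fc %T /sys/fs/cgroup/
cgroup2fs    # a cgroup v2 host; a v1/hybrid host prints tmpfs instead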

Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

I see no evidence of this being caused by cgroups v2. In fact all services are up.

Nested Ubuntu 20.04 on a 21.10 host:

root@my-ubuntu-confined:~# snap-store-proxy status
Store ID: not registered
Internal Service Status:
  memcached: running
  nginx: running
  snapauth: not running: 500 Server Error: INTERNAL SERVER ERROR for url: http://127.0.0.1:8005/_status/check
  snapdevicegw: not running: getresponse() got an unexpected keyword argument 'buffering'
  snapdevicegw-local: not running: [Errno 111] Connection refused
  snapproxy: not running: [Errno 111] Connection refused
  snaprevs: not running: 500 Server Error: INTERNAL SERVER ERROR for url: http://127.0.0.1:8002/_status/check
root@my-ubuntu-confined:~# snap services snap-store-proxy
Service Startup Current Notes
snap-store-proxy.memcached disabled active -
snap-store-proxy.nginx disabled active -
snap-store-proxy.snapassert disabled inactive -
snap-store-proxy.snapauth disabled active -
snap-store-proxy.snapdevicegw disabled active -
snap-store-proxy.snapident disabled inactive -
snap-store-proxy.snapproxy disabled active -
snap-store-proxy.snaprevs disabled active -

The services are up; some of them appear to be retrying operations and logging that. At the same time I observe denials on the host:

Dec 08 08:25:58 dec080806-781058 kernel: audit: type=1400 audit(1638951958.783:675): apparmor="DENIED" operation="capable" namespace="root//lxd-my-ubuntu-confined_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=10881 comm="python3" capability=0 capname="chown"

Dec 08 08:25:59 dec080806-781058 audit[10856]: AVC apparmor="DENIED" operation="capable" namespace="root//lxd-my-ubuntu-confined_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=10856 comm="python3" capability=0 capname="chown"
Dec 08 08:25:59 dec080806-781058 kernel: audit: type=1400 audit(1638951959.639:676): apparmor="DENIED" operation="capable" namespace="root//lxd-my-ubuntu-confined_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=10856 comm="python3" capability=0 capname="chown"
Dec 08 08:25:59 dec080806-781058 audit[10881]: AVC apparmor="DENIED" operation="capable" namespace="root//lxd-my-ubuntu-confined_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=10881 comm="python3" capability=0 capname="chown"
Dec 08 08:25:59 dec080806-781058 kernel: audit: type=1400 audit(1638951959.787:677): apparmor="DENIED" operation="capable" namespace="root//lxd-my-ubuntu-confined_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapdevicegw" pid=10881 comm="python3" capability=0 capname="chown"
Dec 08 08:26:00 dec080806-781058 audit[10856]: AVC apparmor="DENIED" operation="capable" namespace="root//lxd-my-ubuntu-confined_<var-snap-lxd-common-lxd>" profile="snap.snap-store-proxy.snapproxy" pid=10856 comm="python3" capability=0 capname="chown"
Dec 08 08:26:00 dec080806-781058 kernel: audit: type=1400 audit(1638951960.643:678): apparmor="DENIED" operation="capable" namespace="root//lxd-my-ubuntu-confined_<var-snap-lxd-common-lxd>" profile="snap...


Changed in snapd:
status: New → Incomplete
Changed in lxd:
status: Unknown → Fix Released
Revision history for this message
Wouter van Bommel (woutervb) wrote :

Running the snap-store-proxy status command should show snapdevicegw, snapdevicegw-local and snapproxy as running.
They are all connected to gunicorn via unix file sockets, but accessing those sockets fails with a permission denied error.

This can be seen in the logs (i.e. snap logs snap-store-proxy -n10000).
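
(For instance, a hedged filter over those logs; the exact denial wording is an assumption:)

# snap logs snap-store-proxy -n10000 | grep -i 'permission denied'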

Changed in snapd:
status: Incomplete → New
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote (last edit):

What I meant in https://bugs.launchpad.net/snapd/+bug/1953563/comments/1 is that this does not appear to be a snapd problem, but rather something LXD-related: dropping the stacked apparmor confinement from LXD, so that the app is confined only by the profile set up by snapd, makes the problem go away.
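
(For illustration, one way such stacked confinement is commonly dropped for a test container; the container name u1 is taken from the later comments:)

lxc config set u1 raw.lxc "lxc.apparmor.profile=unconfined"
lxc restart u1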

Please ask in the LXD bug tracker to reopen the bug report and investigate.

Revision history for this message
Wouter van Bommel (woutervb) wrote :

Not sure I can do anything with this last comment (https://bugs.launchpad.net/snapd/+bug/1953563/comments/3), as per https://github.com/lxc/lxd/issues/9642 I was told that it is a snapd issue.

Can we at least agree that there is a problem, and that it has to do with the combination of snapd and LXD on impish, and possibly other releases?

Revision history for this message
Stéphane Graber (stgraber) wrote :

Created two VMs, one using impish, one using focal. Installed the same LXD in both of them, with the same LXD config and an identical Ubuntu 20.04 container.

Then confirmed that LXD is configuring AppArmor in an identical way:

root@impish:~# sha256sum /var/snap/lxd/common/lxd/security/apparmor/profiles/lxd-u1
348610816fa52930d1071b0dc36dcbd5e4be989fe3a7714dea30b7d2f155c296 /var/snap/lxd/common/lxd/security/apparmor/profiles/lxd-u1

root@focal:~# sha256sum /var/snap/lxd/common/lxd/security/apparmor/profiles/lxd-u1
348610816fa52930d1071b0dc36dcbd5e4be989fe3a7714dea30b7d2f155c296 /var/snap/lxd/common/lxd/security/apparmor/profiles/lxd-u1

So the AppArmor setup from LXD is bit-for-bit identical. Now checking that AppArmor namespacing and stacking are set up the same way:

root@impish:~# lxc info u1 | grep PID
PID: 2852
root@impish:~# cat /proc/2852/attr/current
lxd-u1_</var/snap/lxd/common/lxd>//&:lxd-u1_<var-snap-lxd-common-lxd>:unconfined (enforce)

root@focal:~# lxc info u1 | grep PID
PID: 14047
root@focal:~# cat /proc/14047/attr/current
lxd-u1_</var/snap/lxd/common/lxd>//&:lxd-u1_<var-snap-lxd-common-lxd>:unconfined (enforce)

So hopefully, that helps prove that LXD is behaving identically on both systems...

I've done the same checks of the AppArmor profile hashes for the snap-store-proxy snap in both environments and they're similarly identical, though snap-store-proxy is obviously quite happy in the 20.04-on-20.04 environment.

I then moved the 20.04 system to the same kernel as the 21.10 one (using linux-generic-hwe-20.04-edge) and I'm getting a similarly broken behavior for that snap.

LXD's own profile doesn't restrict the capabilities that containers can use, especially not "chown". So to me this suggests an apparmor kernel issue.

My suspicion is that it's either an apparmor change which somehow causes stricter confinement of profiles loaded in containers than of those loaded on the host (we've seen that in the past, with apparmor in a container needing a rule for the binary being confined whereas it's unneeded on the host), or a kernel change (possibly the idmapped mounts feature?) which is confusing apparmor somehow.

I'm thinking of that last one because "chown" shows up repeatedly as blocked by apparmor, and given that we now have a VFS layer that remaps uid/gid on the fly, maybe that's causing some confusion?

In any case, adding a task for the kernel, as this will trickle down to affect all LTS users too once the kernel is promoted from -edge to the normal HWE.

Revision history for this message
Stéphane Graber (stgraber) wrote :

So I've confirmed that it's VFS idmapped-mount related.
Testing with two containers on the same system, one on a dir storage pool (ext4, which uses idmapped mounts) and one on a zfs storage pool (where idmapped mounts are unsupported): the former has the issue whereas the latter works just fine.

So it looks like something with VFS idmapped mounts is tickling apparmor the wrong way.

Adding a kernel task and pinging brauner to give his thoughts on what may be the issue here.
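
(A minimal reproduction sketch along those lines; pool and container names are illustrative:)

lxc storage create pool-dir dir              # ext4-backed, gets idmapped mounts
lxc storage create pool-zfs zfs              # zfs, no idmapped mount support
lxc launch ubuntu:20.04 u-dir -s pool-dir    # exhibits the issue
lxc launch ubuntu:20.04 u-zfs -s pool-zfs    # works fine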

affects: snapd → linux
Revision history for this message
Stéphane Graber (stgraber) wrote :

Worth noting that the process does call chown, based on the strace output:

bind(7, {sa_family=AF_UNIX, sun_path="/var/snap/snap-store-proxy/78/snapproxy/snapproxy.sock"}, 56) = 0
chown("/var/snap/snap-store-proxy/78/snapproxy/snapproxy.sock", 0, 0) = -1 EPERM (Operation not permitted)

I can't easily tell whether, in the working case, apparmor just allows that call or the process somehow never issues the chown, but it's certainly a bit odd.
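
(A hedged way to mirror that strace'd sequence under the same profile; the socket path is illustrative and assumes the profile permits creating it:)

aa-exec -p snap.snap-store-proxy.snapproxy python3 -c "
import os, socket
s = socket.socket(socket.AF_UNIX)
s.bind('/var/snap/snap-store-proxy/common/test.sock')          # bind() succeeds, as in the strace
os.chown('/var/snap/snap-store-proxy/common/test.sock', 0, 0)  # expected: EPERM on the affected kernel
"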

Revision history for this message
Christian Brauner (cbrauner) wrote :

I'll take a look into this.

Changed in linux:
assignee: nobody → Christian Brauner (cbrauner)
Revision history for this message
Stéphane Graber (stgraber) wrote :

root@dir:~# aa-exec -p snap.snap-store-proxy.snapproxy chown 0:0 /var/snap/snap-store-proxy/common/nginx/
chown: changing ownership of '/var/snap/snap-store-proxy/common/nginx/': Operation not permitted

root@zfs:~# aa-exec -p snap.snap-store-proxy.snapproxy chown 0:0 /var/snap/snap-store-proxy/common/nginx/
root@zfs:~# aa-exec -p snap.snap-store-proxy.snapproxy chown 1:1 /var/snap/snap-store-proxy/common/nginx/
chown: changing ownership of '/var/snap/snap-store-proxy/common/nginx/': Operation not permitted

That's despite the profiles not having "capability chown," inside them.
This suggests that apparmor will normally silently allow you to do a pointless chown (requested uid/gid matches existing uid/gid) but that logic isn't working properly with idmapped mounts.
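
(The underlying kernel semantics can be sketched independently of apparmor with capsh from libcap, dropping CAP_CHOWN from the bounding set; the file path is illustrative:)

touch /root/f                                      # owned by root:root
capsh --drop=cap_chown -- -c 'chown 0:0 /root/f'   # no-op chown: allowed without CAP_CHOWN
capsh --drop=cap_chown -- -c 'chown 1:1 /root/f'   # real ownership change: EPERM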

Revision history for this message
Stéphane Graber (stgraber) wrote :

Have tested 5.16-rc1, problem still occurs there.
Have tested 5.16-rc2, problem got resolved.

The likely fix is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=968219708108440b23bc292e0486e3cc1d9a1bed
It was sent to the stable kernel mailing list and needs to be applied to the Ubuntu kernels.

Revision history for this message
Wouter van Bommel (woutervb) wrote :

Yesterday I installed the HWE kernel on Focal, and the same problem occurs there. Any chance it gets fixed?

Also, if there is a PPA or the like for me to test a newer kernel (on Focal), let me know and I can share the results.
