udev interface fails in privileged containers

Bug #1712808 reported by Colin Watson
This bug affects 9 people
Affects        Status     Importance  Assigned to
snapd          Confirmed  Medium      Unassigned
lxd (Ubuntu)   Invalid    Undecided   Unassigned

Bug Description

This may be a known issue, since there's evidence of a workaround in e.g. https://stgraber.org/2017/01/13/kubernetes-inside-lxd/, but I couldn't find any proper discussion of it.

Installing snaps in a privileged LXD container fails. Here's a test script:

  $ lxc launch -c security.privileged=true ubuntu:16.04 snap-test
  $ lxc exec snap-test apt update
  $ lxc exec snap-test apt install squashfuse
  $ lxc exec snap-test snap install hello-world
  2017-08-24T12:03:59Z INFO cannot auto connect core:core-support-plug to core:core-support: (slot auto-connection), existing connection state "core:core-support-plug core:core-support" in the way
  error: cannot perform the following tasks:
  - Setup snap "core" (2462) security profiles (cannot setup udev for snap "core": cannot reload udev rules: exit status 2
  udev output:
  )
  - Setup snap "core" (2462) security profiles (cannot reload udev rules: exit status 2
  udev output:
  )

This is because /sys is mounted read-only in privileged containers (presumably to avoid wreaking havoc on the host), and so the systemd-udevd service isn't started. The prevailing recommendation seems to be to work around it by making /usr/local/bin/udevadm a symlink to /bin/true, but this looks like a hack rather than a proper fix.
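
For reference, a minimal sketch of that hack (assuming snapd resolves udevadm via PATH, where /usr/local/bin precedes /bin on a default Ubuntu system):

  # Inside the container: shadow udevadm with a no-op binary so that
  # snapd's "udevadm control --reload-rules" call exits 0 instead of failing.
  ln -s /bin/true /usr/local/bin/udevadm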

Revision history for this message
Colin Watson (cjwatson) wrote :

On IRC, Stéphane suggested making the container "even more privileged" as a cleaner workaround, by adding the following to raw.lxc:

  lxc.mount.auto=
  lxc.mount.auto=proc:rw sys:rw
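
For example, one way to apply that to an existing container (a sketch; "snap-test" follows the test script above, and the container needs a restart to pick up raw.lxc changes):

  $ lxc config set snap-test raw.lxc "$(printf 'lxc.mount.auto=\nlxc.mount.auto=proc:rw sys:rw')"
  $ lxc restart snap-test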

(I also had to fiddle with my restrictive policy-rc.d script to allow udev to start.)
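
A hypothetical sketch of such a policy-rc.d tweak (exit status 101 means "action forbidden by policy"):

  #!/bin/sh
  # /usr/sbin/policy-rc.d (sketch): deny all service starts except udev
  [ "$1" = "udev" ] && exit 0   # allow udev so systemd-udevd can start
  exit 101                      # forbid everything else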

Perhaps documenting that somewhere reasonably findable would be good enough?

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I'm not quite sure what the difference is between regular and privileged (or more privileged) containers, but the last time we looked at similar issues we concluded that any container in which apparmor is not stacked but instead shared directly with the host is unsupportable for us. I'm not sure whether this is the same problem again; I haven't tried to reproduce it yet.

Revision history for this message
Colin Watson (cjwatson) wrote :

The "even more privileged" workarounds have been working in launchpad-buildd for a while now. We can't use unprivileged containers for various reasons, for example because one of the categories of builds that needs to install snaps sometimes is live filesystem builds, and those do various things like mknod that'll never work in unprivileged containers.

Of course, launchpad-buildd is somewhat special in that it typically only runs a single build before shutting down the VM, so I can imagine that there might be some isolation failures that are a problem in general but that don't affect us in practice. Please don't outright forbid privileged containers though, as we don't really have a good alternative.

Michael Vogt (mvo)
Changed in snapd:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I'm wondering what we can do about it.

When we're running in a privileged container, anything that we do inside (tweaking cgroups, tweaking apparmor) will contaminate the host. If the host also uses snaps, those definitions will conflict.

I see two options:

1) Close as WONTFIX as in reality this cannot work very well
2) Make it so that Launchpad doesn't have to do hacks ... somehow ... and ignore the contamination

I'm not sure what 2) would even look like. Shall we ignore errors? Even if we do, snaps may fail at runtime, depending on what they do.

Could launchpad spawn a VM instead of a container for this? (I know it's far heavier)

Changed in snapd:
status: Triaged → Incomplete
Revision history for this message
Colin Watson (cjwatson) wrote :

I filed this bug because it seems ugly, but it does at least work with our current hacks, so closing this as Won't Fix would be better than changing something in a way that makes our hacks not work. :-) If you feel you need to close it then go ahead.

We already run every build in a dedicated VM that's reset at the start of each build (hence we really don't care whether the container contaminates the host - the host is going to be thrown away anyway). However, those VMs are generic: for instance, they're currently all xenial rather than being specific to the release we're building for. We use the container both to avoid too much interference from the software that runs the builder itself and to arrange for the build to run on the appropriate version of Ubuntu. Using another VM here would be more complicated and expensive to set up, and either slower to run or entirely non-functional due to requiring nested virtualisation. So no, we can't reasonably switch to a VM rather than a container.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for snapd because there has been no activity for 60 days.]

Changed in snapd:
status: Incomplete → Expired
Anthony Fok (foka)
Changed in snapd:
status: Expired → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This will come up again, and more frequently, now that the LXD package upgrade does the deb->snap transition even when it is itself running in a container.

Like Colin, I run privileged containers a lot (and others might too) using those extra privileges: http://paste.ubuntu.com/p/bcVHRBTKyP/

I never hit the issue myself, as I hadn't tried snap-in-LXD on my own, but the new package transition will trigger it.

Because of that, the severity of this case increases a bit.

[...]
Preparing to unpack .../16-apache2-utils_2.4.34-1ubuntu2_amd64.deb ...
Unpacking apache2-utils (2.4.34-1ubuntu2) over (2.4.34-1ubuntu1) ...
Preparing to unpack .../17-lxd-client_1%3a0.4_all.deb ...
Unpacking lxd-client (1:0.4) over (3.0.2-0ubuntu3) ...
Setting up apparmor (2.12-4ubuntu8) ...
Installing new version of config file /etc/apparmor.d/abstractions/private-files ...
Installing new version of config file /etc/apparmor.d/abstractions/private-files-strict ...
Installing new version of config file /etc/apparmor.d/abstractions/ubuntu-browsers.d/user-files ...
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
Setting up squashfs-tools (1:4.3-6ubuntu2) ...
Setting up libapparmor1:amd64 (2.12-4ubuntu8) ...
Setting up systemd (239-7ubuntu10) ...
Setting up udev (239-7ubuntu10) ...
update-initramfs: deferring update (trigger activated)
Setting up snapd (2.35.5+18.10) ...
snapd.failure.service is a disabled or a static unit, not starting it.
snapd.snap-repair.service is a disabled or a static unit, not starting it.
(Reading database ... 66334 files and directories currently installed.)
Preparing to unpack .../00-lxd_1%3a0.4_all.deb ...
Warning: Stopping lxd.service, but it can still be activated by:
  lxd.socket
=> Installing the LXD snap
==> Checking connectivity with the snap store
==> Installing the LXD snap from the latest track for ubuntu-18.10
error: cannot perform the following tasks:
- Setup snap "core" (5548) security profiles (cannot setup udev for snap "core": cannot reload udev rules: exit status 2
udev output:
)
- Setup snap "core" (5548) security profiles (cannot reload udev rules: exit status 2
udev output:
)
dpkg: error processing archive /tmp/apt-dpkg-install-R4N7rz/00-lxd_1%3a0.4_all.deb (--unpack):
 new lxd package pre-installation script subprocess returned error exit status 1
Preparing to unpack .../01-open-iscsi_2.0.874-5ubuntu9_amd64.deb ...
[...]

Interestingly, a subsequent
$ apt --fix-broken install
does fix things up.

Might there be an ordering issue in the snapd/lxd updates that is not an issue for "real" Bionic->Cosmic upgraders?

(Reading database ... 66334 files and directories currently installed.)
Preparing to unpack .../archives/lxd_1%3a0.4_all.deb ...
Warning: Stopping lxd.service, but it can still be activated by:
  lxd.socket
=> Installing the LXD snap
==> Checking connectivity with the snap store
==> Installing the LXD snap from the latest track for ubuntu-18.10
2018-10-16T08:16:38Z INFO Waiting for restart...
lxd 3.6 from Canonical✓ installed
Channel stable/ubuntu-18.10 for lxd is closed; temporarily forwarding to stable.
==> Cleaning up leftovers
Synchronizing state of lxd.service with SysV service script with /lib/systemd/systemd-sysv-...


Revision history for this message
Stuart Bishop (stub) wrote :

I just hit this in a 16.04 container, but for reasons I don't understand installing the core snap first worked around the problem:

$ sudo snap install go --classic
error: cannot perform the following tasks:
- Setup snap "core" (5662) security profiles (cannot setup udev for snap "core": cannot reload udev rules: exit status 2
udev output:
)
- Setup snap "core" (5662) security profiles (cannot reload udev rules: exit status 2
udev output:
)

$ sudo snap install core
core 16-2.35.4 from 'canonical' installed

$ sudo snap install go --classic
go 1.11.1 from Michael Hudson-Doyle (mwhudson) installed

Revision history for this message
Stéphane Graber (stgraber) wrote :

Yeah, we've seen that re-running the command usually gets you past the error, so in your case, just running "snap install go --classic" again would likely have been enough.

Revision history for this message
Marco Trevisan (Treviño) (3v1n0) wrote :

Actually to get this working I only needed to use this:

# Mount cgroup in rw to get snaps working
lxc.mount.auto=cgroup:rw

There's no need to have the whole of /sys and /proc mounted rw (the problem is due to snapd trying to chown the `/sys/fs/cgroup/freezer/snap.*` directories). However, I'm wondering if there's a better way to do this from inside the container itself, since I'd guess that two containers sharing the same host would run into trouble this way, wouldn't they?
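
For plain LXC, a sketch of where that line would live (the path is an assumption and varies by distribution and setup):

  # e.g. /var/lib/lxc/<container>/config
  lxc.mount.auto = cgroup:rw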

Revision history for this message
Stéphane Graber (stgraber) wrote :

Hmm, cgroup:rw has absolutely nothing to do with this. LXD uses a cgroup namespace by default, which completely ignores that particular setting.

With the cgroup namespace, root in the container is allowed to do anything it wants to the /sys/fs/cgroup tree.

root@disco:~# mkdir /sys/fs/cgroup/freezer/snap.blah
root@disco:~# chown 1000:1000 /sys/fs/cgroup/freezer/snap.blah

The error also quite clearly comes from udev rather than anything cgroup related:

root@disco:~# snap install hello-world
error: cannot perform the following tasks:
- Setup snap "core" (6531) security profiles (cannot setup udev for snap "core": cannot reload udev rules: exit status 2
udev output:
)
- Setup snap "core" (6531) security profiles (cannot reload udev rules: exit status 2
udev output:
)
root@disco:~# snap install hello-world
2019-03-27T20:18:56Z INFO Waiting for restart...
hello-world 6.3 from Canonical✓ installed
root@disco:~#

Revision history for this message
Marco Trevisan (Treviño) (3v1n0) wrote :

I was not doing this in LXD, but in unprivileged LXC (not sure if that changes things), which I have on my QNAP NAS; without it I wasn't able to use snaps at all.

I guess it reduces security, but ultimately I'm still protected by the container itself.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Yeah, unprivileged LXC is likely to work pretty differently in the way it handles both cgroups and apparmor namespacing, both of which are very relevant when you want to run snaps.

Revision history for this message
Stéphane Graber (stgraber) wrote :

At the last engineering sprint, Zygmunt on the snapd team indicated that this was or would soon be sorted out in snapd.

Changed in lxd (Ubuntu):
status: New → Invalid
Revision history for this message
Ian Johnson (anonymouse67) wrote :

For reference, the PR that Zygmunt had which was planned to fix this was https://github.com/snapcore/snapd/pull/8219, but there were issues with that approach. We need to pick it up again and rework it into an approach that matches Jamie's comments there.

Revision history for this message
Alireza Nasri (sysnasri) wrote :

When will this be fixed?

Revision history for this message
L&L (bass1957) wrote :

+1
Impacting the ffmpeg snap when sharing a GPU.

Revision history for this message
Marco Trevisan (Treviño) (3v1n0) wrote :

Looks like the error message is quite misleading... Installing and running snaps in privileged containers works quite well; the problem is that apparently udev needs `/lib/modules/` to be available.

In fact, in a completely new privileged LXD instance:

ubuntu@ubuntu-bp:~$ sudo snap install hello
error: cannot perform the following tasks:
- Setup snap "core" (11993) security profiles (cannot reload udev rules: exit status 2
udev output:
)
ubuntu@ubuntu-bp:~$ sudo mkdir /lib/modules
ubuntu@ubuntu-bp:~$ sudo snap install hello
Download snap "core" (11993) from channel "stable" \error: change finished in status "Undone" with no error message
ubuntu@ubuntu-bp:~$ sudo snap install hello
2021-12-02T13:36:05Z INFO Waiting for automatic snapd restart...
hello 2.10 from Canonical✓ installed
ubuntu@ubuntu-bp:~$ hello
Hello, world!

So I think this issue is really easy to fix: we just need to ensure that directory exists.
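
Until that happens, an interim sketch of the workaround (the container name and the bind-mount variant are assumptions, not a documented fix):

  $ lxc exec snap-test -- mkdir -p /lib/modules
  # or, to expose the host's module tree read-only ("modules" is an arbitrary device name):
  $ lxc config device add snap-test modules disk source=/lib/modules path=/lib/modules readonly=true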

Revision history for this message
Ian Johnson (anonymouse67) wrote :

@Marco, can you reproduce that behavior without creating the directory? I.e. just start a new instance and then run `snap install hello` twice and see if it works? AFAIK the workaround of choice has always been to just run it twice initially, for some reason...
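
A sketch of that reproduction, following the test script from the bug description (image and container name are placeholders):

  $ lxc launch -c security.privileged=true ubuntu:16.04 snap-test
  $ lxc exec snap-test -- snap install hello   # first attempt: expected to fail with the udev error
  $ lxc exec snap-test -- snap install hello   # second attempt: reportedly succeeds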
