systemd-networkd mounts denied for lxc guest

Bug #1811248 reported by km on 2019-01-10
This bug affects 1 person
Affects: apparmor (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Host: Ubuntu cosmic | lxc 3.0.3 | AppArmor 2.12 | systemd 239-7
Guest: Arch Linux | systemd 240.0

After upgrading systemd in the guest from 239.370 to 240.0, the host's AppArmor logs

> audit: type=1400 audit(1547125168.853:722): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=8426 comm="(networkd)" flags="rw, rslave"

and the guest reports

> systemd-networkd.service: Failed to set up mount namespacing: Permission denied
> systemd-networkd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-networkd: Permission denied
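
The fields of a denial record like the one above can be split out mechanically, which makes it easier to quote the operation, profile and flags when reporting. A minimal Python sketch, assuming the key=value layout seen in the log line above; `parse_audit_fields` is a hypothetical helper name, not part of any AppArmor tooling:

```python
import shlex


def parse_audit_fields(line):
    """Split an audit record into a dict of its key=value fields."""
    fields = {}
    # shlex.split honours the double quotes, so flags="rw, rslave"
    # stays together as a single token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens such as "audit:" that carry no "="
            fields[key] = value
    return fields


# The host-side denial from this report, without the leading "> ".
LINE = ('audit: type=1400 audit(1547125168.853:722): apparmor="DENIED" '
        'operation="mount" info="failed flags match" error=-13 '
        'profile="lxc-container-default-cgns" name="/" pid=8426 '
        'comm="(networkd)" flags="rw, rslave"')

fields = parse_audit_fields(LINE)
print(fields["operation"], fields["profile"], fields["flags"])
```

The record carries the mount target (name) and the flags, but no source or filesystem type, which is typical for a propagation change on an existing mount point.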

According to the LXC bug tracker, https://github.com/lxc/lxc/issues/2778:

> While we'd like to allow such mounts we cannot do so until the apparmor_parser is fixed to handle them correctly.

Other cross-references:

https://github.com/systemd/systemd/issues/11371
https://bugs.archlinux.org/task/61313

Seth Arnold (seth-arnold) wrote :

Could you add to this bug which mount flags are being used by the mount(2) system call that's failed and which mount rules are in the profile? I couldn't find either information in the linked bugs.

Thanks

km (n8v8r) wrote :

profile="lxc-container-default-cgns"

profile lxc-container-default-cgns flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>

  # the container may never be allowed to mount devpts. If it does, it
  # will remount the host's devpts. We could allow it to do it with
  # the newinstance option (but, right now, we don't).
  deny mount fstype=devpts,
  mount fstype=cgroup -> /sys/fs/cgroup/**,
  mount fstype=cgroup2 -> /sys/fs/cgroup/**,
}

__

> flags are being used by the mount(2) system call that's failed

Pardon my ignorance as not being sure what you are asking here. I thought it was obvious from the log

pid=8426 comm="(networkd)" flags="rw, rslave"

On Fri, Jan 11, 2019 at 02:36:30AM -0000, km wrote:
> profile="lxc-container-default-cgns"
>
> profile lxc-container-default-cgns flags=(attach_disconnected,mediate_deleted) {
> #include <abstractions/lxc/container-base>
>
> # the container may never be allowed to mount devpts. If it does, it
> # will remount the host's devpts. We could allow it to do it with
> # the newinstance option (but, right now, we don't).
> deny mount fstype=devpts,
> mount fstype=cgroup -> /sys/fs/cgroup/**,
> mount fstype=cgroup2 -> /sys/fs/cgroup/**,
> }

Thanks.

> > flags are being used by the mount(2) system call that's failed
>
> Pardon my ignorance as not being sure what you are asking here. I
> thought it was obvious from the log
>
> pid=8426 comm="(networkd)" flags="rw, rslave"

It's my ignorance here -- I don't know if AppArmor's log message is
sufficient to reconstruct the actual mount() syscall that the process
has performed -- and I don't know if the extra parameters that may be
in the syscall are important or not.

If you could catch the mount() syscall with strace that'd be beautiful.

Thanks

km (n8v8r) wrote :

strace does not seem to be the right tool for getting the information you are asking for: the PIDs of the processes involved are unknown at the time strace would be started, and running the process(es) from the command line under strace would not reproduce this case.

Going back to the log message, I would reckon that mount namespaces are in play, in particular a recursive MS_SLAVE. Would that be supported by AppArmor in general, and by this profile in particular?
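
The reasoning above, going from the logged string back to mount(2) flags, can be sketched in Python. The numeric values are the ones from <sys/mount.h> on Linux. The token table is a partial, assumed mapping based on how the kernel's AppArmor code appears to print mount flags in audit records, and `flags_from_audit` is a hypothetical helper:

```python
# Values as defined in <sys/mount.h> on Linux.
MS_RDONLY = 1
MS_BIND = 4096
MS_REC = 16384
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

# Partial, assumed mapping from audit tokens to flag bits.
# An "r" prefix on a propagation token means MS_REC is set as well.
TOKENS = {
    "rw": 0,
    "ro": MS_RDONLY,
    "bind": MS_BIND,
    "rbind": MS_BIND | MS_REC,
    "unbindable": MS_UNBINDABLE,
    "runbindable": MS_UNBINDABLE | MS_REC,
    "private": MS_PRIVATE,
    "rprivate": MS_PRIVATE | MS_REC,
    "slave": MS_SLAVE,
    "rslave": MS_SLAVE | MS_REC,
    "shared": MS_SHARED,
    "rshared": MS_SHARED | MS_REC,
}


def flags_from_audit(text):
    """Turn a string such as "rw, rslave" into a flags bitmask.

    Tokens missing from the table raise KeyError instead of being guessed.
    """
    flags = 0
    for token in text.split(","):
        flags |= TOKENS[token.strip()]
    return flags


print(hex(flags_from_audit("rw, rslave")))  # 0x84000, i.e. MS_REC | MS_SLAVE
```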

km (n8v8r) wrote :

Some further input from the lxc dev team:

> What systemd wants to do is the equivalent of executing mount --make-rslave / on the commandline. The syscall from systemd specifically AFAICT is: mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL);
As for the AppArmor profile rule, see https://github.com/lxc/lxc/blob/master/config/apparmor/abstractions/container-base.in#L94
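
For anyone wanting to trigger the same mount request without systemd, the call quoted above can be issued directly. A minimal Python sketch using ctypes, assuming a Linux system whose C library exposes mount() and a process with enough privilege to mount (CAP_SYS_ADMIN); `make_rslave` is a hypothetical helper. Calling it changes mount propagation for the current mount namespace, so it is best tried only inside a throwaway container:

```python
import ctypes
import os

# Values as defined in <sys/mount.h> on Linux.
MS_REC = 16384
MS_SLAVE = 1 << 19


def make_rslave(target=b"/"):
    """Rough equivalent of `mount --make-rslave <target>`."""
    # Load the C library lazily so importing this module has no side effects.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong, ctypes.c_void_p]
    libc.mount.restype = ctypes.c_int
    # mount(NULL, target, NULL, MS_REC|MS_SLAVE, NULL)
    if libc.mount(None, target, None, MS_REC | MS_SLAVE, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target.decode())
```

If the confining profile rejects the request, the expectation is a PermissionError with errno 13 (EACCES), which would line up with the error=-13 in the audit record; that is an assumption, not something verified in this report.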

I've pinged jjohansen from the AppArmor devs on irc about it and am hoping he's gonna find the time to dig into this soon.

km (n8v8r) wrote :

This issue is cascading, to the extent that the LXC Arch Linux guest is now entirely dead:

https://bugs.archlinux.org/task/61428

Marcin Longlastname (hak8or) wrote :

Going further: for those running Arch containers in Proxmox who land here after searching for a message similar to this:

[ 2204.273155] audit: type=1400 audit(1548030556.960:100): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-101_</var/lib/lxc>" name="/" pid=26493 comm="(networkd)" flags="rw, rslave"

the GitHub link at the beginning of this report has a discussion of interim workarounds:

https://github.com/lxc/lxc/issues/2778#issuecomment-455199160

I tried modifying the "mount options=(rw,make-rslave)," line in "/etc/apparmor.d/abstractions/lxc/container-base", which sadly did not work: the file "/var/lib/lxc/102/apparmor/lxc-{YOUR_CONTAINER_ID}_<-var-lib-lxc>", which is created when the container starts, keeps the old commented-out version of that line, even after rebooting the host.

Instead, I ended up adding "lxc.apparmor.profile: unconfined" to "/etc/pve/lxc/{YOUR_CONTAINER_ID}.conf" for each container and then restarting the container. This disables AppArmor for every container configured that way, which is terrible security-wise, but at least my containers are back up while waiting for a bug fix.

km (n8v8r) wrote :

https://github.com/lxc/lxd/issues/5439#issuecomment-461257784

> The fix in LXD is only partial because there's currently no safe way for us to fix that for privileged containers due to an apparmor parser bug that the AppArmor team is still working on.
>
> So we've made the change only to the unprivileged policy for now as the AppArmor bug isn't causing too much damage in that case.
>
> There's no such distinction in profile in LXC, so putting those same lines in the LXC policy would allow every user to bypass all mount protections, which isn't acceptable from a security point of view.
> So the LXC fix is effectively blocked on the AppArmor security bug being resolved first.

km (n8v8r) wrote :

Whilst 'lxc.apparmor.profile: unconfined' appears to be the only way to keep unprivileged LXC guests with systemd v240 alive, it defeats the purpose of AppArmor.

Notwithstanding, these follow-ups are riding on the tail of this bug:

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1813622
https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=030919ba5e4931d6ee576d0259fae67fe4ed9770

km (n8v8r) wrote :

Adding a cross-reference:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=916639#85

> I think that disabling AppArmor by default for new LXC containers for
> Buster would be an OK-ish fallback option, if nothing else can
> realistically be made to work in time for the freeze; that would be
> sad, but it would not be a regression vs. Stretch.
