systemd--networkd mounts denied for lxc guest

Bug #1811248 reported by km
This bug affects 11 people
Affects: apparmor (Ubuntu) | Status: Confirmed | Importance: Undecided | Assigned to: Unassigned

Bug Description

Host Ubuntu cosmic | lxc 3.0.3 | aa 2.12 | systemd 239-7
Guest Arch Linux | systemd 240.0

After upgrading systemd in the guest from 239.370 to 240.0, the host's AppArmor (AA) logs

> audit: type=1400 audit(1547125168.853:722): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=8426 comm="(networkd)" flags="rw, rslave"

and the guest reports

> systemd-networkd.service: Failed to set up mount namespacing: Permission denied
> systemd-networkd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-networkd: Permission denied
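For anyone comparing denials like the one above across hosts, here is a minimal Python sketch that splits an audit record into its key=value fields. The helper name `parse_audit_line` is made up for illustration, and the regular expression only assumes the `key="value"` / `key=value` layout visible in the log line quoted in this report; it is not an official audit log parser.

```python
import re

# key="quoted value" or key=bare_token, as seen in the audit line above.
_FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_audit_line(line: str) -> dict:
    """Return the key=value fields of one kernel audit record as a dict."""
    fields = {}
    for key, quoted, bare in _FIELD.findall(line):
        # Exactly one of the two value groups is non-empty for a match.
        fields[key] = quoted if quoted else bare
    return fields


# The denial from this bug report, copied verbatim.
line = (
    'audit: type=1400 audit(1547125168.853:722): apparmor="DENIED" '
    'operation="mount" info="failed flags match" error=-13 '
    'profile="lxc-container-default-cgns" name="/" pid=8426 '
    'comm="(networkd)" flags="rw, rslave"'
)
record = parse_audit_line(line)
mount_flags = record["flags"].split(", ")
```

With the line from this report, `record["profile"]` is the profile that denied the mount and `mount_flags` holds the two flag names `rw` and `rslave`.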

According to the LXC bug tracker (https://github.com/lxc/lxc/issues/2778):

> While we'd like to allow such mounts we cannot do so until the apparmor_parser is fixed to handle them correctly.

Other cross-references:

https://github.com/systemd/systemd/issues/11371
https://bugs.archlinux.org/task/61313

Seth Arnold (seth-arnold) wrote :

Could you add to this bug which mount flags are being used by the mount(2) system call that failed, and which mount rules are in the profile? I couldn't find either piece of information in the linked bugs.

Thanks

km (n8v8r) wrote :

profile="lxc-container-default-cgns"

profile lxc-container-default-cgns flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>

  # the container may never be allowed to mount devpts. If it does, it
  # will remount the host's devpts. We could allow it to do it with
  # the newinstance option (but, right now, we don't).
  deny mount fstype=devpts,
  mount fstype=cgroup -> /sys/fs/cgroup/**,
  mount fstype=cgroup2 -> /sys/fs/cgroup/**,
}

__

> flags are being used by the mount(2) system call that's failed

Pardon my ignorance as not being sure what you are asking here. I thought it was obvious from the log

pid=8426 comm="(networkd)" flags="rw, rslave"

Seth Arnold (seth-arnold) wrote : Re: [Bug 1811248] Re: systemd--networkd mounts denied for lxc guest

On Fri, Jan 11, 2019 at 02:36:30AM -0000, km wrote:
> profile="lxc-container-default-cgns"
>
> profile lxc-container-default-cgns flags=(attach_disconnected,mediate_deleted) {
> #include <abstractions/lxc/container-base>
>
> # the container may never be allowed to mount devpts. If it does, it
> # will remount the host's devpts. We could allow it to do it with
> # the newinstance option (but, right now, we don't).
> deny mount fstype=devpts,
> mount fstype=cgroup -> /sys/fs/cgroup/**,
> mount fstype=cgroup2 -> /sys/fs/cgroup/**,
> }

Thanks.

> > flags are being used by the mount(2) system call that's failed
>
> Pardon my ignorance as not being sure what you are asking here. I
> thought it was obvious from the log
>
> pid=8426 comm="(networkd)" flags="rw, rslave"

It's my ignorance here -- I don't know if AppArmor's log message is
sufficient to reconstruct the actual mount() syscall that the process
has performed -- and I don't know if the extra parameters that may be
in the syscall are important or not.

If you could catch the mount() syscall with strace that'd be beautiful.

Thanks

km (n8v8r) wrote :

strace does not seem to be the right tool for getting the info you are asking for, considering that the PIDs of the involved processes are unknown at the time strace is started. Executing the process(es) from the CLI under strace will not reproduce the case either.

Going back to the log message, I would reckon that mount namespaces are in play, in particular recursive MS_SLAVE. Would that be supported by AA in general and by this profile in particular?
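To make the "recursive MS_SLAVE" reading concrete, here is a rough Python sketch that turns the `flags="..."` string from a denial into a mount(2) flag bitmask. The MS_* values are the ones from the Linux headers; the table of audit flag names is an assumption based on the names that show up in this report (`rw`, `rslave`, `remount`, `shared`) plus their obvious siblings, and `audit_flags_to_bits` is a made-up helper name.

```python
# Flag values as defined in <linux/mount.h>.
MS_RDONLY = 1
MS_REMOUNT = 32
MS_BIND = 4096
MS_REC = 16384
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

# Assumed mapping from audit flag names to mount(2) bits.
# "rw" is simply the absence of MS_RDONLY; an "r" prefix adds MS_REC.
AUDIT_FLAG_BITS = {
    "rw": 0,
    "ro": MS_RDONLY,
    "remount": MS_REMOUNT,
    "bind": MS_BIND,
    "rbind": MS_BIND | MS_REC,
    "unbindable": MS_UNBINDABLE,
    "runbindable": MS_UNBINDABLE | MS_REC,
    "private": MS_PRIVATE,
    "rprivate": MS_PRIVATE | MS_REC,
    "slave": MS_SLAVE,
    "rslave": MS_SLAVE | MS_REC,
    "shared": MS_SHARED,
    "rshared": MS_SHARED | MS_REC,
}


def audit_flags_to_bits(flags: str) -> int:
    """Combine a comma-separated audit flag string into one bitmask."""
    bits = 0
    for name in flags.split(","):
        bits |= AUDIT_FLAG_BITS[name.strip()]  # KeyError on unknown names
    return bits
```

Under that assumed mapping, `audit_flags_to_bits("rw, rslave")` equals `MS_REC | MS_SLAVE`, i.e. a recursive change of mount propagation to slave, with no filesystem type or source involved.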

km (n8v8r) wrote :

Some further input from the lxc dev team:

> What systemd wants to do is the equivalent of executing mount --make-rslave / on the commandline. The syscall from systemd specifically AFAICT is: mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL);
> As for the AppArmor profile rule, see https://github.com/lxc/lxc/blob/master/config/apparmor/abstractions/container-base.in#L94

I've pinged jjohansen from the AppArmor devs on irc about it and am hoping he's gonna find the time to dig into this soon.
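For those who want to trigger the same denial without waiting for systemd, the quoted syscall can be issued by hand. The following is only a sketch using Python's `ctypes` against libc; the function names are invented for illustration, and the `unshare(CLONE_NEWNS)` step is an assumption drawn from the "Failed to set up mount namespacing" message rather than something confirmed in this thread. It needs root (CAP_SYS_ADMIN) inside the guest to get as far as the AppArmor check.

```python
import ctypes
import ctypes.util
import os

# Values from the Linux headers.
MS_REC = 16384
MS_SLAVE = 1 << 19
CLONE_NEWNS = 0x00020000
RSLAVE_FLAGS = MS_REC | MS_SLAVE

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
# int mount(const char *source, const char *target, const char *fstype,
#           unsigned long flags, const void *data);
_libc.mount.argtypes = [
    ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
    ctypes.c_ulong, ctypes.c_void_p,
]
_libc.mount.restype = ctypes.c_int


def make_rslave(target: bytes = b"/") -> None:
    """mount(NULL, target, NULL, MS_REC|MS_SLAVE, NULL); raise OSError on failure."""
    if _libc.mount(None, target, None, RSLAVE_FLAGS, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target.decode())


def make_rslave_in_new_namespace() -> None:
    """Unshare the mount namespace first (assumed to mirror what systemd does)."""
    if _libc.unshare(CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), "unshare(CLONE_NEWNS)")
    make_rslave(b"/")
```

Calling `make_rslave_in_new_namespace()` as root inside an affected guest should raise `OSError` with errno 13 (matching `error=-13` in the audit line) as long as the profile rejects the mount; on a host where the rule is allowed it should return silently.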

km (n8v8r) wrote :

This issue is escalating to the extent that the LXC Arch Linux guest is now entirely dead:

https://bugs.archlinux.org/task/61428

Marcin Longlastname (hak8or) wrote :

Going further, for those running Arch containers in Proxmox who land here after googling a message similar to this:

[ 2204.273155] audit: type=1400 audit(1548030556.960:100): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-101_</var/lib/lxc>" name="/" pid=26493 comm="(networkd)" flags="rw, rslave"

the GitHub link at the beginning has a discussion of workarounds for the meantime:

https://github.com/lxc/lxc/issues/2778#issuecomment-455199160

I attempted to just modify "mount options=(rw,make-rslave)," in "/etc/apparmor.d/abstractions/lxc/container-base", which sadly did not work: the file "/var/lib/lxc/102/apparmor/lxc-{YOUR_CONTAINER_ID}_<-var-lib-lxc>" that is created when the container starts keeps the old commented-out version of that line, even after rebooting the host.

Instead, I ended up adding "lxc.apparmor.profile: unconfined" to the "/etc/pve/lxc/{YOUR_CONTAINER_ID}.conf" file for each container and then restarting the container. This disables AppArmor for those containers, which is terrible security-wise, but at least I get my containers back up while waiting for a bug fix.
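If you have many containers to patch, the edit described above could be scripted roughly as follows. This is only a sketch: `ensure_unconfined` is a made-up helper, it works on the config text rather than on files, and it assumes that any line starting with `[` opens a snapshot section that should be left alone. It carries the same security cost as the manual edit, since the container ends up unconfined.

```python
UNCONFINED = "lxc.apparmor.profile: unconfined"


def ensure_unconfined(config_text: str) -> str:
    """Return config_text with exactly one unconfined-profile line in the main section."""
    lines = config_text.splitlines()
    # Assumption: the main section ends at the first "[...]" header, if any.
    end = next((i for i, l in enumerate(lines) if l.startswith("[")), len(lines))
    # Drop any existing profile setting so the result is idempotent.
    head = [l for l in lines[:end]
            if not l.strip().startswith("lxc.apparmor.profile")]
    # Remove trailing blank lines so the new line attaches to the section.
    while head and not head[-1].strip():
        head.pop()
    head.append(UNCONFINED)
    tail = lines[end:]
    if tail:
        head.append("")  # keep one blank line before the next section
    return "\n".join(head + tail) + "\n"
```

One possible use is to read "/etc/pve/lxc/{YOUR_CONTAINER_ID}.conf", pass its contents through `ensure_unconfined`, write the result back, and restart the container. Running it twice on the same text leaves the text unchanged.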

km (n8v8r) wrote :

https://github.com/lxc/lxd/issues/5439#issuecomment-461257784

> The fix in LXD is only partial because there's currently no safe way for us to fix that for privileged containers due to an apparmor parser bug that the AppArmor team is still working on.
>
> So we've made the change only to the unprivileged policy for now as the AppArmor bug isn't causing too much damage in that case.
>
> There's no such distinction in profile in LXC, so putting those same lines in the LXC policy would allow every user to bypass all mount protections, which isn't acceptable from a security point of view.
> So the LXC fix is effectively blocked on the AppArmor security bug being resolved first.

km (n8v8r) wrote :

Whilst 'lxc.apparmor.profile: unconfined' appears to be the only way to keep unprivileged LXC guests with systemd v240 alive, it defeats the purpose of AppArmor.

Notwithstanding, the following ride on the tail of this bug:

https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1813622
https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=030919ba5e4931d6ee576d0259fae67fe4ed9770

km (n8v8r) wrote :

Adding a cross-reference:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=916639#85

> I think that disabling AppArmor by default for new LXC containers for
> Buster would be an OK-ish fallback option, if nothing else can
> realistically be made to work in time for the freeze; that would be
> sad, but it would not be a regression vs. Stretch.

km (n8v8r) wrote :

After having upgraded the host to:

Ubuntu disco (19.04) | kernel 5.0.0-13 | aa 2.13.2-9 | systemd 240-6

the issue is still present

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apparmor (Ubuntu):
status: New → Confirmed

Vaidyanathan L S (vaidyas) wrote :

I noticed this crash 4 times today. To reproduce it, I ran the following commands (almost always in this order):

5.11.0-41-generic | 20.04.1-Ubuntu | x86_64 | x86_64 | x86_64

$ sudo lxc-ls
$ sudo lxc-start -n test
$ sudo lxc-ls -f
$ sudo lxc-console -n test
$ sudo lxc-stop test
$ sudo lxc-ls -f
$ sudo lxc-ls -f

syslog:
Dec 1 19:09:53 ThinkPad kernel: [ 4503.306174] kauditd_printk_skb: 24 callbacks suppressed
Dec 1 19:09:53 ThinkPad kernel: [ 4503.306177] audit: type=1400 audit(1638365993.337:184): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=22258 comm="mount" flags="rw, remount"
Dec 1 19:09:53 ThinkPad kernel: [ 4503.325009] audit: type=1400 audit(1638365993.357:185): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/snap/" pid=22277 comm="mount" flags="rw, shared"
Dec 1 19:09:53 ThinkPad kernel: [ 4503.325689] audit: type=1400 audit(1638365993.357:186): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=22278 comm="(md-udevd)" flags="rw, rslave"
Dec 1 19:09:56 ThinkPad kernel: [ 4506.767508] audit: type=1400 audit(1638365996.801:187): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=22302 comm="(md-udevd)" flags="rw, rslave"
Dec 1 19:10:40 ThinkPad kernel: [ 4550.606075] lxcbr0: port 1(vethEY9smt) entered disabled state
Dec 1 19:10:40 ThinkPad kernel: [ 4550.607657] device vethEY9smt left promiscuous mode
Dec 1 19:10:40 ThinkPad kernel: [ 4550.607661] lxcbr0: port 1(vethEY9smt) entered disabled state
Dec 1 19:11:09 ThinkPad kernel: [ 4579.319750] systemd[22860]: NetworkManager-dispatcher.service: Failed to connect stdout to the journal socket, ignoring: Connection refused
Dec 1 19:12:09 ThinkPad kernel: [ 4639.237516] fbcon: Taking over console
Dec 1 19:12:09 ThinkPad kernel: [ 4639.243508] Console: switching to colour frame buffer device 240x67
Dec 1 19:12:09 ThinkPad kernel: [ 4639.522393] systemd[23043]: user-runtime-dir@125.service: Failed to connect stdout to the journal socket, ignoring: Connection refused
Dec 1 19:12:09 ThinkPad kernel: [ 4639.558561] systemd[23062]: user@125.service: Failed to connect stdout to the journal socket, ignoring: Connection refused

apport.log:
ERROR: apport (pid 22986) Wed Dec 1 19:12:08 2021: called for pid 916, signal 6, core limit 0, dump mode 1
ERROR: apport (pid 22986) Wed Dec 1 19:12:08 2021: executable: /usr/lib/systemd/systemd-timesyncd (command line "/lib/systemd/systemd-timesyncd")
ERROR: apport (pid 22986) Wed Dec 1 19:12:08 2021: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 22986) Wed Dec 1 19:12:08 2021: apport: report /var/crash/_usr_lib_systemd_systemd-timesyncd.102.crash already exists and unseen, doing nothing to avoid disk usage DoS
ERROR: apport (pid 22992) Wed Dec 1 19:12:09 2021: called for pid 1007, signal 6, core limit 0, dump mode 1
ERROR: apport (pid 22992) Wed Dec 1 19:12:09 2021: executable: /usr/lib/systemd/systemd-logind (command line "/lib...
