Comment 4 for bug 1950787

Revision history for this message
Stéphane Graber (stgraber) wrote :

If this only fails in privileged containers, then I probably wouldn't worry about it too much, those aren't the default and a LOT of things break in privileged containers, so I don't think it's worth doing distro changes to accommodate this, assuming the container otherwise still boots.

For cases like this one, it's usually been hard to make a solid case for a change of behavior in upstream systemd. There are a few places like the devices cgroup where permission errors are considered non-fatal which then accommodates containers quite well, but the same isn't true with the isolation security features which this one ties into.

In an ideal world, AppArmor would allow us to craft a policy which:
 - Allows for mount namespaces
 - Allows for bind-mounts of restricted paths
 - Applies the parent's policy onto the bind-mount target
 - Properly support mount propagation flags in a way that can't be abuse to allow all mounts

But as it stands, AppArmor is entirely path based, so a policy that applies to /proc will not apply to /proc bind-mount to /blah/proc (which is effectively what systemd does) and so causes all confinement to be bypassable. Additionally, there are (or were in some versions at least) issues with processing those mount propagation flags you see in your log (shared/slave/...) and allowing a bind-mount to be marked using one of those flags would incorrectly cause the parser or the kernel (not quite sure which) to allow ALL mounts...