pivot_root or mounts setup breaks unshare of userns
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
snap-confine | Fix Released | Critical | Zygmunt Krynicki |
snap-confine (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
snap-confine uses pivot_root internally. The particular way in which this was done is somewhat tricky, and in effect we used to "leak" the old root filesystem. This caused the kernel to treat our process as unsafe and to refuse to let it create user namespaces.
The fix uses umount2(2) with MNT_DETACH to detach (lazily unmount) the old root filesystem.
For more information about the execution environment, please see this article http://
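As a rough illustration of the pivot-then-detach approach, here is a minimal sketch using the idiom documented in pivot_root(2). It is not the actual snap-confine patch: the function name switch_root_and_detach is hypothetical, and the sketch assumes the caller is already in a private mount namespace, holds CAP_SYS_ADMIN, and that new_root is a mount point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not from snap-confine): make new_root the root
 * filesystem and lazily detach the old root so it does not linger in
 * the mount table. Returns 0 on success, -1 on the first failing step. */
int switch_root_and_detach(const char *new_root) {
    assert(new_root != NULL);

    /* Enter the directory that will become "/". */
    if (chdir(new_root) != 0) {
        perror("chdir(new_root)");
        return -1;
    }
    /* Call the raw syscall so the sketch does not depend on a libc wrapper.
     * pivot_root(".", ".") stacks the old root on top of the new root at "/". */
    if (syscall(SYS_pivot_root, ".", ".") != 0) {
        perror("pivot_root");
        return -1;
    }
    /* Resolving "." now reaches the old root stacked at "/"; MNT_DETACH
     * removes it from this namespace even while it is still busy. */
    if (umount2(".", MNT_DETACH) != 0) {
        perror("umount2(MNT_DETACH)");
        return -1;
    }
    /* Re-enter the (new) root so the working directory is well defined. */
    if (chdir("/") != 0) {
        perror("chdir(/)");
        return -1;
    }
    return 0;
}
```

A caller would typically invoke this right after unshare(CLONE_NEWNS) and after marking the mount tree private, so that the detach does not propagate back to the host.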
[Test Case]
The test case can be found here:
https:/
The test case is run automatically for each pull request and for each final release. It can be reproduced manually by executing the shell commands listed in the prepare/
The commands there assume that snapd and snap-confine are installed.
No other additional setup is necessary.
[Regression Potential]
* Regression potential is minimal. An experienced member of the LXD development team (Stephane Graber) reported this issue and recommended the fix that we've applied. The same approach is used by LXD.
* The fix was tested successfully on Ubuntu with spread.
[Other Info]
* This bug is a part of a major SRU that brings snap-confine in Ubuntu 16.04 in line with the current upstream release 1.0.41.
* snap-confine is technically an integral part of snapd, which has an SRU exception and is allowed to introduce new features and take advantage of an accelerated procedure. For more information see https:/
== # Pre-SRU bug description follows # ==
Starting around the time ubuntu-
This is obviously a pretty big deal for LXD.
I've confirmed that this isn't apparmor, seccomp or capabilities getting in the way and I think I tracked it down to a poor implementation of the chroot/pivot_root feature in snap-confine.
There is no code in snap-confine to umount the paths outside of the pivot target. This means the snap mount table then contains a whole lot of unreachable mounts which will be stuck there forever.
This causes us to trip the chroot detection code in the kernel: there is more than one root mount point, plus a ton of completely unreachable mount entries, which makes the kernel think we're in an unsafe environment for a user namespace to be created.
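The refusal can be observed with a small probe. The sketch below (the helper name probe_unshare is hypothetical) attempts unshare(2) in a throwaway child process, so the caller's own namespaces stay untouched, and reports the resulting errno. Per unshare(2), a caller the kernel considers chrooted gets EPERM for CLONE_NEWUSER; other errno values point at unrelated limits or configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: try unshare(flags) in a forked child.
 * Returns 0 if unshare succeeded, the child's errno if it failed,
 * or -1 if the probe itself could not be run. */
int probe_unshare(int flags) {
    assert(flags != 0);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: report the outcome through the exit status. */
        int rc = unshare(flags);
        _exit(rc == 0 ? 0 : (errno & 0xff));
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);

    if (waited != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Calling probe_unshare(CLONE_NEWUSER) from inside the confined environment and comparing the result against EPERM is one way to confirm the symptom before and after the fix.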
Strace of the current launcher to a basic binary (lxd --help): http://
The mount table for a running LXD process is now: http://
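To check a mount table for stacked root entries, something like the following sketch could be used. It is a user-space diagnostic only, not the kernel's chroot check; the helper name count_mounts_at is hypothetical. It relies on the /proc/[pid]/mountinfo format, where the fifth whitespace-separated field is the mount point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: given the text of a mountinfo file, count the
 * entries whose mount point (field 5) equals target. */
int count_mounts_at(const char *mountinfo, const char *target) {
    assert(mountinfo != NULL && target != NULL);

    int count = 0;
    const char *p = mountinfo;
    while (*p != '\0') {
        /* Copy one line (truncated if overly long) into a scratch buffer. */
        size_t len = strcspn(p, "\n");
        char line[4096];
        size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
        memcpy(line, p, n);
        line[n] = '\0';

        /* Skip mount ID, parent ID, major:minor and root; read mount point. */
        char mount_point[4096];
        if (sscanf(line, "%*d %*d %*s %*s %4095s", mount_point) == 1 &&
            strcmp(mount_point, target) == 0) {
            count++;
        }

        /* Advance past this line and its newline, if any. */
        p += len;
        if (*p == '\n') {
            p++;
        }
    }
    return count;
}
```

Feeding it the contents of /proc/self/mountinfo with a target of "/" gives the number of mounts stacked at the root; a result greater than one matches the situation described in this report.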
This is also very wasteful, especially considering that snap-confine creates a new namespace for every single command. More importantly, it's going to create a bunch of weird issues on systems using snapd, including potential data loss.
That's because not unmounting unused mount entries (anything outside of your pivot dir) keeps an active reference to them in the kernel. This effectively means that none of those mounts can really be unmounted on the host. The host mount entry will disappear on umount, but attempting to mount again will fail with "already mounted".
It also means that any non-persistent device (such as a USB stick) will never get properly unmounted, which may cause data loss.
Changed in snap-confine: | |
status: | New → Triaged |
importance: | Undecided → Critical |
milestone: | none → 1.0.41 |
assignee: | nobody → Zygmunt Krynicki (zyga) |
Changed in snap-confine: | |
status: | Triaged → Fix Committed |
Changed in snap-confine: | |
status: | Fix Committed → Fix Released |
description: | updated |
Changed in snap-confine (Ubuntu): | |
status: | New → Fix Released |
Changed in snap-confine (Ubuntu Xenial): | |
status: | New → In Progress |
Changed in snap-confine (Ubuntu Xenial): | |
status: | In Progress → Fix Released |
I don't have a good way to exploit this, so I'm not marking it as a security issue.
But given that snap-confine is a setuid binary, this effectively allows any unprivileged user to lock all the mount entries on the host, causing potential data loss for removable media and preventing even root on the host from remounting filesystems that were present at the time snap-confine was called.