pivot_root or mounts setup breaks unshare of userns

Bug #1618683 reported by Stéphane Graber
This bug affects 1 person
Affects                  Status        Importance  Assigned to       Milestone
snap-confine             Fix Released  Critical    Zygmunt Krynicki
snap-confine (Ubuntu)    Fix Released  Undecided   Unassigned
  Xenial                 Fix Released  Undecided   Unassigned

Bug Description

[Impact]

snap-confine uses pivot_root(2) internally. The way this was done was subtle, and in effect we used to "leak" the old root filesystem: its mounts stayed attached to the new mount namespace. As a result the kernel treated our process as running in an unsafe environment and refused to let it create user namespaces.

The fix uses umount2(2) with MNT_DETACH to detach (lazily unmount) the old root filesystem after the pivot.
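A minimal C sketch of the pivot-then-detach sequence, assuming the caller is already in its own mount namespace (for example after unshare(CLONE_NEWNS)), holds CAP_SYS_ADMIN, and has already made "/" recursively private or slave, since pivot_root(2) refuses shared parent mounts. The function names and the "old" directory are hypothetical and are not snap-confine's actual code. The step that corresponds to the fix is the final umount2(..., MNT_DETACH); marking the old tree as a slave just before it is the same idea LXC applies so that the detach does not propagate back to the host.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical name for the temporary put_old directory. */
#define PUT_OLD_NAME "old"

/* Build "<new_root>/<name>" into buf, avoiding a doubled slash.
 * Returns 0 on success, -1 if the result would not fit. */
int join_path(char *buf, size_t size, const char *new_root, const char *name)
{
    size_t len = strlen(new_root);

    if (len > 0 && new_root[len - 1] == '/')
        len--;
    int n = snprintf(buf, size, "%.*s/%s", (int)len, new_root, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Sketch: pivot into new_root and drop every mount of the old root.
 * Assumes a private mount namespace, CAP_SYS_ADMIN, and that "/" was
 * already made recursively private or slave. */
int switch_root_and_detach(const char *new_root)
{
    char put_old[4096];

    if (join_path(put_old, sizeof put_old, new_root, PUT_OLD_NAME) < 0)
        return -1;
    /* pivot_root(2) needs new_root to be a mount point; bind it to itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) < 0)
        return -1;
    if (mkdir(put_old, 0700) < 0)
        return -1;
    /* glibc provides no pivot_root() wrapper, so call the syscall directly. */
    if (syscall(SYS_pivot_root, new_root, put_old) < 0)
        return -1;
    if (chdir("/") < 0)
        return -1;
    /* Turn the old tree into slaves so our unmount cannot propagate to the host. */
    if (mount(NULL, "/" PUT_OLD_NAME, NULL, MS_SLAVE | MS_REC, NULL) < 0)
        return -1;
    /* Lazily detach the whole old root; without this the namespace keeps
     * a reference to every host mount. */
    if (umount2("/" PUT_OLD_NAME, MNT_DETACH) < 0)
        return -1;
    return rmdir("/" PUT_OLD_NAME);
}
```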

For more information about the execution environment, please see this article http://www.zygoon.pl/2016/08/snap-execution-environment.html

[Test Case]

The test case can be found here:

https://github.com/snapcore/snap-confine/blob/master/spread-tests/regression/lp-1618683/task.yaml

The test case runs automatically for each pull request and for each final release. It can also be reproduced by hand by executing the shell commands listed in its prepare, execute and restore phases.
The commands there assume that snapd and snap-confine are installed.
No other additional setup is necessary.

[Regression Potential]

* Regression potential is minimal. An experienced member of the LXD development team (Stéphane Graber) reported this issue and recommended the fix that we applied. LXD uses the same approach.

* The fix was successfully tested on Ubuntu with spread.

[Other Info]

* This bug is a part of a major SRU that brings snap-confine in Ubuntu 16.04 in line with the current upstream release 1.0.41.

* snap-confine is technically an integral part of snapd, which has an SRU exception and is allowed to introduce new features and take advantage of an accelerated procedure. For more information see https://wiki.ubuntu.com/SnapdUpdates

== # Pre-SRU bug description follows # ==

Starting around the time ubuntu-core-launcher was transitioned to snap-confine, unsharing a user namespace became impossible.

This is obviously a pretty big deal for LXD.

I've confirmed that this isn't AppArmor, seccomp or capabilities getting in the way, and I think I tracked it down to a poor implementation of the chroot/pivot_root feature in snap-confine.

There is no code in snap-confine to umount the paths outside of the pivot target. This means the snap mount table then contains a whole lot of unreachable mounts which will be stuck there forever.
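Deciding which entries lie "outside of the pivot target" comes down to a path-prefix check that respects path components, so that /tmp/newroot2 is not mistaken for a child of /tmp/newroot. The helper below is a hypothetical sketch of that test only; it ignores bind-mount and symlink subtleties and is not taken from snap-confine or LXC.

```c
#include <string.h>

/* Return 1 if mount_point lies outside root, 0 if it is root itself or
 * somewhere below it. Component-aware: "/srv/root2" is NOT below "/srv/root". */
int is_outside_root(const char *mount_point, const char *root)
{
    size_t len = strlen(root);

    while (len > 1 && root[len - 1] == '/')
        len--;                      /* ignore trailing slashes on root */
    if (len == 1 && root[0] == '/')
        return 0;                   /* every path is under "/" */
    if (strncmp(mount_point, root, len) != 0)
        return 1;
    /* Prefix matched: it must end there or continue with a '/'. */
    return !(mount_point[len] == '\0' || mount_point[len] == '/');
}
```

A caller could apply it to each mount point listed in /proc/self/mountinfo before pivoting to find the entries that the new root would leave unreachable.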

This trips the chroot detection code in the kernel: there is more than one root mount point and a ton of completely unreachable mount entries, which makes the kernel think we're in an environment that is unsafe for creating a user namespace.
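The symptom can be probed from C by letting a throwaway child call unshare(CLONE_NEWUSER), which leaves the caller's own namespaces untouched. This is a hypothetical diagnostic, not part of snap-confine or its spread test. We assume the failure described here surfaces as EPERM, but unshare can also fail for unrelated reasons, such as a seccomp filter or a kernel setting that disables user namespaces.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that tries unshare(CLONE_NEWUSER) and report the outcome.
 * Returns 0 if the child could create a user namespace, otherwise the
 * child's errno (e.g. EPERM), or -1 if fork/waitpid itself failed. */
int probe_userns(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : (errno & 0xff));

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Run from inside the confined environment, a return value of 0 means a user namespace could be created there.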

Strace of the current launcher to a basic binary (lxd --help): http://paste.ubuntu.com/23114432/
The mount table for a running LXD process is now: http://paste.ubuntu.com/23114471/

This is also very wasteful, especially considering that snap-confine creates a new namespace for every single command. More importantly, it's going to create a bunch of weird issues on systems using snapd, including potential data loss.

That's because leaving unused mount entries mounted (anything outside of your pivot dir) keeps an active reference to them in the kernel. This effectively means that none of those mounts can really be unmounted on the host: the host mount entry disappears on umount, but attempting to mount again fails with "already mounted".

It also means that any non-persistent device (such as a USB stick) will never get properly unmounted, which may cause data loss.

Tags: lxd
Stéphane Graber (stgraber) wrote :

I don't have a way to exploit this in a good way so I'm not marking it as a security issue.

But given that snap-confine is a setuid binary, this effectively allows any unprivileged user to lock all the mount entries on the host, causing potential data loss for removable media and preventing even root on the host from remounting filesystems that were present at the time snap-confine was called.

Stéphane Graber (stgraber) wrote :

For a good implementation of how to do pivot_root and deal with all the complications of pre-existing mounts and things like rshared/rprivate (which likely explain the mess under /tmp), you may want to look at what we wrote for LXC.
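Propagation matters here for two reasons: pivot_root(2) documents that the parent mounts of the new and the current root must not be shared, and the propagation type decides whether mounts and unmounts made inside the namespace reach the host. A minimal sketch under those assumptions, not LXC's code: one hypothetical helper classifies a /proc/self/mountinfo line by its optional propagation fields (shared:N, master:N, unbindable), and another makes the whole tree private with mount(2).

```c
#include <string.h>
#include <sys/mount.h>

/* Classify one /proc/self/mountinfo line by its optional propagation
 * fields, which sit between the mount options and the " - " separator.
 * Spaces inside paths are escaped as \040, so " - " is unambiguous. */
const char *propagation_of(const char *line)
{
    char head[4096];
    const char *sep = strstr(line, " - ");
    size_t len = sep ? (size_t)(sep - line) : strlen(line);

    if (len >= sizeof head)
        len = sizeof head - 1;
    memcpy(head, line, len);
    head[len] = '\0';
    if (strstr(head, " shared:"))
        return "shared";
    if (strstr(head, " master:"))
        return "slave";
    if (strstr(head, " unbindable"))
        return "unbindable";
    return "private";
}

/* Recursively mark every mount in this (already unshared) namespace as
 * private so later mount/umount events neither leak out nor arrive from
 * the host. Requires CAP_SYS_ADMIN. */
int make_tree_private(void)
{
    return mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);
}
```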

tags: added: lxd
Stéphane Graber (stgraber) wrote :

I included a first pass on fixing this using LXC's pivot_root implementation: https://github.com/snapcore/snap-confine/pull/122

As mentioned, this doesn't fix everything. There are still a bunch of mounts that you may want to unmount to minimize the size of the mount table. But the problematic ones are definitely gone and LXD works properly with this patch applied.
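One rough way to see how much a change shrinks the mount table is to count the entries in /proc/self/mountinfo, which lists one mount per line. The helpers below are a hypothetical sketch for that measurement only.

```c
#include <stdio.h>

/* Count mountinfo entries (one per line) in an open stream.
 * A final line without a trailing newline still counts. */
long count_mount_entries(FILE *f)
{
    long n = 0;
    int c, prev = '\n';

    while ((c = getc(f)) != EOF) {
        if (c == '\n')
            n++;
        prev = c;
    }
    if (prev != '\n')
        n++;
    return n;
}

/* Count the mounts visible to the calling process; -1 if unreadable. */
long count_own_mounts(void)
{
    FILE *f = fopen("/proc/self/mountinfo", "r");

    if (f == NULL)
        return -1;
    long n = count_mount_entries(f);
    fclose(f);
    return n;
}
```

Calling count_own_mounts() from a confined command before and after a change gives a simple before/after comparison.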

Zygmunt Krynicki (zyga)
Changed in snap-confine:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.0.41
assignee: nobody → Zygmunt Krynicki (zyga)
Zygmunt Krynicki (zyga)
Changed in snap-confine:
status: Triaged → Fix Committed
Zygmunt Krynicki (zyga)
Changed in snap-confine:
status: Fix Committed → Fix Released
Zygmunt Krynicki (zyga)
description: updated
Changed in snap-confine (Ubuntu):
status: New → Fix Released
Changed in snap-confine (Ubuntu Xenial):
status: New → In Progress
Leo Arias (elopio) wrote :

I ran the snap-confine test in an up-to-date xenial classic kvm, after enabling proposed and upgrading snap-confine to 0.43.

I got no errors, looks good.

Changed in snap-confine (Ubuntu Xenial):
status: In Progress → Fix Released