fuse filesystems get disconnected on container exit

Bug #1402834 reported by Serge Hallyn on 2014-12-15
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Seth Forshee

Bug Description

When a directory from a fuse filesystem is bind-mounted into a container
and the container is then shut down, the userspace process serving the
fuse fs is terminated. The original fuse mountpoint remains busy until it
is manually unmounted.

I've tested this with sshfs, git://github.com/stgraber/cgmanagerfs,
the bbfs example fs from http://www.cs.nmsu.edu/~pfeiffer/fuse-tutorial/,
and git://github.com/lxc/lxcfs.

To reproduce:

Mount a fusefs - say sshfs - with -o allow_other, let's say onto /tmp/d.

sshfs -f -d -o allow_other somehost:$HOME /tmp/d

Bind that into a container by adding

lxc.mount.entry = /tmp/d freezer none bind,create=dir 0 0

to the container's config.

start the container, stop it.

the fuse program stops (exits 0 in fact)

the mount is not cleaned up - ls /tmp/d on the host henceforth complains:

 ls: cannot access /tmp/d: Transport endpoint is not connected

(sudo umount /tmp/d cleans it up)

I don't know for sure whether this is a kernel or libfuse bug.

Seth Forshee (sforshee) on 2014-12-15
Changed in linux (Ubuntu):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → Medium
status: New → In Progress
status: In Progress → Confirmed
tags: added: kernel-da-key
Andy Whitcroft (apw) wrote :

The big question is whether we tried to unmount the mount to trigger the exit. If we just killed the fuse server then the kernel behaviour seems valid. If we attempted to unmount it, that let the server go, then the umount failed, that sounds wrong.

Quoting Andy Whitcroft (<email address hidden>):
> The big question is whether we tried to unmount the mount to trigger the
> exit.

Manually reproducing these steps doesn't seem to reproduce it:

sshfs -f -d -o allow_other somehost:$HOME /tmp/d
lxc-unshare -s MOUNT -- /bin/bash
 mount --bind /tmp/d /mnt
 umount /mnt
 umount /tmp/d
 exit
# sshfs is still running

> If we just killed the fuse server then the kernel behaviour seems
> valid.

The fuse server is running on the host, not in the container. We do
not kill the fuse server. It seems as though it somehow sees some event
which makes it think it should exit.

> If we attempted to unmount it, that let the server go, then the
> umount failed, that sounds wrong.

We don't do anything to let the mount on the server go. Userspace
in the container does what it does on shutdown (sync;sync;umount
or whatever)

Stéphane Graber (stgraber) wrote :

So the problem is that a force unmount of a bind-mount of a fuse filesystem somehow gets the kernel to send the "destroy" command back to the user space process running the filesystem. This behavior is clearly wrong.

As an example, let's say that I'm running "lxcfs" as a fuse filesystem on my system. The mount is visible to everyone on the system. As a nobody user I can then unshare my user namespace, unshare my mount namespace, bind-mount that filesystem to say /mnt, and force unmount it. That destroys the fuse filesystem entirely: the process that backs it exits and nobody on the system can access it anymore.

A simple reproducer is: echo "mount --bind /var/lib/lxcfs /mnt && umount -f /mnt" | lxc-usernsexec -- /bin/bash

Stéphane Graber (stgraber) wrote :

I don't have any good example in mind of fuse being used in that manner (system wide user accessible filesystem) but if there was, this would be a potential security issue against them. Once we figure out the root cause of this and fix it, it may be worth considering this a security fix.

Serge Hallyn (serge-hallyn) wrote :

My guess is that this has been deemed a non-issue until now because '-o allow_other' is not the norm.

The kernel code doing this is straightforward: fs/namespace.c:do_umount() calls sb->s_op->umount_begin() if MNT_FORCE is specified; fs/fuse/inode.c:fuse_umount_begin() calls fuse_abort_conn(get_fuse_conn_super(sb)).
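
The call path described above can be pictured with a small userspace toy model. This is only a sketch and not kernel code: every toy_* name is hypothetical, TOY_MNT_FORCE merely stands in for MNT_FORCE, and the real do_umount() does far more. The one thing it is meant to show is that a bind mount shares the superblock with the original mount, so a forced unmount of the bind mount reaches the same umount_begin callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MNT_FORCE 0x1   /* hypothetical stand-in for the kernel's MNT_FORCE */

/* Toy superblock: one per filesystem instance, shared by all its mounts. */
struct toy_sb {
    bool conn_aborted;                        /* set once the fuse connection is torn down */
    void (*umount_begin)(struct toy_sb *sb);  /* optional per-filesystem callback */
};

/* Toy mount: a bind mount is simply a second mount pointing at the same sb. */
struct toy_mount {
    struct toy_sb *sb;
    bool mounted;
};

/* Plays the role of fuse_umount_begin(): abort the connection unconditionally. */
static void toy_fuse_umount_begin(struct toy_sb *sb)
{
    sb->conn_aborted = true;
}

/* Plays the role of the MNT_FORCE step in do_umount(). */
static void toy_do_umount(struct toy_mount *m, int flags)
{
    assert(m != NULL && m->sb != NULL);
    if ((flags & TOY_MNT_FORCE) && m->sb->umount_begin != NULL)
        m->sb->umount_begin(m->sb);
    m->mounted = false;
}

/*
 * Unmount only the bind mount with the given flags and report whether the
 * original mount is still in place but its connection has been aborted.
 */
static bool toy_bind_umount_kills_original(int flags)
{
    struct toy_sb sb = { .conn_aborted = false,
                         .umount_begin = toy_fuse_umount_begin };
    struct toy_mount original = { .sb = &sb, .mounted = true };
    struct toy_mount bind = { .sb = &sb, .mounted = true };

    toy_do_umount(&bind, flags);
    return original.mounted && sb.conn_aborted;
}
```

In this model, calling toy_bind_umount_kills_original(TOY_MNT_FORCE) leaves the original mount present with a dead connection, which corresponds to the "Transport endpoint is not connected" state; calling it with flags 0 leaves the connection alone.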

So the answer presumably is for fuse_umount_begin() to only call fuse_abort_conn() if the caller is in fact the owner of the sb? If so, go ahead. If not, and there are other remaining mounts, do nothing. Or finally, if not, and there are no other remaining mounts, then go ahead.
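
The three cases in that proposal can be written down as a small decision function. This is a sketch of the proposal only, not a patch: struct force_umount_ctx and should_abort_conn() are hypothetical names, and it assumes the kernel could cheaply answer "does the caller own the sb" and "how many other mounts remain", which is not established in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs to the decision; the kernel has no such struct. */
struct force_umount_ctx {
    bool caller_owns_sb;        /* caller is the owner of the superblock */
    unsigned int other_mounts;  /* mounts of this sb besides the one being unmounted */
};

/* Returns true when a forced unmount should abort the fuse connection. */
static bool should_abort_conn(const struct force_umount_ctx *ctx)
{
    assert(ctx != NULL);

    /* Case 1: the owner may always tear the connection down. */
    if (ctx->caller_owns_sb)
        return true;

    /* Case 2: not the owner and other mounts remain: do nothing. */
    if (ctx->other_mounts > 0)
        return false;

    /* Case 3: not the owner, but this was the last mount: go ahead. */
    return true;
}
```

Under this rule the reproducer above (a non-owner force-unmounting a bind mount while the original mount still exists) falls into case 2 and leaves the connection intact.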

Stéphane Graber (stgraber) wrote :

Hmm, I can reproduce the exact same thing even without allow_other.

Sure, my user is getting permission denied if it attempts to read from the fs, but it can still bind-mount it and then cause it to die by doing a force unmount.

Seth Forshee (sforshee) wrote :

Note that it's also possible to do this without namespaces at all, and it definitely seems to be force unmount which makes it happen.

$ fuseext2 ext2.img /tmp/d
$ sudo mount --bind /tmp/d mount
$ sudo umount -f mount
$ ls /tmp/d
ls: cannot access /tmp/d: Transport endpoint is not connected

Other filesystems which implement umount_begin such as nfs also seem to start tearing down their connections in the callback, so I suspect they might behave similarly in this scenario. I'm going to test with nfs to verify.

I don't really think that fuse can detect this situation. It seems like do_umount potentially needs to handle this differently in the case of a bind mount, but I'm not sure whether or not the behavior here is actually what's expected to happen.

The other question is whether or not lxc actually needs to use MNT_FORCE when unmounting. What's the reason for doing so?

Serge Hallyn (serge-hallyn) wrote :

Right,

so as discussed on irc the MNT_FORCE should probably be ignored so long as
there are mounts in other namespaces. (Ideally we could have a concept of
a 'master' namespace where the MNT_FORCE could be done anyway, but that
isn't possible AIUI in the kernel)

Serge Hallyn (serge-hallyn) wrote :

> The other question is whether or not lxc actually needs to use MNT_FORCE
> when unmounting. What's the reason for doing so?

lxc doesn't. Ubuntu does. But the point is that any user can disconnect
any other user's fuse connections, so this now looks much more serious
than it did at first. Joe can disconnect my sshfs.

Seth Forshee (sforshee) wrote :

I didn't see the irc discussion. Atm we don't have a concept of an owner or "master" namespace for a super block, though I expect we will see it in the future. And I agree it doesn't seem to make sense to let a less privileged userns do this to a more privileged namespace. However, if we ignore MNT_FORCE whenever there are mounts in any other namespace, this would allow a less privileged namespace to block MNT_FORCE for a more privileged one, which is also undesirable.

Tyler Hicks (tyhicks) wrote :

Serge asked me about potentially using an AppArmor umount rule to prevent forced umounts in the container. After I looked at the AppArmor parser code, I realized that it doesn't properly support umount rules (note that mount rules are properly supported). I've created bug #1403968 to track this AppArmor issue.

Stéphane Graber (stgraber) wrote :

So I came up with an alternate way around this which works for both privileged and unprivileged containers and doesn't require an updated apparmor. This uses seccomp to filter the umount2 call and return EACCES when MNT_FORCE is passed as the second argument.

Code is at: http://paste.ubuntu.com/9568741/

stgraber@castiana:~/Desktop$ gcc sec-mount.c -o sec-mount -lseccomp
stgraber@castiana:~/Desktop$ cp sec-mount /tmp/
stgraber@castiana:~/Desktop$ lxc-usernsexec -- /tmp/sec-mount
root@castiana:~/Desktop# mount --bind /home/stgraber/ /mnt
root@castiana:~/Desktop# umount /mnt
root@castiana:~/Desktop# mount --bind /home/stgraber/ /mnt
root@castiana:~/Desktop# umount -f /mnt
umount2: Permission denied
umount: /mnt: block devices not permitted on fs
root@castiana:~/Desktop# exit

Stéphane Graber (stgraber) wrote :

SCMP_CMP_MASKED_EQ should be used to restrict MNT_FORCE regardless of what other mntflags are passed, though I'm failing to find the right syntax for it...
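
To make the difference concrete, here is a minimal sketch of the two comparisons in plain C, with no libseccomp involved. It assumes the existing filter compares the second argument for exact equality with MNT_FORCE; the pasted source is not reproduced in this report, so that is a guess. The helper names are hypothetical. As far as the seccomp_rule_add(3) man page describes it, SCMP_CMP_MASKED_EQ takes the mask first and the value second, so the rule argument would presumably be SCMP_A1(SCMP_CMP_MASKED_EQ, MNT_FORCE, MNT_FORCE); treat that as untested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mount.h>   /* MNT_FORCE, MNT_DETACH */

/* Exact comparison: matches only when the flags are exactly MNT_FORCE. */
static bool matches_exact(uint64_t flags)
{
    return flags == MNT_FORCE;
}

/*
 * Masked comparison: matches when (flags & mask) == value, whatever the
 * other bits are. This is the behaviour a masked-equality rule provides.
 */
static bool matches_masked(uint64_t flags, uint64_t mask, uint64_t value)
{
    /* A value with bits outside the mask could never match. */
    assert((value & ~mask) == 0);
    return (flags & mask) == value;
}
```

With these helpers, flags of MNT_FORCE | MNT_DETACH slip past matches_exact() but are caught by matches_masked(flags, MNT_FORCE, MNT_FORCE), which is the case an exact-match rule would miss.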

Serge Hallyn (serge-hallyn) wrote :

Patches sent to the lxc-devel mailing list to follow up on stgraber's idea.

So between this and Eric's patch to prevent anyone but root from doing umount -f, this should become non-urgent.

But it still seems wrong.
