Comment 10 for bug 1639345

Christian Brauner (cbrauner) wrote :

OK, so for the second part, preventing the /proc attack: we need the procfd to the host's namespace so that we can set the LSM label later on in attach_child_main(). The obvious solution, which we briefly discussed at Plumbers, is to mount a fresh proc in a new mount and pid namespace. However, we cannot simply unshare(CLONE_NEWNS|CLONE_NEWPID) and then umount /proc, since at that point we are not really part of the new pid namespace, and that leads to all sorts of complications. You can try that out with

unshare -Ump --mount-proc -- sh

Mounting a fresh /proc seems to only work correctly if you fork()/clone() first. That's why I suspect that the cleanest solution is to create a minimal namespace (similar to the lxcfs solution), mount /proc in there, and then open an fd to that. In essence, this would mean that we would have to call lxc_clone(chroot_us_and_clone_again, ..., CLONE_PARENT), chroot ourselves in that child, and then call lxc_clone(attach_child_main, &payload, CLONE_PARENT) from within chroot_us_and_clone_again() to exec the actual process we want to run in the container. We would effectively create an additional, fourth process in attach. This *could* work: due to CLONE_PARENT both processes should have the same parent, and we should be able to selectively wait on our chroot_us_and_clone_again() child in lxc_attach() while handing the PID of the exec'ing process back to our caller. That way, if someone straces us and finds a way to escape to the namespace of lxc_attach(), they would effectively find themselves safely chrooted.
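
To make the CLONE_PARENT part of this concrete, here is a rough sketch of just the process layout, not the actual LXC code. It uses plain clone(2) from glibc instead of lxc_clone(), and it leaves out the chroot, the /proc mount and the exec entirely. The names helper_main(), payload_main(), clone_parent_demo() and struct report are made up for illustration: helper_main() stands in for chroot_us_and_clone_again() and payload_main() for attach_child_main(). An intermediate process clones the helper with CLONE_PARENT, the helper clones the payload with CLONE_PARENT, and both report their PID and parent PID over a pipe so the original caller can wait on each of them selectively.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical names, for illustration only. */
enum role { ROLE_HELPER = 1, ROLE_PAYLOAD = 2 };
struct report { int role; pid_t pid; pid_t ppid; };

static char helper_stack[STACK_SIZE] __attribute__((aligned(16)));
static char payload_stack[STACK_SIZE] __attribute__((aligned(16)));
static int report_fd = -1; /* write end of the pipe back to the caller */

static void send_report(int role)
{
	struct report r = { role, getpid(), getppid() };

	/* Writes of at most PIPE_BUF bytes to a pipe are atomic. */
	assert(sizeof(r) <= PIPE_BUF);
	if (write(report_fd, &r, sizeof(r)) != (ssize_t)sizeof(r))
		_exit(1);
}

/* Stands in for attach_child_main(); real code would exec here. */
static int payload_main(void *arg)
{
	(void)arg;
	send_report(ROLE_PAYLOAD);
	return 0;
}

/* Stands in for chroot_us_and_clone_again(); the chroot is omitted. */
static int helper_main(void *arg)
{
	(void)arg;
	send_report(ROLE_HELPER);
	if (clone(payload_main, payload_stack + STACK_SIZE,
		  CLONE_PARENT | SIGCHLD, NULL) < 0)
		return 1;
	return 0;
}

/* Returns 0 if both CLONE_PARENT children reported in and were reaped. */
static int clone_parent_demo(struct report *helper, struct report *payload)
{
	int fds[2], status, i;
	pid_t mid;

	if (pipe(fds) < 0)
		return -1;

	mid = fork(); /* intermediate process */
	if (mid < 0)
		return -1;
	if (mid == 0) {
		close(fds[0]);
		report_fd = fds[1];
		/* CLONE_PARENT: the helper becomes a child of our parent. */
		_exit(clone(helper_main, helper_stack + STACK_SIZE,
			    CLONE_PARENT | SIGCHLD, NULL) < 0);
	}

	close(fds[1]);
	for (i = 0; i < 2; i++) {
		struct report r;

		if (read(fds[0], &r, sizeof(r)) != (ssize_t)sizeof(r)) {
			close(fds[0]);
			return -1;
		}
		if (r.role == ROLE_HELPER)
			*helper = r;
		else
			*payload = r;
	}
	close(fds[0]);

	/* Selective waits only succeed because both are our own children. */
	if (waitpid(helper->pid, &status, 0) != helper->pid)
		return -1;
	if (waitpid(payload->pid, &status, 0) != payload->pid)
		return -1;
	if (waitpid(mid, &status, 0) != mid)
		return -1;
	return 0;
}
```

Calling clone_parent_demo(&helper, &payload) from a normal process should leave helper.ppid and payload.ppid both equal to the caller's own PID, which is the property the idea above relies on. One caveat worth keeping in mind: the kernel rejects CLONE_PARENT with EINVAL when the calling process is the init of its pid namespace.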