Comment 0 for bug 1642767

Forest (foresto) wrote : starting any container with umask 007 breaks lxc-stop and prevents host system shutdown

If I run lxc-start with umask 007 (or any other value that masks the world-execute bit), my host system enters a state with the following problems:

* lxc-stop hangs forever on every container, including ones that weren't started with umask 007.
* lxc-stop --kill --nolock hangs in the same way.
* Attempts to reboot or shut down my host system fail, requiring a hard reset to recover.
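
To check whether a given umask falls into the affected class (world-execute bit masked), something like the following might help. This is only a sketch for a POSIX shell; masks_world_exec is a made-up helper name, not part of lxc.

```shell
# Hypothetical helper: succeeds if the given umask (default: the current one)
# masks the world-execute bit, i.e. the lowest octal bit is set.
masks_world_exec() {
    mask="${1:-$(umask)}"
    # A leading 0 makes the shell read the value as octal.
    [ $(( 0$mask & 1 )) -ne 0 ]
}

if masks_world_exec; then
    echo "current umask $(umask) masks world-execute"
fi
```

For example, masks_world_exec 007 and masks_world_exec 027 succeed, while masks_world_exec 022 fails.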

When lxc-stop hangs, messages like these appear in syslog every couple of minutes:

Nov 17 01:22:11 hostbox kernel: [ 3360.091624] INFO: task systemd:12179 blocked for more than 120 seconds.
Nov 17 01:22:11 hostbox kernel: [ 3360.091629] Tainted: P OE 4.4.0-47-generic #68-Ubuntu
Nov 17 01:22:11 hostbox kernel: [ 3360.091631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 17 01:22:11 hostbox kernel: [ 3360.091633] systemd D ffff8800c6febb58 0 12179 12168 0x00000104
Nov 17 01:22:11 hostbox kernel: [ 3360.091638] ffff8800c6febb58 ffff8800d318d280 ffff88040c649b80 ffff8800d318d280
Nov 17 01:22:11 hostbox kernel: [ 3360.091641] ffff8800c6fec000 ffff8800345bc088 ffff8800345bc070 ffffffff00000000
Nov 17 01:22:11 hostbox kernel: [ 3360.091644] fffffffe00000001 ffff8800c6febb70 ffffffff81830f15 ffff8800d318d280
Nov 17 01:22:11 hostbox kernel: [ 3360.091647] Call Trace:
Nov 17 01:22:11 hostbox kernel: [ 3360.091653] [<ffffffff81830f15>] schedule+0x35/0x80
Nov 17 01:22:11 hostbox kernel: [ 3360.091657] [<ffffffff81833b62>] rwsem_down_write_failed+0x202/0x350
Nov 17 01:22:11 hostbox kernel: [ 3360.091662] [<ffffffff812899a0>] ? kernfs_sop_show_options+0x40/0x40
Nov 17 01:22:11 hostbox kernel: [ 3360.091666] [<ffffffff81403fa3>] call_rwsem_down_write_failed+0x13/0x20
Nov 17 01:22:11 hostbox kernel: [ 3360.091669] [<ffffffff8183339d>] ? down_write+0x2d/0x40
Nov 17 01:22:11 hostbox kernel: [ 3360.091672] [<ffffffff812104a0>] grab_super+0x30/0xa0
Nov 17 01:22:11 hostbox kernel: [ 3360.091674] [<ffffffff81210a32>] sget_userns+0x152/0x450
Nov 17 01:22:11 hostbox kernel: [ 3360.091677] [<ffffffff81289a20>] ? kernfs_sop_show_path+0x50/0x50
Nov 17 01:22:11 hostbox kernel: [ 3360.091680] [<ffffffff81289c8e>] kernfs_mount_ns+0x7e/0x230
Nov 17 01:22:11 hostbox kernel: [ 3360.091685] [<ffffffff811187ab>] cgroup_mount+0x2eb/0x7f0
Nov 17 01:22:11 hostbox kernel: [ 3360.091687] [<ffffffff81211af8>] mount_fs+0x38/0x160
Nov 17 01:22:11 hostbox kernel: [ 3360.091691] [<ffffffff8122db57>] vfs_kern_mount+0x67/0x110
Nov 17 01:22:11 hostbox kernel: [ 3360.091694] [<ffffffff81230329>] do_mount+0x269/0xde0
Nov 17 01:22:11 hostbox kernel: [ 3360.091698] [<ffffffff812311cf>] SyS_mount+0x9f/0x100
Nov 17 01:22:11 hostbox kernel: [ 3360.091701] [<ffffffff81834ff2>] entry_SYSCALL_64_fastpath+0x16/0x71

When system shutdown hangs, similar messages appear on the console every couple of minutes.

I'm running lxc 2.0.5-0ubuntu1~ubuntu16.04.2 on Xubuntu 16.04.1 LTS amd64.

My containers are all unprivileged.

I can reproduce this at will with either an old-ish container or a fresh new one.

My umask at container creation time does not seem to matter. As far as I have seen, my umask only matters the first time I start a container in my login session.

I can work around the bug by manually setting my umask to something more permissive before I start my first container of the day, and then setting it back again, but that's rather a hassle. (Even worse, it's very easy to forget this workaround and be left with containers that can't be stopped and a host system that won't shut down cleanly.)
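
The workaround could perhaps be wrapped in a small shell helper so the permissive umask applies to one command only and my usual umask is left alone. This is a minimal sketch, assuming a POSIX shell. with_umask is a hypothetical helper name, "mycontainer" is a placeholder, and 022 is just one example of a umask that leaves world-execute unmasked. I have only tested changing the umask in my login shell, so whether the subshell form avoids the bug equally well is an assumption.

```shell
# Hypothetical helper: run a command under a temporary umask.
# The subshell keeps the caller's umask unchanged.
with_umask() {
    ( umask "$1" && shift && "$@" )
}

# Intended use for the first container start of the session
# (container name is a placeholder):
#   with_umask 022 lxc-start -n mycontainer
```

This does not fix the underlying problem. It only makes the workaround harder to forget if it is put in a shell alias or startup file.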