LXD bootstrap issues on xenial

Bug #1551854 reported by Casey Marshall
This bug affects 2 people
Affects: lxd (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Stéphane Graber

Bug Description

I'm using lxd with zfs block storage on xenial and having issues with trusty containers. I've seen the problem both when trying to bootstrap and after rebooting the host, when a container failed to start.

In the latter case, the container that failed to start was the juju controller:

c@mawhrin-skel:~/omnibus-layers$ lxc list
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-0 | STOPPED | | | PERSISTENT | 0 |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-1 | RUNNING | 10.0.3.28 (eth0) | | PERSISTENT | 0 |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-2 | RUNNING | 10.0.3.85 (eth0) | | PERSISTENT | 0 |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-3 | RUNNING | 10.0.3.176 (eth0) | | PERSISTENT | 0 |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-4 | RUNNING | 10.0.3.66 (eth0) | | PERSISTENT | 0 |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-5 | RUNNING | 10.0.3.31 (eth0) | | PERSISTENT | 0 |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-6 | RUNNING | 10.0.3.196 (eth0) | | PERSISTENT | 0 |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+
| juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-7 | RUNNING | 10.0.3.186 (eth0) | | PERSISTENT | 0 |
| | | 10.0.4.1 (lxcbr0) | | | |
+-----------------------------------------------------+---------+--------------------------------+------+------------+-----------+

I manually started it, but found that no upstart services were started. Remembering this thread, https://lists.ubuntu.com/archives/juju/2016-February/006698.html, I checked /var/log/upstart/mountall.log in the machine-0 container, and sure enough:

root@juju-5f4bd172-ad22-4726-8d84-47185ab31b54-machine-0:~# cat /var/log/upstart/mountall.log
mount: permission denied
mountall: mount /sys/kernel/debug [187] terminated with status 32
mountall: Filesystem could not be mounted: /sys/kernel/debug

The problem persists if I stop and start the container. If I remount /sys/kernel/debug on the host, then stop and start the container, upstart succeeds and the juju controller starts up.
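
A minimal sketch of scripting the check above, assuming the log lines keep the format shown in this report. The function name failed_mounts is hypothetical; it only filters text and does not touch the container.

```shell
# Print each mount point that mountall reported as unmountable.
# Reads the contents of a mountall.log on stdin.
failed_mounts() {
    sed -n 's/^mountall: Filesystem could not be mounted: //p'
}
```

It could be fed from a container with something like `lxc exec <container> -- cat /var/log/upstart/mountall.log | failed_mounts`. An empty result only means this particular message was not logged; it does not prove that mountall succeeded.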

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-8-generic 4.4.0-8.23
ProcVersionSignature: Ubuntu 4.4.0-8.23-generic 4.4.2
Uname: Linux 4.4.0-8-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: c 24562 F.... pulseaudio
CurrentDesktop: MATE
Date: Tue Mar 1 10:52:45 2016
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=007cedda-f922-4e4c-89b1-57b31f18292e
InstallationDate: Installed on 2016-02-28 (2 days ago)
InstallationMedia: Ubuntu-MATE 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160224)
MachineType: LENOVO 2306CTO
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-8-generic.efi.signed root=/dev/mapper/ubuntu--mate--vg-root ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-8-generic N/A
 linux-backports-modules-4.4.0-8-generic N/A
 linux-firmware 1.156
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/22/2014
dmi.bios.vendor: LENOVO
dmi.bios.version: G2ETA1WW (2.61 )
dmi.board.asset.tag: Not Available
dmi.board.name: 2306CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvrG2ETA1WW(2.61):bd04/22/2014:svnLENOVO:pn2306CTO:pvrThinkPadX230:rvnLENOVO:rn2306CTO:rvrNotDefined:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 2306CTO
dmi.product.version: ThinkPad X230
dmi.sys.vendor: LENOVO

Revision history for this message
Casey Marshall (cmars) wrote :

I also confirmed that the mountall error message was duplicated every time I restarted the machine-0 container -- until remounting on the host.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

I'm on the same kernel

Linux sl 4.4.0-8-generic #23-Ubuntu SMP Wed Feb 24 20:45:30 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

and also have the tracefs mounted

0 ✓ serge@sl ~ $ grep debug /proc/self/mountinfo
74 19 0:7 / /sys/kernel/debug rw,relatime shared:26 - debugfs debugfs rw
44 74 0:9 / /sys/kernel/debug/tracing rw,relatime shared:29 - tracefs tracefs rw

but trusty (upstart-based) containers start fine for me, using lxc version 2.0.0~rc4+master~20160229-0647-0ubuntu1~xenial and lxd from git HEAD.

Very odd therefore that unmounting and re-mounting debugfs works for you...

Will try in a fresh vm.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

marking confirmed because two people have reported it, but I cannot reproduce it yet.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Also cannot reproduce in a clean VM, so I have to assume juju is tweaking something.

Can you show output of 'lxc config show <container>' where <container> is the container which fails?

Revision history for this message
Casey Marshall (cmars) wrote :

FWIW, I've observed the bug outside of Juju. When I launched a trusty container, sshd did not start until I remounted /sys/kernel/debug on the host. The main reason it's been observed with Juju is that Juju tries to SSH into the instance right after cloud-init, but upstart in the container isn't starting sshd, so bootstrap hangs.

Revision history for this message
Casey Marshall (cmars) wrote :

This is the config from the container that had the issue this morning:

c@mawhrin-skel:~/omnibus-layers$ lxc config show juju-145a3177-d1c0-4974-89f6-feaebb3ca87d-machine-0
name: juju-145a3177-d1c0-4974-89f6-feaebb3ca87d-machine-0
profiles:
- default
- juju-lxd
config:
  user.juju-model-uuid: "true"
  user.user-data: |
    #cloud-config
    output:
      all: '| tee -a /var/log/cloud-init-output.log'
    runcmd:
    - set -xe
    - install -D -m 644 /dev/null '/etc/init/juju-clean-shutdown.conf'
    - |-
      printf '%s\n' '
      author "Juju Team <email address hidden>"
      description "Stop all network interfaces on shutdown"
      start on runlevel [016]
      task
      console output

      exec /sbin/ifdown -a -v --force
      ' > '/etc/init/juju-clean-shutdown.conf'
    - install -D -m 644 /dev/null '/var/lib/juju/nonce.txt'
    - printf '%s\n' 'user-admin:bootstrap' > '/var/lib/juju/nonce.txt'
    users:
    - groups:
      - adm
      - audio
      - cdrom
      - dialout
      - dip
      - floppy
      - netdev
      - plugdev
      - sudo
      - video
      lock_passwd: true
      name: ubuntu
      shell: /bin/bash
      ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDNt6t7py1b0vwYVobsx490piX1LrjtCJrcmOH49EKOtTzxxiv1aTRqVOD38pKR8WPWUc6ZTjYtGetqbwhvma8FLWeTjIaPyw8QzKAS963/KNzZRqE+iALtcdA9sJgrp5hxxl00zZ7cD7b2OD5SOzSjyRHJkBxGDnkzE07g+/qXekkPzVHKvAMbaBU+OwnuW3KSy20/y2D/qlWkLfF7FWfeEvb6P8KwIFZagv/yt+QeLONq4FLwowdBIwMDHBKFA3H+dKzld5bs3hGvLNhlFYUdeKs/F+swkYwwi5ycWj7N7clu0wvP9ZZhXlUJ2Fog39GrXznnekPqr4pAwL8m3vr9
        Juju:juju-client-key
      sudo:
      - ALL=(ALL) NOPASSWD:ALL
  volatile.base_image: 510c27eb5e30ac53c6cf8b423d4e145bd2e40b8845e89bd66a5d78e2a087727a
  volatile.eth0.hwaddr: 00:16:3e:9a:00:f9
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":165536,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":165536,"Nsid":0,"Maprange":65536}]'
  volatile.lo.hwaddr: 00:16:3e:3d:f5:18
devices:
  root:
    path: /
    type: disk
ephemeral: false

Revision history for this message
Adam Stokes (adam-stokes) wrote :

Here is my config:

name: juju-078fe32d-4080-4f11-83e2-e579ead11df8-machine-0
profiles:
- default
- juju-myish
config:
  user.juju-model-uuid: "true"
  user.user-data: |
    #cloud-config
    output:
      all: '| tee -a /var/log/cloud-init-output.log'
    runcmd:
    - set -xe
    - install -D -m 644 /dev/null '/etc/init/juju-clean-shutdown.conf'
    - |-
      printf '%s\n' '
      author "Juju Team <email address hidden>"
      description "Stop all network interfaces on shutdown"
      start on runlevel [016]
      task
      console output

      exec /sbin/ifdown -a -v --force
      ' > '/etc/init/juju-clean-shutdown.conf'
    - install -D -m 644 /dev/null '/var/lib/juju/nonce.txt'
    - printf '%s\n' 'user-admin:bootstrap' > '/var/lib/juju/nonce.txt'
    users:
    - groups:
      - adm
      - audio
      - cdrom
      - dialout
      - dip
      - floppy
      - netdev
      - plugdev
      - sudo
      - video
      lock_passwd: true
      name: ubuntu
      shell: /bin/bash
      ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDr6xIZdawDhRLDTARbf1TO1FAIcEBLbqh50B82zosRs2T0WsQX00c6NvtBLkpkvuAwqFBZA4/zVr4xY52cDbJ+cB49HW9Z+LgLPa/VQV/Z4XpSHXJxILAeEFY+eSgMRneUKtpNzlW6dnKArBCa+egAGKan6TGTaAjZonNJsd+7LOvoPDAmmSR5AYsXrUZfzEdo5rfwKquZdZRnxZjR41nhezr14deWUjCPAgCH22Is+GNDOHadCUi0nqbcZDBWUC69BptmvdL02HQJgrz3HuPnseTWEqdFYmfhmIwnXO/43/oIvR26dOUq2Y+S5KDH2rOsdp2B6UQbZOiT279GX8gx
        Juju:juju-client-key
      sudo:
      - ALL=(ALL) NOPASSWD:ALL
  volatile.base_image: 510c27eb5e30ac53c6cf8b423d4e145bd2e40b8845e89bd66a5d78e2a087727a
  volatile.eth0.hwaddr: 00:16:3e:f6:c5:98
  volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":100000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":100000,"Nsid":0,"Maprange":65536}]'
  volatile.lo.hwaddr: 00:16:3e:b6:76:cb
devices:
  root:
    path: /
    type: disk
ephemeral: false

I am also not using zfs.

Revision history for this message
Adam Stokes (adam-stokes) wrote :

Also, with regard to Juju, if I do the following:

umount /sys/kernel/debug
mount -t debugfs none /sys/kernel/debug

And then reissue a juju bootstrap, it completes successfully :\ whereas before I was running into this error: http://paste.ubuntu.com/15267564/

Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
Seth Forshee (sforshee) wrote :

Serge: Why do we need to mount debugfs in containers? Even in the host we restrict access to root.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@sforshee,

Because in the past mountall would fail if we didn't.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@sforshee - are you saying that removing the debugfs line from /usr/share/lxc/config/ubuntu.common.conf fixes this for you?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Note - I am not actively looking at this bug as I've not managed to reproduce it. Hopefully the kernel team has it under control, please shout if I'm needed.

If using juju first is a prerequisite to reproducing this, I can try that, but my impression from previous reports has been that this is not supposed to be a requirement, so I think something else is triggering it which I'm missing.

Revision history for this message
Seth Forshee (sforshee) wrote : Re: [Bug 1551854] Re: LXD bootstrap issues on xenial

On Fri, Mar 04, 2016 at 05:36:28PM -0000, Serge Hallyn wrote:
> @sforshee - are you saying that removing the debugfs line from
> /usr/share/lxc/config/ubuntu-common.conf fixes this for you?

I haven't reproduced it. Just wondering as it should be impossible to
actually use debugfs from within the container.

Revision history for this message
Casey Marshall (cmars) wrote :

Interesting. I removed the /sys/kernel/debug mount and containers seem to start up just fine:

c@mawhrin-skel:~$ grep kernel/debug /usr/share/lxc/config/ubuntu.common.conf
# lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none bind,optional 0 0
c@mawhrin-skel:~$ lxc launch ubuntu-trusty t2
Creating t2
Starting t2
c@mawhrin-skel:~$ lxc exec t2 -- ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 4 19:02 ? 00:00:00 /sbin/init
root 453 1 0 19:02 ? 00:00:00 upstart-socket-bridge --daemon
root 1507 1 0 19:02 ? 00:00:00 upstart-udev-bridge --daemon
root 1513 1 0 19:02 ? 00:00:00 /lib/systemd/systemd-udevd --dae
root 1583 1 0 19:02 ? 00:00:00 dhclient -1 -v -pf /run/dhclient
root 1606 1 0 19:02 ? 00:00:00 /bin/sh /etc/network/if-up.d/ntp
root 1610 1606 0 19:02 ? 00:00:00 lockfile-touch /var/lock/ntpdate
root 1619 1606 0 19:02 ? 00:00:00 /usr/sbin/ntpdate -s ntp.ubuntu.
root 1772 1 0 19:02 ? 00:00:00 /bin/sh /etc/network/if-up.d/ntp
message+ 1773 1 0 19:02 ? 00:00:00 dbus-daemon --system --fork
root 1775 1772 0 19:02 ? 00:00:00 lockfile-create /var/lock/ntpdat
root 1812 1 0 19:02 ? 00:00:00 /lib/systemd/systemd-logind
root 1864 1 0 19:02 ? 00:00:00 cron
daemon 1866 1 0 19:02 ? 00:00:00 atd
root 1867 1 0 19:02 ? 00:00:00 acpid -c /etc/acpi/events -s /va
root 1870 1 0 19:02 ? 00:00:00 /usr/sbin/irqbalance
root 1886 1 0 19:02 ? 00:00:00 /usr/sbin/sshd -D
root 1910 1 0 19:02 ? 00:00:00 upstart-file-bridge --daemon
root 1925 1 0 19:02 ? 00:00:00 /bin/sh /etc/init.d/ondemand bac
root 1937 1925 0 19:02 ? 00:00:00 sleep 60
syslog 1943 1 2 19:02 ? 00:00:00 rsyslogd
root 2023 1 0 19:02 ? 00:00:00 /usr/bin/python /usr/bin/cloud-i
root 2025 0 0 19:02 ? 00:00:00 ps -ef

Revision history for this message
Seth Forshee (sforshee) wrote :

I'm getting something kind of similar without juju. If I remount debugfs ro in the host then start the container I get this in /var/log/upstart/mountall.log:

mount: cannot remount block device debugfs read-write, is write-protected
mountall: mount /sys/kernel/debug [143] terminated with status 32
mountall: Event failed

and services don't start in the container. If I completely unmount debugfs in the host, however, everything is happy, although debugfs is then not mounted in the container.
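
To check a host for the read-only condition described above before starting a container, one could inspect its mount table. The sketch below assumes the mountinfo format from proc(5), where field 5 is the mount point, field 6 holds the per-mount options and the last field holds the superblock options. debugfs_is_readonly is a hypothetical helper name.

```shell
# Exit 0 if /sys/kernel/debug is mounted read-only, non-zero otherwise.
# Reads mountinfo-format text on stdin.
debugfs_is_readonly() {
    awk '
        $5 == "/sys/kernel/debug" {
            found = 1
            # "ro" may show up in the per-mount or the superblock options.
            ro = (("," $6 ",") ~ /,ro,/) || (("," $NF ",") ~ /,ro,/)
        }
        END { exit !(found && ro) }'
}
```

For example: `debugfs_is_readonly < /proc/self/mountinfo && echo "debugfs is read-only on the host"`.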

Casey/Adam: Can one of you confirm that debugfs is not mounted in the host when you get the failures? If it is mounted can you paste the output of 'mount | grep debugfs' in the host?

@hallyn: I didn't find the line you were referring to in /usr/share/lxc/config/ubuntu.common.conf; in fact, I didn't find any reference to debugfs in any of the template files. And debugfs is not a ns-mountable filesystem, so I guess it must be a bind mount? So getting EACCES makes sense if the container tries to mount debugfs. I'm just not sure why their containers are trying to mount debugfs if it's not mounted in the host and mine does not, which is what I assume must be going on.

Maybe it has something to do with that juju-lxd profile. Can someone paste in its contents (lxc profile show juju-lxd) or point me to where I can find it?

At this point I don't really think this is a kernel bug. debugfs is _not_ namespace mountable, nor should it be.

Revision history for this message
Seth Forshee (sforshee) wrote :

@Casey: I must have been typing my comment when you posted yours. So you've answered one of my questions, but I have no idea what's leading to the EACCES error. Can you provide the output of 'mount | grep debugfs' in the host when you're seeing the failure?

Revision history for this message
Casey Marshall (cmars) wrote :

@sforshee I'll uncomment the debugfs mount in my /usr/share/lxc/config/ubuntu.common.conf (putting it back the way it was), reboot, and see if I can reproduce it again.

My juju-lxd profile shows:

name: juju-lxd
config:
  boot.autostart: "true"
  security.nesting: "true"
description: ""
devices: {}

Revision history for this message
Seth Forshee (sforshee) wrote :

On Fri, Mar 04, 2016 at 09:17:01PM -0000, Casey Marshall wrote:
> @stforshee I'll uncomment the debugfs mount in my
> /usr/share/lxc/config/ubuntu.common.conf (putting it back the way it
> was), reboot, and see if I can reproduce it again.

Oh there it is, I was grepping for debugfs and not debug, d'oh.

Revision history for this message
Seth Forshee (sforshee) wrote :

Casey: Any luck reproducing? I'd still like to see what 'mount | grep debugfs' in the host shows when this is happening.

Revision history for this message
Stéphane Graber (stgraber) wrote :

I'd like to note that LXD works differently from LXC here.

In LXC we mount debugfs through ubuntu.common.conf whereas with lxd, we simply bind-mount /sys/kernel/debug from the host if it exists.

LXD doesn't use any of the /usr/share/lxc/* files. If it does on your system, then you most definitely aren't running LXD 2.0.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Please can everyone affected by this issue post the output of: dpkg -l lxc liblxc1 lxd lxd-client lxcfs

It's very difficult to figure out what's wrong when we don't even know the version being used.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Stéphane Graber (stgraber) wrote :

Oh and "lxc info" too for good measure (just in case lxd wasn't restarted post-upgrade).

Revision history for this message
Casey Marshall (cmars) wrote :

Haven't seen it in a few days. I'll reboot and see if I can reproduce it. It usually happens after rebooting the host, when launching new containers or existing ones would autostart.

Info you requested. I think the /usr/share/lxc/... might have been a red herring. I'm exclusively using LXD on this xenial host, not messing with the old lxc commands.

$ dpkg -l lxc liblxc1 lxd lxd-client lxcfs
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-====================-====================-====================================================================
ii liblxc1 2.0.0~rc5-0ubuntu1 amd64 Linux Containers userspace tools (library)
ii lxc 2.0.0~rc5-0ubuntu1 all Transitional package for lxc1
ii lxcfs 2.0.0~rc2-0ubuntu2 amd64 FUSE based filesystem for LXC
ii lxd 2.0.0~rc1-0ubuntu3 amd64 Container hypervisor based on LXC - daemon
ii lxd-client 2.0.0~rc1-0ubuntu3 amd64 Container hypervisor based on LXC - client

$ lxc info
apicompat: 0
auth: trusted
environment:
  addresses:
  - 192.168.88.234:8443
  - 192.168.122.1:8443
  - 10.0.3.1:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIFqjCCA5KgAwIBAgIQXpne6Qjwhg8de+RmyV1+mTANBgkqhkiG9w0BAQsFADA6
    MRwwGgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMRowGAYDVQQDDBFyb290QG1h
    d2hyaW4tc2tlbDAeFw0xNjAyMjgwMzEyMzdaFw0yNjAyMjUwMzEyMzdaMDoxHDAa
    BgNVBAoTE2xpbnV4Y29udGFpbmVycy5vcmcxGjAYBgNVBAMMEXJvb3RAbWF3aHJp
    bi1za2VsMIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAzv/3uX3JWduq
    wmtbyTABmfJkup6Z5Lh4lKPXgL/H2gQ/mlccORKm1eDZhAGmv9UuQGeMRHEneJqD
    V9c3f7/9cJBwvz2loKlppWj0ohAzTP91L8paeUSfP4X9EAr702Qjyb2ig+xWv5tM
    cJxbdl0zGpjYO1P+xmUdthSidsFrzWQPXptOlcZvak7n0QL5GlkVXdqX+she2Pbs
    ONtyTaBSpF3zEYv6cM9ZeJYL4Hl7LEQ1/p8ojpOyaxO8B1Cn/gIbuDqgzRwmei90
    Aca06YDF4SHVcl8qFajrwkPF3jWW5pgS8sAJlYoq2+ROhl0CnpdBl4AiJrvfFsIs
    RL8dKSuFA6AcLhYooWgMy6UWR8mLbmYHp04ThuBDoRaTt0uGLDlTAfMg7e8Gwpz+
    aEwxSzzQhvJYr4e6TSP1C4zXNqS5mUHfm6RtfEccVFmq6vyqZGELAyREce76J88V
    FMf/V+KYlQYUxo0JH2k+BYMO4Iigar0+8p8o8drqh6Lks5zTP6idsKa0LSWA4rwm
    3+5hjnVJtFa9CVCItJW3r0+nczwtAbjvRojkr7Yb2Kifufhilb+I695qSk+Toug3
    HKOODTqc9sbH+urmfr7jBBexACtWVX/tMptWFnBmcqoN4ptSZutY7CPf+KR//9p0
    8fFRQP+ItmElJHRFO+madGOBMVrC7j8CAwEAAaOBqzCBqDAOBgNVHQ8BAf8EBAMC
    BaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADBzBgNVHREEbDBq
    ggxtYXdocmluLXNrZWyCETE5Mi4xNjguODguMjM0LzI0ghxmZTgwOjozZWE5OmY0
    ZmY6ZmU1Njo0N2RjLzY0ggsxMC4wLjMuMS8yNIIcZmU4MDo6NThkNTpjMmZmOmZl
    ZTQ6NzJiYy82NDANBgkqhkiG9w0BAQsFAAOCAgEAB6aFItuxlZm5+2/IB2eCVAM0
    eQbO6dfvfF2khfiEbWWaKPtkZSYKlDIcoOph35obnNMQjT+y4zlnF/fepvjq8P1R
    yGd+Q+GMcXWVRht3uIMW2ZqwNqOujunyn9+Hl1SYi1dV1g/CH9lJt8I7FKIvyieh
    siZRe5Zp7TdPREkIJveuz8qB3X87WVh9bqvMpoX91Mgrjzd3qATef/tN0HP+b26Y
    X...


Revision history for this message
Stéphane Graber (stgraber) wrote :

Ok, so investigation shows that:

 - LXD bind-mounts all that stuff, it doesn't have a choice as it's not privileged enough to mount things itself
 - mountall fails to run if its "optional" filesystems fail to mount (because that makes a lot of sense...)
 - systemd sets up the host filesystems, on a clean boot they all seem fine
 - "something" apparently remounts debugfs ro sometimes, this breaks containers
 - "something" apparently makes the /proc/sys/fs/binfmt_misc autofs go nuts (loop of symlinks) which also breaks containers

We could try to teach mountall to do the right thing with optional mounts and ignore their failures; however, we'd need to SRU that to trusty and precise and then nag other distros into doing the same (centos, oracle, rhel, ...) before we could get rid of our workaround.

As a clean Xenial system does work properly, I think it would be best to figure out what's messing with debugfs and binfmt_misc post-boot and fix whatever it is to stop doing that.

It would be useful if the bug reporters could document exactly what they did on their system between the time it worked fine and the time it stopped working, so we can figure out what's messing with those mounts.

affects: linux (Ubuntu) → lxd (Ubuntu)
Changed in lxd (Ubuntu):
assignee: nobody → Stéphane Graber (stgraber)
status: Incomplete → Fix Committed
Revision history for this message
Stéphane Graber (stgraber) wrote :

So the cause of all this was /sys/kernel/debug/tracing which is a weird auto-mounted kernel path. That is, the sole action of listing that directory will cause it to get mounted for you by the kernel.

That means that any number of things could accidentally cause it to mount.

Once it's mounted, the kernel considers /sys/kernel/debug to have a directory that's hidden through overmounting and so will not allow unprivileged users to bind-mount the underlying directory, which means /sys/kernel/debug isn't mounted in the container and causes mountall to fail.
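
One way to check whether a host is in this state is to look for mounts whose parent is /sys/kernel/debug. The sketch below assumes the mountinfo format documented in proc(5): field 1 is the mount ID, field 2 the parent mount ID and field 5 the mount point. has_submounts is a hypothetical helper, not part of LXD.

```shell
# Exit 0 if some other mount sits on top of part of the given mount point,
# 1 if it has no child mounts, 2 if the path is not a mount point at all.
# Reads mountinfo-format text on stdin.
has_submounts() {
    awk -v target="$1" '
        { parent_of[$1] = $2 }
        $5 == target { id = $1 }
        END {
            if (id == "") exit 2
            for (m in parent_of)
                if (parent_of[m] == id) exit 0
            exit 1
        }'
}
```

For example: `has_submounts /sys/kernel/debug < /proc/self/mountinfo && echo "something is mounted below debugfs"`. With the mountinfo lines Serge posted earlier, the tracefs entry (ID 44, parent 74) is what makes the check succeed.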

There are quite a few ways to fix this.
The best would be to not have the kernel do this weird auto-mount thing. Sadly, fixing that would be a userspace regression, so as weird and inconsistent (trying to remain polite) as the current design is, reverting it is unlikely.

As mentioned before, we could also fix mountall not to be so picky and not die when mounts that it knows as "optional" fail to mount. Unfortunately there are a lot of images out there using mountall, so we can't really rely on being able to push a fix to all of them.

A third option and the one we'll be using for now is to have LXD recursively bind-mount paths, therefore not exposing the container to any more information than would be visible on the host and so avoiding the kernel security feature entirely.

The fix in LXD is a one character change (bind to rbind) and I've sent a pull request upstream to do just that.
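
The actual patch lives in LXD's source and is not shown in this report. As a rough illustration only, the sketch below applies the same bind to rbind change to a mount entry written in the lxc.mount.entry format quoted earlier in this thread. use_rbind is a hypothetical helper, and the assumption is that the option appears as a standalone word in the options field.

```shell
# Turn a plain "bind" mount option into "rbind" (recursive bind mount).
# Reads mount entry lines on stdin; entries already using rbind are unchanged.
use_rbind() {
    sed -E 's/(^|[ ,])bind([ ,]|$)/\1rbind\2/'
}
```

For example, `echo 'lxc.mount.entry = /sys/kernel/debug sys/kernel/debug none bind,optional 0 0' | use_rbind` yields the same line with `rbind,optional`. At the mount(8) level the difference is the one between `mount --bind` and `mount --rbind`: the recursive form carries submounts such as /sys/kernel/debug/tracing along, so nothing under the source is left uncovered.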

I'd just like to stress that I think the kernel behavior here is absolutely ridiculous. We have a security feature which triggers when it shouldn't (the path doesn't exist so can't be "hidden") combined with a crazy feature that's been added to be "user friendly" and causes automatic mounting of a filesystem by simply accessing a path inside another filesystem. The combination of both results in this bug... But the fact is, it's way easier and faster for us to work around this in LXD than to try and fix the source of the problem...

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxd - 2.0.0~rc2-0ubuntu2

---------------
lxd (2.0.0~rc2-0ubuntu2) xenial; urgency=medium

  * Cherry-pick upstream bugfix:
    - Workaround kernel overmounting protection (LP: #1551854)

 -- Stéphane Graber <email address hidden> Mon, 07 Mar 2016 22:18:32 -0500

Changed in lxd (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Casey Marshall (cmars) wrote :

Thanks for the fix. I didn't notice anything in particular that seemed to cause the issue. I do remember having the issue right after boot & desktop environment -- in some cases the first thing I did was start containers in a gnome terminal. Other things I did at various points last week:

- Usual developer userspace stuff (tmux, vim, git, golang, juju).
- Usual manager stuff (Google hangouts and drive) in firefox.
- Messing with pcscd (build from source, etc) because of a bug where it consumes 100% CPU if I plug or unplug any USB device.
- Messing with hdparm and smartctl because the power management burns through load cycle count on my x230 otherwise.
- Built some debian packages for a release, uploaded to a launchpad PPA.
- Daily package updates (update, dist-upgrade, autoremove).
