rootfs issues - 2.8 LXD release

Bug #1675760 reported by Lorenzo Cavassa
This bug affects 2 people
Affects: lxd (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

System:

Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial

Kernel release: 4.8.0-36

LXD release 2.8

I got a rootfs error when starting a few LXD containers:

https://pastebin.canonical.com/183648/

This is the 'lxc list' output:

https://pastebin.canonical.com/183650/

The container, named 'idm', uses an 'ovs' profile:

https://pastebin.canonical.com/183649/

The /var/log/lxd/lxd.log file is attached.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: lxd 2.8-0ubuntu1~ubuntu16.04.1
ProcVersionSignature: Ubuntu 4.8.0-36.36~16.04.1-generic 4.8.11
Uname: Linux 4.8.0-36-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.5
Architecture: amd64
Date: Fri Mar 24 12:22:59 2017
InstallationDate: Installed on 2017-03-23 (0 days ago)
InstallationMedia: Ubuntu-Server 16.04.2 LTS "Xenial Xerus" - Release amd64 (20170215.8)
SourcePackage: lxd
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Lorenzo Cavassa (lorenzo-cavassa) wrote :

We ran new tests: the same issue shows up even with LXD 2.12.

If I run:

for X in `seq 1 4`; do lxc init ubuntu: test$X & done

LXD creates the containers but cannot start them.

This is the output of 'lxc info --show-log test1':

https://pastebin.canonical.com/183666/

If I run:

for X in `seq 1 4`; do lxc init ubuntu: test$X ; done

there are no issues and I can start the containers in the usual way.

It looks like LXD cannot handle some form of concurrency during container creation.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Please post "lxc info" and "lxc config show" for the LXD 2.12 based setup.

Changed in lxd (Ubuntu):
status: New → Incomplete
Revision history for this message
Stéphane Graber (stgraber) wrote :

Also, since this is a public bug, please refrain from using private paste services.

Revision history for this message
Christian Brauner (cbrauner) wrote : Re: [Bug 1675760] Re: rootfs issues - 2.8 LXD release

Was this on a freshly created LXD instance or on an upgraded LXD instance?

Revision history for this message
Lorenzo Cavassa (lorenzo-cavassa) wrote :

'lxc info' output:

http://pastebin.ubuntu.com/24241376/

'lxc config show' output:

http://pastebin.ubuntu.com/24241379/

I ran:

for X in `seq 1 4`; do lxc init ubuntu: test$X & done

and then, after a while:

for X in `seq 1 4`; do lxc start test$X & done

This is what I got in the /var/log/lxd/lxd.log file:

http://pastebin.ubuntu.com/24241385/

Revision history for this message
Lorenzo Cavassa (lorenzo-cavassa) wrote :

A freshly created one (KVM VM). LXD 2.12

Revision history for this message
Christian Brauner (cbrauner) wrote :

Can you please run LXD in debug mode

lxd --debug --group lxd

and then append the full log for one of the containers that fails to start?

Revision history for this message
Stéphane Graber (stgraber) wrote :

lxc 20170324150557.624 ERROR lxc_conf - conf.c:lxc_mount_auto_mounts:801 - Permission denied - error mounting proc on /usr/lib/x86_64-linux-gnu/lxc/proc flags 14

That's the kernel overmounting protection kicking in for some reason.

Can you paste "cat /proc/self/mounts" on the host?
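As an illustration only (this helper is not from the thread), one way to narrow that output down to the mounts sitting on or below /proc is sketched here. The function name `mounts_under` and its optional second argument are hypothetical; it assumes the usual whitespace-separated mounts format where field 2 is the mount point and field 3 the filesystem type.

```shell
# Hypothetical helper: print "mountpoint fstype" for every mount at or below
# a given prefix. Reads /proc/self/mounts unless another file is passed.
mounts_under() {
    prefix="$1"
    mounts_file="${2:-/proc/self/mounts}"
    # Match the prefix itself or anything under "prefix/", so /procfoo is skipped.
    awk -v p="$prefix" '$2 == p || index($2, p "/") == 1 { print $2, $3 }' "$mounts_file"
}
```

Running `mounts_under /proc` on the host would list entries such as the binfmt_misc mount, if one is present.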

Revision history for this message
Lorenzo Cavassa (lorenzo-cavassa) wrote :
Revision history for this message
Christian Brauner (cbrauner) wrote :

@stgraber, that sounds like one of those empty directories similar to
the issues we had with the empty xen directory.

Revision history for this message
Lorenzo Cavassa (lorenzo-cavassa) wrote :

Attached is the log of a single container (test1), as requested by Christian.

lxc info --show-log test1

Revision history for this message
Stéphane Graber (stgraber) wrote :

Hmm, not seeing anything obviously wrong with /proc here though...

Can you try:
 - umount /proc/sys/fs/binfmt_misc

And then attempt to start a container again?

If that still fails, please post the output of:
 - ls -lh /var/lib/lxd
 - ls -lh /var/lib/lxd/containers
 - ls -lh /var/lib/lxd/storage-pools
 - ls -lh /var/lib/lxd/storage-pools/default
 - ls -lh /var/lib/lxd/storage-pools/default/containers
 - ls -lh /var/lib/lxd/storage-pools/default/containers/test1
 - ls -lh /var/lib/lxd/storage-pools/default/containers/test1/rootfs

So we can look for any actual permission issue.
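The storage-pool part of that list walks one path a component at a time, so it can be sketched as a small loop. The function name `list_chain` and the "==" header lines are hypothetical additions for readability; the `ls -lh` call is the one requested above.

```shell
# Hypothetical helper: run "ls -lh" on a base directory and then on each
# successively deeper component, printing a header line before each listing.
list_chain() {
    path="$1"
    shift
    echo "== $path"
    ls -lh "$path"
    for part in "$@"; do
        path="$path/$part"
        echo "== $path"
        ls -lh "$path"
    done
}
```

For the paths in this report the call would be `list_chain /var/lib/lxd storage-pools default containers test1 rootfs`; /var/lib/lxd/containers still needs its own `ls -lh`.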

Revision history for this message
Stéphane Graber (stgraber) wrote :

Also note that both kernels you mentioned in this report so far are severely outdated.

You initially listed 4.8.0-36 when current is 4.8.0-42 and on your LXD 2.12 system, you listed 4.4.0-31 when current is 4.4.0-67. Since the error hints at a kernel mount protection issue, you really ought to make sure your systems are booted on the latest kernel update.

Revision history for this message
Lorenzo Cavassa (lorenzo-cavassa) wrote :

VM upgraded to:

Linux ubuntu 4.4.0-66-generic #87

Same behaviour.

The output of the 'ls' commands is attached.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Did you try the unmount I mentioned earlier?

Filesystem permissions look reasonable too, unless the POSIX ACL is messing with things somehow.
Can you post the output of:

 - getfacl /var/lib/lxd/storage-pools/default/containers/test1

Revision history for this message
Lorenzo Cavassa (lorenzo-cavassa) wrote :

Yes, I unmounted it.

getfacl: Removing leading '/' from absolute path names
# file: var/lib/lxd/storage-pools/default/containers/test1
# owner: 100000
# group: 100000
user::rwx
user:100000:r-x
group::r-x
mask::r-x
other::r-x

Revision history for this message
Lorenzo Cavassa (lorenzo-cavassa) wrote :

When the container 'init' operations run concurrently, 'top' shows the related 'unsquashfs' processes running:

for X in `seq 1 4`; do lxc init ubuntu: test$X & done

top

If I wait until the unsquashfs processes terminate and then start the containers, everything works fine:

for X in `seq 1 4`; do lxc start test$X & done

The 'init' process takes some time to unpack the container image onto the filesystem.
It looks like there is no bug: this behaviour is due to the overall speed (disk + filesystem) of unsquashing the container images.
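A minimal sketch of that workaround: create the containers in parallel, block until every background 'lxc init' job has exited, and only then start them. The function name `init_then_start` is hypothetical, and the sketch assumes that each 'lxc init' returns only once its image has been fully unpacked, which matches the behaviour described above but was not explicitly confirmed in this report.

```shell
# Hypothetical helper: parallel init, then start only after all inits finish.
# Assumption: "lxc init" does not return until the image is fully unpacked.
init_then_start() {
    count="$1"
    image="${2:-ubuntu:}"
    for x in $(seq 1 "$count"); do
        lxc init "$image" "test$x" &
    done
    # With no arguments, wait blocks until all background jobs have exited.
    wait
    for x in $(seq 1 "$count"); do
        lxc start "test$x"
    done
}
```

`init_then_start 4` reproduces the four-container test while avoiding the window in which a container is started before its rootfs is complete.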

Revision history for this message
Stéphane Graber (stgraber) wrote :

OK, marking this invalid then, since the problem boils down to the directory backend not being able to do instant copies of the image, which leaves a window during which the container is still unpacking its filesystem.

To reflect our IRC discussion: the directory backend is really meant as a fallback option that works everywhere but comes with very significant compromises in performance and resource consumption. Whenever possible, my advice is to stay away from it and stick with either zfs or btrfs.
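For readers who want to check which backend a pool uses, a rough sketch follows. It assumes an LXD version with the storage API (the 2.12 setup in this report should qualify) and assumes that 'lxc storage show' prints YAML with a top-level "driver:" key. The function names `pool_driver` and `warn_if_dir_backend` are hypothetical.

```shell
# Hypothetical helper: extract the driver name from "lxc storage show".
# Assumption: the YAML output contains a top-level line "driver: <name>".
pool_driver() {
    lxc storage show "${1:-default}" | awk '$1 == "driver:" { print $2; exit }'
}

# Hypothetical helper: point out when a pool uses the directory backend.
warn_if_dir_backend() {
    pool="${1:-default}"
    driver="$(pool_driver "$pool")"
    if [ "$driver" = "dir" ]; then
        echo "pool '$pool' uses the directory backend; each new container needs a full image unpack"
    else
        echo "pool '$pool' uses driver: $driver"
    fi
}
```

`warn_if_dir_backend default` would flag the setup described in this report, where the image has to be unpacked for each container.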

Changed in lxd (Ubuntu):
status: Incomplete → Invalid