8+ containers makes one get stuck in "pending" on joyent

Bug #1626725 reported by Aaron Bentley
This bug affects 1 person
Affects          Status        Importance  Assigned to
Canonical Juju   Fix Released  High        James Tunnicliffe

Bug Description

When we add eight containers to a Joyent machine, one gets stuck in pending. Eventually, the test script raises AgentsNotStarted.

We are seeing this in our long-running industrial/reliability tests.

e.g. http://juju-ci.vapour.ws/job/industrial-test-joyent/184/consoleText

It happens almost every time, but not every time. It is usually the last container (e.g. 3/lxd/7), but not always. Sometimes it's the seventh or even the first.

It does not happen on AWS, even though AWS machines are no better (and in some regards worse) than Joyent machines in terms of their cpu/memory/storage.

I reproduced this using our juju-ci-tools industrial_test script.
./industrial_test.py parallel-joyent `jver 2.0-rc1-4405` density ~/sandbox/logz/ --single --attempts 1 --json-file results.json --new-agent-url https://us-east.manta.joyent.com/cpcjoyentsupport/public/juju-dist/parallel-testing/agents --agent-stream revision-build-4405

An example run is attached.

Revision history for this message
Aaron Bentley (abentley) wrote :
tags: added: jujuqa
description: updated
Changed in juju:
milestone: 2.0-rc2 → 2.0.0
assignee: nobody → Richard Harding (rharding)
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

That job has gone, but the problem remains. I just did a bunch of add-machines, and eventually machine 0 (the host) stopped responding. I can ping it, but not SSH to it, and the agent state shows as down.

On MAAS I added 50 LXDs and got bored of waiting for something bad to happen.

Nothing was crying out to me from the logs, but that isn't much of a surprise at this stage in the investigation.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

I ran a script that added a container every time juju status showed everything as started. I got up to waiting for 0/lxd/13.
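Roughly, the loop looked like this (a sketch rather than the actual script; the jq query and the 60-second poll interval are just illustrative):

  # Add one container to machine 0 whenever nothing is still pending.
  while true; do
      pending=$(juju status --format=json \
          | jq '[.machines[] | .containers // {} | .[]
                 | select(."juju-status".current != "started")] | length')
      if [ "$pending" -eq 0 ]; then
          juju add-machine lxd:0
      fi
      sleep 60
  done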

There is only 400MB of space on /, but I can still write a text file.

In 0/lxc/13 cloud-init ran, but didn't find any configuration files. This may be because of a data race (juju hasn't written them yet) or something else.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

The API call to LXD does show the right systemd init job being sent.

Just restarted this experiment and it failed again on 0/lxd/13 (14th container).

tune2fs -l /dev/vda1 shows 0 reserved blocks and >0 free blocks, so we aren't out of disk space.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

...but we may have been out of disk space sometimes. I just ran 'juju remove-machine 0/lxd/n' for n in 1..5, then did a 'juju add-machine lxd:0' and ran out of disk space. Maybe the joys of sparse file systems? Something grew but didn't shrink?

Given this has happened twice on the 14th container when doing the slow and deliberate route, and the machine is basically full, I am going to leave the slow/reliable path and switch back to the fast path to try to identify what is happening there.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

OK, got it. From what I can tell the disk fills up and this causes the LXD ZFS pool to be taken offline due to I/O errors. At that point it looks like things shrink back to a size where you can log in, but you can't do anything with the containers because their disks are offline.

I tried starting 5 machines with 11 guests on each, which leaves enough disk space for all the containers to start, and they all started fine. So it seems like this isn't a bug, but it's not a helpful user experience.

From dmesg:

[ 1090.253680] WARNING: Pool 'lxd' has encountered an uncorrectable I/O failure and has been suspended.

http://paste.ubuntu.com/23274269/

root@c0b55d45-188c-45dc-8efd-17c6766a5425:~# zpool status -x
  pool: lxd
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

  NAME                    STATE     READ WRITE CKSUM
  lxd                     ONLINE       0   640     0
    /var/lib/lxd/zfs.img  ONLINE       0 1.30K     0

errors: 640 data errors, use '-v' for a list

This particular issue is documented here:
http://zfsonlinux.org/msg/ZFS-8000-HC/

Data errors took the pool offline.
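
For the record, the recovery documented in that status output is to free some space on the host and then clear the pool's error state (pool name 'lxd' as above):

  sudo zpool clear lxd      # clear the I/O failure state once space is available
  sudo zpool status -x lxd  # confirm the pool is healthy again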

Changed in juju:
status: Triaged → Won't Fix
status: Won't Fix → Invalid
assignee: Richard Harding (rharding) → James Tunnicliffe (dooferlad)
Revision history for this message
Aaron Bentley (abentley) wrote :

James, thank you for your investigation.

Maybe it's not possible to prevent this ZFS error from occurring, but I think it's still incorrect to list the status as "pending" if, in fact, it will never start. I think the status should indicate that user intervention is required, especially if (as it appears) other containers on this machine are affected by this issue.

Changed in juju:
status: Invalid → Triaged
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

My reproducer was:

Bootstrap on <cloud>
Add <machines>
For each machine, add <containers>

This is encoded in:
https://github.com/dooferlad/jujuWand/blob/abb8a297bd7837298c8c3cbbb536b757cd3931ab/add-lxd.py

This passed:
./add-lxd.py --controller joyent --guests=11 --hosts 5

Currently testing with --deploy (i.e. deploy the Ubuntu charm to a container n times rather than starting n containers).
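
In shell terms the reproducer is roughly the following (a sketch of what add-lxd.py drives; the counts match the passing run above, and the machine IDs are whatever 'juju status' reports):

  juju bootstrap joyent                  # Bootstrap on <cloud>
  for i in 1 2 3 4 5; do                 # Add <machines>
      juju add-machine
  done
  for host in 0 1 2 3 4; do              # For each machine, add <containers>
      for j in $(seq 1 11); do
          juju add-machine lxd:$host
      done
  done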

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

When deploying the Ubuntu charm instead of just adding containers, I ran out of disk space after 8 containers. Tested with this:

./add-lxd.py --controller joyent --guests=11 --deploy

Revision history for this message
Aaron Bentley (abentley) wrote :

Here's another thing that I don't get: the Joyent machines hit this, and they have a root-disk of "51200M". The AWS machines didn't hit this, and they have a smaller root disk of "8192M".

AWS: arch=amd64 cores=1 cpu-power=300 mem=3840M root-disk=8192M availability-zone=eu-west-1a
Joyent: arch=amd64 cores=1 mem=3840M root-disk=51200M

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Ah, interesting. I only got 7.4G, which must be the default machine that Juju gets.

ubuntu@55830fc2-2f00-4adf-a770-78f8aa2302fb:~$ df -h
Filesystem                          Size  Used Avail Use% Mounted on
udev                                868M     0  868M   0% /dev
tmpfs                               175M  5.1M  170M   3% /run
/dev/vda1                           7.4G  2.3G  5.2G  31% /
tmpfs                               875M     0  875M   0% /dev/shm
tmpfs                               5.0M     0  5.0M   0% /run/lock
tmpfs                               875M     0  875M   0% /sys/fs/cgroup
/dev/vdb                             50G   52M   47G   1% /mnt
tmpfs                               175M     0  175M   0% /run/user/1000
lxd/containers/juju-1593c9-0-lxd-0   97G  837M   96G   1% /var/lib/lxd/containers/juju-1593c9-0-lxd-0.zfs

With 97G per LXD container plus some overhead for everything else, it is entirely obvious where that space is going. I am surprised that LX[CD] doesn't fail early when there isn't enough space to create a container. Filing a bug now.

Revision history for this message
Richard Harding (rharding) wrote :

So the space isn't all used right away. It does some deduping and, I believe, only uses up the space as it's required. I think what's going on is that as the filesystem does its work and notices duplicate files etc., the space allocation ends up flexing back and forth over time.
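
The over-commit comes from the loop-backed pool being a sparse file: each dataset reports the full pool size, but the host only pays for blocks actually written. A standalone illustration (hypothetical file name and size, assuming zfsutils-linux is installed):

  truncate -s 20G /tmp/demo.img        # sparse backing file: 20G apparent size
  sudo zpool create demo /tmp/demo.img
  sudo zfs create demo/c1
  df -h /demo/c1                       # the dataset reports ~20G available
  du -h /tmp/demo.img                  # ...but almost nothing is allocated yet
  sudo zpool destroy demo              # clean up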

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

This is the root cause of at least one problem: https://bugs.launchpad.net/juju/+bug/1630571

If non-space related issues exist once that is fixed up then we can revisit. See also: https://github.com/lxc/lxd/issues/2458

Changed in juju:
milestone: 2.0.0 → 2.1.0
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Oh, this is great:

2016-10-06 12:18:26 INFO juju.utils.packaging.manager utils.go:57 Running: apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet install --no-install-recommends zfsutils-linux
2016-10-06 12:18:37 INFO juju.utils.packaging.manager utils.go:98 Retrying: &{/usr/bin/apt-get [apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --assume-yes --quiet install --no-install-recommends zfsutils-linux] [] <nil> Reading package lists...
  libuutil1linux libzfs2linux libzpool2linux python python-minimal python2.7
  python2.7-minimal zfs-doc
  nfs-kernel-server zfs-initramfs
  zfs-zed
  libuutil1linux libzfs2linux libzpool2linux python python-minimal python2.7
  python2.7-minimal zfs-doc zfsutils-linux
Err:8 http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 zfs-doc all 0.6.5.6-0ubuntu12
Err:12 http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libzfs2linux amd64 0.6.5.6-0ubuntu12
Err:13 http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 zfsutils-linux amd64 0.6.5.6-0ubuntu12
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/zfs-doc_0.6.5.6-0ubuntu12_all.deb 404 Not Found
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/libuutil1linux_0.6.5.6-0ubuntu12_amd64.deb 404 Not Found
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/libnvpair1linux_0.6.5.6-0ubuntu12_amd64.deb 404 Not Found
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/libzpool2linux_0.6.5.6-0ubuntu12_amd64.deb 404 Not Found
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/libzfs2linux_0.6.5.6-0ubuntu12_amd64.deb 404 Not Found
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/zfsutils-linux_0.6.5.6-0ubuntu12_amd64.deb 404 Not Found
  libuutil1linux libzfs2linux libzpool2linux python python-minimal python2.7
  python2.7-minimal zfs-doc
  nfs-kernel-server zfs-initramfs
  zfs-zed
  libuutil1linux libzfs2linux libzpool2linux python python-minimal python2.7
  python2.7-minimal zfs-doc zfsutils-linux
Err:8 http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 zfs-doc all 0.6.5.6-0ubuntu12
Err:12 http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libzfs2linux amd64 0.6.5.6-0ubuntu12
Err:13 http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu xenial-updates/main amd64 zfsutils-linux amd64 0.6.5.6-0ubuntu12
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/zfs-doc_0.6.5.6-0ubuntu12_all.deb 404 Not Found
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/libuutil1linux_0.6.5.6-0ubuntu12_amd64.deb 404 Not Found
E: Failed to fetch http://eu-ams-1.joyent.clouds.archive.ubuntu.com/ubuntu/pool/main/z/zfs-linux/libnvpair1linux_0.6.5.6-0ubuntu12_amd64.deb ...


Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Arg, and we don't perform an apt-get update before retrying, which would have fixed the problem!

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Right. I have a patch that will create a zpool that uses 90% of the free space on the host's root file system. This sounds drastic, but since it is using a sparse file it won't use the space on the host FS until it is used in the container. This is an imperfect fix because the host file system could fill up to the point where the sparse file can't expand, but it is less broken than before.
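
The sizing amounts to something like this (a sketch of the approach; the actual patch does this inside Juju rather than in shell, and the lxd init flags are the ones quoted later in this bug):

  # Size the LXD loop-backed zpool at 90% of the free space on /.
  free_gb=$(df --output=avail -BG / | tail -1 | tr -dc '0-9')
  pool_gb=$((free_gb * 90 / 100))
  lxd init --auto --storage-backend zfs --storage-pool lxd --storage-create-loop "$pool_gb"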

Juju needs to grow monitoring and active-management abilities to really fix this, so that the zpool LXD uses can be allowed to grow to mostly fill the host disk while the host doesn't use up space that has been promised to the zpool. The right way to do this is to not use sparse files for the zpool and to grow it 1GB (or whatever increment seems reasonable) at a time as needed.

At the same time I found that if you don't do OS upgrades as part of bootstrap, apt-get update never gets run, which results in the stale-package problem above. I have updated the apt wrapper code to perform an update before retrying when an install fails; this removes the need for manual intervention to fix out-of-date package list issues.
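
The apt change behaves roughly like this wrapper (a sketch; the real change is in Juju's packaging code, and the install flags are the ones from the log above):

  # Retry a failed install after refreshing the (possibly stale) package lists.
  pkg=zfsutils-linux
  if ! apt-get --assume-yes --quiet install --no-install-recommends "$pkg"; then
      apt-get update
      apt-get --assume-yes --quiet install --no-install-recommends "$pkg"
  fi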

Changed in juju:
status: Triaged → In Progress
Revision history for this message
James Tunnicliffe (dooferlad) wrote :

I just asked Joyent for a machine:

juju add-machine --constraints "mem=16G cores=4 root-disk=200G"

This gave me a machine with 200G on /dev/sdb, not a larger root disk, so LXD still ran out of space. I don't know if we have a way of configuring the LXD storage location... investigating.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

No, can't do anything about this one without changing Juju - we perform:

lxd init --auto --storage-backend zfs --storage-pool lxd --storage-create-loop <size>

...without any option to do anything else. https://github.com/lxc/lxd covers other options. Since it varies by provider how and where storage is mounted, this is work that needs thinking through and scheduling.

Revision history for this message
John A Meinel (jameinel) wrote :

Joyent treating "root-disk" constraint as just-another-disk seems a bit off.

Changed in juju:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.1.0 → 2.1-beta1
status: Fix Committed → Fix Released