When many LXD containers are started, they fail to boot with "Too many open files"
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| lxd (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Xenial | Fix Released | Medium | Unassigned | |
Bug Description
== SRU
=== Rationale
LXD containers using systemd use a very large number of inotify instances. As a result, a system will typically run out of inotify instances (the per-user fs.inotify.max_user_instances limit, which defaults to 128) with as few as 15 Ubuntu 16.04 containers.
An easy fix for the issue is to raise that limit to 1024, which makes it possible to run around 100 containers before hitting the limit again.
To do so, LXD now ships a sysctl.d file that raises this particular limit on systems where LXD is installed.
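To see how close a host is to this limit, one can count the open inotify instances owned by a user and compare the count with /proc/sys/fs/inotify/max_user_instances. This is a minimal diagnostic sketch, not part of the LXD package; it assumes that the limit the sysctl.d file raises is max_user_instances (hitting it makes inotify_init() fail with EMFILE, which is reported as "Too many open files"). It also assumes that each open instance shows up as an fd symlink to `anon_inode:inotify` under /proc. Only processes the caller may inspect are counted, so run it as root for a complete picture.

```bash
#!/usr/bin/env bash
# Sketch: compare a user's inotify instance usage with the kernel limit.
# Assumption: each inotify instance is an fd whose link target is
# "anon_inode:inotify"; processes we cannot read are silently skipped.

limit=$(cat /proc/sys/fs/inotify/max_user_instances 2>/dev/null)

count_inotify_instances() {
  local uid=$1 n=0 pid fd
  for pid in /proc/[0-9]*; do
    # /proc/<pid> is owned by the process's (effective) UID.
    [[ $(stat -c %u "$pid" 2>/dev/null) == "$uid" ]] || continue
    for fd in "$pid"/fd/*; do
      if [[ $(readlink "$fd" 2>/dev/null) == anon_inode:inotify ]]; then
        n=$((n + 1))
      fi
    done
  done
  echo "$n"
}

uid=${1:-$(id -u)}
used=$(count_inotify_instances "$uid")
echo "uid=$uid inotify_instances=$used max_user_instances=${limit:-unknown}"
```

For unprivileged containers, the relevant UID is the host-side UID that container root is mapped to (for example the start of the range in /etc/subuid), not 0.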
=== Testcase
1) Upgrade LXD
2) Spawn about 50 Ubuntu 16.04 containers ("lxc launch ubuntu:16.04")
3) Check that they all get an IP address ("lxc list"); that's a pretty good sign that they booted properly
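The testcase can be scripted roughly as follows. This is a hedged sketch: the container names `t1`..`tN`, the 120-second wait, and the helper names are hypothetical choices of ours. The script also assumes an `lxc` client that supports `lxc list --format csv -c n4` (name and IPv4 columns). Older clients, such as the LXD 2.0 one in Xenial, may only offer table or JSON output, so the parsing step may need adapting.

```bash
#!/usr/bin/env bash
# Sketch of the SRU testcase. Hypothetical names: t1..tN, helper functions below.

# Read "name,ipv4" CSV lines on stdin; print names whose IPv4 column is empty.
containers_without_ip() {
  local name ipv4
  while IFS=, read -r name ipv4; do
    [[ -z $name ]] && continue
    if [[ -z $ipv4 || $ipv4 == '""' ]]; then
      echo "$name"
    fi
  done
  return 0
}

# Launch N Ubuntu 16.04 containers, wait, then report any without an IPv4 address.
run_testcase() {
  local n=${1:-50} i missing
  for i in $(seq 1 "$n"); do
    lxc launch ubuntu:16.04 "t$i"
  done
  sleep 120  # arbitrary grace period for systemd and networking to come up
  missing=$(lxc list --format csv -c n4 | containers_without_ip)
  if [[ -n $missing ]]; then
    echo "containers without an IPv4 address:"
    echo "$missing"
    return 1
  fi
  echo "all $n containers have an IPv4 address"
}

# Only launch containers when explicitly asked to, e.g. "./testcase.sh run 50".
if [[ ${1:-} == run ]]; then
  run_testcase "${2:-50}"
fi
```

A container listed as missing an address is the symptom described in this bug; on an unfixed host, it would be expected to show up once roughly 15 containers are running.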
=== Regression potential
Not expecting anything here. Juju has shipped a similar configuration for a while now, and so have the LXD feature releases.
We pretty much just forgot to include this particular change in our LTS packaging branch.
== Original bug report
Reported by Uros Jovanovic here: https:/
"...
However, if you bootstrap LXD and do:
juju bootstrap localxd lxd --upload-tools
for i in {1..30}; do juju deploy ubuntu ubuntu$i; sleep 90; done
Somewhere between the 10th and 20th deploy, one fails with the machine stuck in pending state (nothing useful in logs), and none of the deploys after that first pending one succeed. Might be a different bug, but it's easy to verify by running that for loop.
So, this particular error was not in my logs, but the controller still ends up unable to provision at least 30 machines ..."
I can reproduce this. Looking at the failed machine, I can see that jujud isn't running, which is why juju considers the machine not up; in fact, nothing of juju seems to be installed. There's nothing about juju in /var/log.
Comparing cloud-init-
Cloud-init v. 0.7.7 running 'init' at Tue, 12 Jul 2016 08:32:00 +0000. Up 4.0 seconds.
...and then a whole lot of juju-installation gubbins, while the failed machine log just stops.
| Changed in | Field | Change |
|---|---|---|
| juju-core | importance | Undecided → Critical |
| juju-core | milestone | none → 2.0-beta12 |
| juju-core | status | New → Triaged |
| | tags | added: lxd |
| juju-core | assignee | nobody → Christian Muirhead (2-xtian) |
| juju-core | status | Triaged → In Progress |
| juju-core | assignee | Christian Muirhead (2-xtian) → nobody |
| juju | milestone | 2.0.0 → 2.1.0 |
| juju | status | Triaged → In Progress |
| juju | status | Confirmed → Triaged |
| juju | milestone | 2.1.0 → 2.1-rc1 |
| juju | assignee | Richard Harding (rharding) → nobody |
| juju | milestone | 2.2-beta1 → 2.2-beta2 |
| juju | milestone | 2.2-beta2 → 2.2-beta3 |
| juju | milestone | 2.2-beta3 → 2.2-beta4 |
| juju | milestone | 2.2-beta4 → 2.2-rc1 |
| | no longer affects | juju |
| | description | updated |
| lxd (Ubuntu Xenial) | status | New → Triaged |
| lxd (Ubuntu Xenial) | status | Triaged → In Progress |
| lxd (Ubuntu Xenial) | importance | Undecided → Medium |
Here's a cloud-init log from a successfully started machine for comparison.