intermittent: failed to retrieve the template to clone: template container juju-trusty-lxc-template did not stop

Bug #1441319 reported by Larry Michel
This bug affects 7 people
Affects: juju-core
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

NOTE: this is an intermittent failure. The log file in comment #1 shows that many other LXC containers do start properly on other host machines.

This is similar to bug #1348386, but that one is fixed, and we have been seeing this with 1.22.

This happens during an OpenStack deployment:
+ . ./pipeline_parameters
++ export OPENSTACK_RELEASE=icehouse
++ OPENSTACK_RELEASE=icehouse
++ export COMPUTE=nova-kvm
++ COMPUTE=nova-kvm
++ export BLOCK_STORAGE=cinder-iscsi
++ BLOCK_STORAGE=cinder-iscsi
++ export IMAGE_STORAGE=glance-swift
++ IMAGE_STORAGE=glance-swift
++ export PIPELINE_ID=a8431fdc-a5cd-4985-baed-3248c6f00cb6
++ PIPELINE_ID=a8431fdc-a5cd-4985-baed-3248c6f00cb6
++ export NETWORKING=neutron-nvp
++ NETWORKING=neutron-nvp
++ export UBUNTU_RELEASE=trusty
++ UBUNTU_RELEASE=trusty

From juju-debug log file:

machine-4[3721]: 2015-04-05 05:32:17 INFO juju.container.lxc clonetemplate.go:208 template container started, now wait for it to stop
machine-4[3721]: 2015-04-05 05:37:42 INFO juju.container.lxc clonetemplate.go:227 not heard anything from the template log for five minutes
machine-4[3721]: 2015-04-05 05:37:42 INFO juju.container lock.go:66 release lock "juju-trusty-lxc-template"
machine-4[3721]: 2015-04-05 05:37:42 ERROR juju.provisioner.lxc lxc-broker.go:110 failed to start container: failed to retrieve the template to clone: template container "juju-trusty-lxc-template" did not stop
machine-4[3721]: 2015-04-05 05:37:42 ERROR juju.provisioner provisioner_task.go:531 cannot start instance for machine "4/lxc/0": failed to retrieve the template to clone: template container "juju-trusty-lxc-template" did not stop

From juju_status.yaml:

  '4':
    agent-state: started
    agent-version: 1.22.0
    containers:
      4/lxc/0:
        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-trusty-lxc-template" did not stop'
        instance-id: pending
        series: trusty
      4/lxc/1:
        agent-state-info: 'lxc container cloning failed: cannot clone a running container'
        instance-id: pending
        series: trusty

Revision history for this message
Larry Michel (lmic) wrote :
Curtis Hovey (sinzui)
tags: added: lxc
Changed in juju-core:
status: New → Triaged
importance: Undecided → Medium
Curtis Hovey (sinzui)
Changed in juju-core:
importance: Medium → High
milestone: none → 1.24-alpha1
tags: added: ci test-failure
tags: added: vivid
Changed in juju-core:
assignee: nobody → Katherine Cox-Buday (cox-katherine-e)
Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

It looks like stderr from lxc-start was being dumped to "/var/lib/juju/containers/juju-*-lxc-template/container.log". With help from the CI team, we're now capturing these logs. Here is the offending log from the latest run, which shows some issues I'm looking into further.

Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

Some interesting bits from the log:

      lxc-start 1426805367.372 WARN lxc_confile - confile.c:config_pivotdir:1768 - lxc.pivotdir is ignored. It will soon become an error.
      lxc-start 1426805367.373 WARN lxc_log - log.c:lxc_log_init:316 - lxc_log_init called with log already initialized
      lxc-start 1426805367.376 WARN lxc_cgmanager - cgmanager.c:cgm_get:962 - do_cgm_get exited with error

...

      lxc-start 1426805367.662 ERROR lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:183 - No such file or directory - failed to change apparmor profile to lxc-container-default
      lxc-start 1426805367.662 ERROR lxc_sync - sync.c:__sync_wait:51 - invalid sequence number 1. expected 4
      lxc-start 1426805367.662 ERROR lxc_start - start.c:__lxc_start:1157 - failed to spawn 'juju-vivid-lxc-template'
      lxc-start 1426805367.663 ERROR lxc_cgmanager - cgmanager.c:cgm_remove_cgroup:518 - call to cgmanager_remove_sync failed: invalid request
      lxc-start 1426805367.663 ERROR lxc_cgmanager - cgmanager.c:cgm_remove_cgroup:520 - Error removing all:lxc/juju-vivid-lxc-template-11
      lxc-start 1426805367.700 WARN lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - command get_init_pid failed to receive response
      lxc-start 1426805367.701 WARN lxc_cgmanager - cgmanager.c:cgm_get:962 - do_cgm_get exited with error
      lxc-start 1426805372.706 ERROR lxc_start_ui - lxc_start.c:main:344 - The container failed to start.
      lxc-start 1426805372.706 ERROR lxc_start_ui - lxc_start.c:main:346 - To get more details, run the container in foreground mode.
      lxc-start 1426805372.706 ERROR lxc_start_ui - lxc_start.c:main:348 - Additional information can be obtained by setting the --logfile and --logpriority options.
      lxc-start 1426806676.505 INFO lxc_start_ui - lxc_start.c:main:264 - using rcfile /var/lib/lxc/juju-vivid-lxc-template/config
      lxc-start 1426806676.505 WARN lxc_confile - confile.c:config_pivotdir:1768 - lxc.pivotdir is ignored. It will soon become an error.
      lxc-start 1426806676.506 WARN lxc_log - log.c:lxc_log_init:316 - lxc_log_init called with log already initialized
      lxc-start 1426806676.508 WARN lxc_cgmanager - cgmanager.c:cgm_get:962 - do_cgm_get exited with error

Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

Discussions with AppArmor/LXC experts lead us to believe that this is a possible race issue in Juju: i.e., possibly apt-get install lxc is not yet complete by the time we attempt to use lxc commands. Clues include the fact that changing apparmor profiles fails several times but eventually succeeds. We believe there is a secondary issue which is causing the spam at the tail of the log (peer has disconnected), but the thought is that solving the first issue might solve the secondary one, or at least make it clearer what's happening.

Further investigation is needed.

Revision history for this message
Tim Penhey (thumper) wrote :

Curtis, can you please file a different bug for vivid? I'm 99% certain that it has a different cause.

The way we get the template container to stop is to add an upstart job that shuts down the machine. Since vivid uses systemd, we need a more robust solution there.
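For reference, a minimal sketch of that kind of upstart job (the file name and trigger here are assumptions, not necessarily the exact job juju writes into the template container):

    # /etc/init/juju-template-shutdown.conf (hypothetical name)
    # Power off the template container once cloud-init finishes,
    # so the host sees it stop and can clone it.
    description "shut down the juju template container after cloud-init"
    start on stopped cloud-final
    exec shutdown -h now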

The errors shown above are from trusty, and are different.

Tim Penhey (thumper)
summary: - failed to retrieve the template to clone: template container juju-
- trusty-lxc-template did not stop
+ intermittent: failed to retrieve the template to clone: template
+ container juju-trusty-lxc-template did not stop
description: updated
Revision history for this message
Tim Penhey (thumper) wrote :

Hey Larry,

Can we gather extra logging information from this environment? Or has it been torn down?

If you can, we'd love everything from /var/lib/juju/containers (should only be one directory there for the template, but if there are others, grab them too).

Also /var/log/juju/machine-4.log
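If it helps, something like this bundles both requested items into one archive (the output filename is arbitrary):

    # Collect the container template directory and the machine-4 agent log
    # into a single tarball to attach to this bug.
    sudo tar -czf juju-lxc-template-debug.tar.gz \
        /var/lib/juju/containers/ \
        /var/log/juju/machine-4.log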

Revision history for this message
Tim Penhey (thumper) wrote :

The source of this problem is almost certainly a race condition on the host machine.

In order to reduce the number of packages we install by default on the cloud instances, the container packages are installed in a "just in time" manner. It seems that it isn't quite in time, or more precisely, the packages are installed, but some of the other components that are needed haven't reached a stable state before we try to use them in anger.

What we probably want to do is have some form of 'ready check' that we can run after the packages are installed and before we try to create the template container. We just don't know what that check should be yet.
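As a rough illustration only (the conditions checked here are assumptions, since the real set of things to wait for is exactly what is unknown), such a ready check could look like:

    # Hypothetical readiness probe run after "apt-get install lxc" and before
    # creating the template container: wait (bounded) for the default lxc
    # apparmor profile and the lxc bridge to be available.
    for i in $(seq 1 30); do
        if sudo aa-status 2>/dev/null | grep -q 'lxc-container-default' \
           && ip link show lxcbr0 >/dev/null 2>&1; then
            echo "lxc looks ready"
            break
        fi
        sleep 2
    done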

Changed in juju-core:
status: Triaged → Incomplete
Curtis Hovey (sinzui)
tags: removed: ci test-failure vivid
Curtis Hovey (sinzui)
no longer affects: juju-core/1.23
Revision history for this message
Curtis Hovey (sinzui) wrote :

As of commit 2e07936c in 1.23, the aws-deployer-bundle test fails in the same way as this bug report:

machines:
  "0":
    agent-state: started
    agent-version: 1.23-beta4
    dns-name: 52.5.226.249
    instance-id: i-470c67bb
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=300 mem=3840M root-disk=8192M availability-zone=us-east-1a
    state-server-member-status: has-vote
  "1":
    agent-state: started
    agent-version: 1.23-beta4
    dns-name: 52.4.228.68
    instance-id: i-b40c5a63
    instance-state: running
    series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=300 mem=3840M root-disk=8192M availability-zone=us-east-1c
  "2":
    agent-state: started
    agent-version: 1.23-beta4
    dns-name: 52.5.201.139
    instance-id: i-e9983b14
    instance-state: running
    series: trusty
    containers:
      2/lxc/0:
        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-trusty-lxc-template" did not stop'
        instance-id: pending
        series: trusty
      2/lxc/1:
        agent-state-info: 'lxc container cloning failed: cannot clone a running container'
        instance-id: pending
        series: trusty
    hardware: arch=amd64 cpu-cores=1 cpu-power=300 mem=3840M root-disk=8192M availability-zone=us-east-1d

I captured the container log from the machine before it was destroyed.

Curtis Hovey (sinzui)
tags: added: ci deployer regression
Changed in juju-core:
status: Incomplete → Triaged
importance: High → Critical
importance: Critical → High
Revision history for this message
Curtis Hovey (sinzui) wrote :

I removed the "ci" tag from this bug because the commit that caused this was reverting a feature. We don't want to block the secondary fixes that this branch needs.

tags: removed: ci
Revision history for this message
Curtis Hovey (sinzui) wrote :

Using the bundle found at
    http://bazaar.launchpad.net/~juju-qa/juju-ci-tools/repository/view/head:/bundles.yaml
you can run the same test as CI
    juju bootstrap
    juju --show-log deployer --debug --deploy-delay 10 --config bundles.yaml

This works on 1.22.1 and 1.23-beta4 (cut from cherylj's commit).
Starting with commit 2e07936c (which thinks it is 1.23-beta4, but is not), deployments with containers in AWS will fail.

Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

For the record, apparmor did change on the machine last week: 2.8.95~2430-0ubuntu5 release (main) 2014-04-04.

However, this certainly lends credence to the idea that commit 2e07936c is suspect, since there is a version of 1.23 that works. Curtis also checked to ensure the AWS mirrors were not stale.

Curtis Hovey (sinzui)
no longer affects: juju-core/1.23
Changed in juju-core:
importance: High → Medium
assignee: Katherine Cox-Buday (cox-katherine-e) → nobody
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.24-alpha1 → none
tags: added: systemd
tags: added: upstart
Revision history for this message
Larry Michel (lmic) wrote :

We hit this twice yesterday:

'5':
    agent-state: started
    agent-version: 1.23.2
    containers:
      5/lxc/0:
        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-trusty-lxc-template" did not stop'
        instance-id: pending
        series: trusty
      5/lxc/1:
        agent-state-info: 'lxc container cloning failed: cannot clone a running container'
        instance-id: pending
        series: trusty
    dns-name: hayward-16.oil
    hardware: arch=amd64 cpu-cores=8 mem=16384M tags=debug,hw-ok,oil-slave-2,hardware-sm15k,hw-glance-sm15k
    instance-id: /MAAS/api/1.0/nodes/node-a336d312-c4cd-11e3-824b-00163efc5068/
    series: trusty

Revision history for this message
Alvaro Uria (aluria) wrote :

Hello,

This also happens on juju 1.22.6.1 (current stable pkg in Ubuntu Trusty - 1.22.6-0ubuntu1~14.04.1). I used debian-installer preseed.
"""
machines:
  "0":
    agent-state: started
    agent-version: 1.22.6.1
    dns-name: os-1.maas
    instance-id: /MAAS/api/1.0/nodes/node-XXXX/
    series: trusty
    containers:
      0/lxc/0:
        agent-state-info: 'failed to retrieve the template to clone: template container
          "juju-trusty-lxc-template" did not stop'
        instance-id: pending
        series: trusty
"""

I even tried adding "apt-get install lxc -qy" to the d-i late_commands stage, with the same effect.

Cheers,
-Alvaro.

tags: added: canonical-bootstack
Revision history for this message
Cheryl Jennings (cherylj) wrote :

At this point, we'll need the console log from the template container to figure out what's going on. Can you attach the contents of /var/lib/juju/containers/juju-trusty-lxc-template/* (making sure to get console.log in particular) for me to look at?

Revision history for this message
Alvaro Uria (aluria) wrote :

Hello Cheryl,

I don't have that information now, but I will try to gather it on Monday.

Cheers,
-Alvaro.

Revision history for this message
Jill Rouleau (jillrouleau) wrote :

Cheryl,

Ran into this in a different environment, using the fastpath installer.
    containers:
      0/lxc/0:
        agent-state-info: |-
          failed to retrieve the template to clone: cannot determine cached image URL: cannot determine LXC image URL: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt
          : exit status 1: cannot determine LXC image URL: failed to get https://cloud-images.ubuntu.com/query/trusty/server/released-dl.current.txt
          : exit status 1
        instance-id: pending
        series: trusty

Requested logs are attached, I also saved off /var/log in case you need machine logs or anything else.
Thanks.

Revision history for this message
Jill Rouleau (jillrouleau) wrote :
Revision history for this message
Jill Rouleau (jillrouleau) wrote :
Revision history for this message
Jill Rouleau (jillrouleau) wrote :
Revision history for this message
Cheryl Jennings (cherylj) wrote :

Hi Jill, the problem that you ran into in comment #18 is different from the original issue. From the logs you attached, I can see that the template container did stop and was able to be cloned. There may have been a temporary outage of cloud-images.ubuntu.com, which would lead to the error message you saw. Were you able to do any other deployments after the failed one?

Revision history for this message
Alvaro Uria (aluria) wrote :

I think the issue in #18 was related to https://bugs.launchpad.net/ubuntu/+bug/1485456. The glance-simplestreams-sync service was also affected, returning 403 errors.

Revision history for this message
Alvaro Uria (aluria) wrote :

Hello Cheryl,

Please find attached /var/lib/juju/containers/juju-trusty-lxc-template/ after 0/lxc/0 returned agent-state-info: 'failed to retrieve the template to clone: template container "juju-trusty-lxc-template" did not stop'.

Find also attached "juju status" output just after this error.

Please let me know if you would need more information.

Kind regards,
-Alvaro.

Revision history for this message
Alvaro Uria (aluria) wrote :

juju status output after 0/lxc/0 error

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Thanks, Alvaro. Taking a look now.

Revision history for this message
Jill Rouleau (jillrouleau) wrote :

Hi Cheryl,
I'm travelling this week so I have not, I'll see what we come up with in this other environment when I get back. Thanks!

Revision history for this message
Cheryl Jennings (cherylj) wrote :

I looked at the logs some more, and things appear to hang while doing an apt-get upgrade during cloud-init. The good news is that we don't see those lxc errors and warnings. The bad news is that now we need to figure out what cloud-init is doing, and this requires pulling the cloud-init log, which is located in the template container at /var/log/cloud-init.log.

I do have a few other questions to help piece together what may be going on:
1 - If you run lxc-ls --fancy on the machine hosting the lxc containers, do you ever see the container named juju-trusty-lxc-template stop? I'm wondering if the update is just taking longer than the timeout set to wait for the container to stop.
2 - Are there any other issues on the machine which could stall progress, such as a full disk or networking problems?
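For reference, those checks can be run from the host roughly like this (a sketch; the rootfs path for the cloud-init log assumes the default LXC layout):

    # 1. Watch whether the template container ever reaches STOPPED.
    watch -n 10 lxc-ls --fancy

    # 2. Look for obvious resource problems on the host.
    df -h

    # Pull the template container's cloud-init log mentioned above.
    sudo cat /var/lib/lxc/juju-trusty-lxc-template/rootfs/var/log/cloud-init.log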

Revision history for this message
Alvaro Uria (aluria) wrote :

Hello Cheryl,

I would need to redeploy to get /var/log/cloud-init.log from juju-trusty-lxc-template. However, I've got /var/log/juju from machine "0", as well as a more recent console.log, which was definitely stuck on:
"""
Get:43 http://archive.ubuntu.com trusty/main Sources [1335 kB]
Get:44 http://security.ubuntu.com trusty-security/main amd64 Packages [419 kB]
"""

With regard to your questions:
1.- "lxc-ls --fancy" shows juju-trusty-lxc-template in state "RUNNING". I waited for 40 minutes and it didn't stop (while rest of machines were being deployed [nova-compute units, etc.])

2.- I haven't seen any further issue on machine 0. With regard to networking, I haven't tested manually, but lxc.conf and the /e/n/interfaces files show a correct MTU setting. Also, deploys with juju 1.20.14 have worked all 3 times tried, while deploys with 1.22.6 have not worked in any of the 8-9 attempts.

Cheers,
-Alvaro.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

If you have access to a system where 1.20.14 successfully created the template, please send the contents of /var/lib/juju/containers/juju-trusty-lxc-template/ for that system so I can compare the two. In the meantime, I'll try to see what's changed between 1.20.14 and 1.22.6.

Previously, it was mentioned that this bug was intermittent. Is it now the case that you can't deploy to a container at all with 1.22.6?

Changed in juju-core:
assignee: nobody → Cheryl Jennings (cherylj)
Revision history for this message
Alvaro Uria (aluria) wrote :

Hello Cheryl,

Please find attached /var/lib/juju/containers/juju-trusty-lxc-template/ from a successful 1.20.14 deploy, on the same machine 0 metal as I tried with 1.22.6. I see that lxc.conf specifies the mtu on the 1.22.6 deploy. FWIW, mtu 8000 was/is configured on both the MAAS node and the machine 0 interfaces.

OTOH, juju 1.22.6 has failed every time in this specific environment: HA + debian-installer (running mdadm on top of the disks). Curtin can't be used, as SW RAID1 is not supported yet.

However, 1.22.6 has worked fine on a staging HA environment using Curtin (and in another environment where we have used it as well... HA + Curtin).

Please let me know if I can help you more.

Cheers,
-Alvaro.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

I've asked smoser for some assistance with this bug.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

We're still hitting this on 1.24.5.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Jason, if you're running into it again, could you grab the cloud-init logs off the container? They'll be in:

/var/log/cloud-init.log
/var/log/cloud-init-output.log

on the juju-trusty-lxc-template container. Scott is back from vacation tomorrow, so I can follow up with him then. It would be nice to have those cloud-init logs to go through with him if you can get them. Thanks!

Revision history for this message
Matt Rae (mattrae) wrote :

In my case this issue appears to be related to the MTU set on the containers. I'm using juju 1.24.5.

When the physical network MTU is 1500, we need to decrease the instance MTU to account for the additional GRE header added when using a neutron network encapsulated with GRE.

We can change the default instance MTU to 1454 with 'juju set neutron-gateway instance-mtu=1454'.

This makes the OpenStack instance MTU 1454, but the juju-trusty-lxc-template created on that instance still has MTU 1500, causing apt-get update to hang during cloud-init.

Can we set the default MTU for containers created by juju?

Revision history for this message
Matt Rae (mattrae) wrote :
tags: added: cpec
Revision history for this message
Matt Rae (mattrae) wrote :

Adding 'lxc-default-mtu: 1454' to my .juju/environments.yaml prior to bootstrapping solved this error for me.

I found 'lxc-default-mtu' from this bug https://bugs.launchpad.net/juju-core/+bug/1442257
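For anyone else hitting this, the setting goes in the environment stanza of environments.yaml before bootstrapping, roughly like this (the environment name and provider settings below are placeholders):

    environments:
      my-maas:                 # placeholder environment name
        type: maas             # or whichever provider you use
        # ... other provider settings ...
        lxc-default-mtu: 1454  # MTU applied to NICs of juju-created LXC containers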

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Since Matt reports the issue is no longer happening when lxc-default-mtu is set, should we close this?

Revision history for this message
Scott Moser (smoser) wrote :

Should the user be expected to set such a thing, or risk some arbitrary network-related failure?

Seems like *something* should be fixed here.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

This should certainly work by default.

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

Jason, are you saying juju should *by default* set the MTU for every NIC configured for each LXC container to 1454?
Sorry, but I disagree - that might work for this specific setup with GRE tunnels, but other stakeholders have different setups - we were even asked to set it to 9000 by default. Juju provides a setting that allows you to explicitly set the MTU for any LXC NICs, exactly for these cases. The original solution I had implemented did discover the host NIC's MTU and used that for the corresponding LXC NIC. That was deemed too "magical" and not helpful in a lot of cases, esp. with corosync in play. So now there's lxc-default-mtu instead.

tags: added: cisco landscape
Revision history for this message
Tom Haddon (mthaddon) wrote :

Perhaps at a minimum we could have a better error message to explain what's failing and what the likely fix is.

Revision history for this message
Antoni Segura Puimedon (celebdor) wrote :

I reproduced this with juju 1.25.0 on a trusty machine deploying a precise lxc container:

    2015-11-26 12:14:58 ERROR juju.provisioner.lxc lxc-broker.go:168 failed to start container: failed to retrieve the template to clone: template container "juju-precise-lxc-template" did not stop
    2015-11-26 12:14:58 ERROR juju.provisioner provisioner_task.go:644 cannot start instance for machine "2/lxc/0": failed to retrieve the template to clone: template container "juju-precise-lxc-template" did not stop
    2015-11-26 12:53:54 ERROR juju.provisioner.lxc lxc-broker.go:168 failed to start container: lxc container cloning failed
    2015-11-26 12:54:04 ERROR juju.provisioner.lxc lxc-broker.go:168 failed to start container: lxc container cloning failed
    2015-11-26 12:54:14 ERROR juju.provisioner.lxc lxc-broker.go:168 failed to start container: lxc container cloning failed
    2015-11-26 12:54:24 ERROR juju.provisioner.lxc lxc-broker.go:168 failed to start container: lxc container cloning failed
    2015-11-26 12:54:24 ERROR juju.provisioner provisioner_task.go:644 cannot start instance for machine after a retry "2/lxc/1": lxc container cloning failed

Any idea on a workaround or fix?

Revision history for this message
Antoni Segura Puimedon (celebdor) wrote :

As requested above:

ubuntu@digital-hair:~$ sudo cat /var/lib/juju/containers/juju-precise-lxc-template/console.log
    ==
    Checking for running unattended-upgrades:
    acpid: exiting
    TERM environment variable not set.
    <4>init: tty4 main process (350) killed by TERM signal
    <4>init: tty2 main process (367) killed by TERM signal
    <4>init: tty3 main process (369) killed by TERM signal
    Thu Nov 26 13:46:39 UTC 2015: shutting down for shutdown-unknown [up 6609s].
    <4>init: cron main process (377) killed by TERM signal
    <4>init: irqbalance main process (387) killed by TERM signal
    <4>init: console main process (413) killed by TERM signal
    <4>init: tty1 main process (417) killed by TERM signal
    <4>init: hwclock-save main process (582) terminated with status 70
    <4>init: plymouth-upstart-bridge main process (591) terminated with status 1
     * Stopping landscape-client daemon
       ...fail!
     * Asking all remaining processes to terminate...
       ...done.
     * All processes ended within 1 seconds....
       ...done.
    initctl: Event failed
     * Deactivating swap...
       ...fail!
    mount: cannot mount block device LABEL=cloudimg-rootfs read-only
     * Will now halt

========================================================

ubuntu@digital-hair:~$ sudo cat /var/lib/juju/containers/juju-precise-lxc-template/container.log
==
      lxc-start 1448545270.029 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.029 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.031 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.032 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.033 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.034 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.035 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.035 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.036 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.038 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.038 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.054 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.056 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.057 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.059 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.060 DEBUG lxc_commands - commands.c:lxc_cmd_handler:888 - peer has disconnected
      lxc-start 1448545270.061 DEBUG lxc_c...

Revision history for this message
Antoni Segura Puimedon (celebdor) wrote :

After digging for the whole afternoon, I found out that the problem was a misconfiguration of the MAAS DHCP server. It only left space for 7 addresses. Once they were exhausted, all other new machines/containers failed to deploy.

I recommend:
- using the MAAS API to detect subnet exhaustion and prevent launching machines/containers that can't be provisioned
- detecting failures due to the lxc container networking not coming up, and reporting them meaningfully in the juju debug log and error message.
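As a rough illustration of the first point, exhaustion can at least be spotted manually on the MAAS cluster controller by comparing the number of handed-out leases against the size of the dynamic range (the lease file path is an assumption for a MAAS 1.x install):

    # Rough count of lease records written by the MAAS-managed dhcpd
    # (includes expired/duplicate entries, so treat it as an upper-bound signal).
    sudo grep -c '^lease ' /var/lib/maas/dhcp/dhcpd.leases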

Changed in juju-core:
assignee: Cheryl Jennings (cherylj) → nobody
Curtis Hovey (sinzui)
description: updated
Curtis Hovey (sinzui)
description: updated
Changed in juju-core:
status: Triaged → Invalid
tags: removed: regression