Error creating container juju-trusty-lxc-template; Failed to parse config

Bug #1485784 reported by Jason Hobbs
This bug affects 1 person
Affects     Status         Importance   Assigned to       Milestone
juju-core   Fix Released   High         Cheryl Jennings
1.25        Fix Released   High         Cheryl Jennings

Bug Description

This is with juju 1.24.4.

Seeing a failure to deploy an LXC unit. The juju status for the LXC machine ends in this output:

          created.''; Container juju-trusty-lxc-template created.; + exit 0; lxc_container:
          confile.c: config_mount_auto: 1413 Invalid filesystem to automount: sys:mixed;
          lxc_container: parse.c: lxc_file_for_each_line: 57 Failed to parse config:
          lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed; lxc_container: parse.c:
          lxc_file_for_each_line: 57 Failed to parse config: lxc.include = /usr/share/lxc/config/common.conf;
          lxc_container: parse.c: lxc_file_for_each_line: 57 Failed to parse config:
          lxc.include = /usr/share/lxc/config/ubuntu.common.conf; lxc_container: parse.c:
          lxc_file_for_each_line: 57 Failed to parse config: lxc.include = /usr/share/lxc/config/ubuntu-cloud.common.conf;
          lxc_container: lxc_create.c: main: 271 Error creating container juju-trusty-lxc-template'

I attached the full juju status. I've seen this on ppc64el and amd64, both times on trusty.

Tags: lxc oil
Jason Hobbs (jason-hobbs) wrote :
Jason Hobbs (jason-hobbs) wrote :

Debug log from the machine where LXC setup failed.

Jason Hobbs (jason-hobbs) wrote :

This machine had 1.0.7-0ubuntu0.2 installed originally and was upgraded to 1.1.2-0ubuntu3.1~cloud0 before this happened.

Curtis Hovey (sinzui)
tags: added: lxc
Curtis Hovey (sinzui) wrote :

Juju CI tests 1.1.2-0ubuntu3.1 on vivid successfully.
All CI trusty machines have 1.0.7-0ubuntu0.2, and template creation succeeds there.

This might be a case where 1.1.2-0ubuntu3 is mismatched to trusty, or where some deps are incomplete. I would expect lxc-templates to also be at 1.1.2-0ubuntu3.

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.25.0
Jason Hobbs (jason-hobbs) wrote :

We don't hit this every time - it's fairly rare. Feels like a race condition to me.

Tim Penhey (thumper) wrote :

If this happens again, can we get the contents of the directory:
  /var/lib/juju/containers/juju-trusty-lxc-template

It should contain some files along the lines of:

-rw-r--r-- 1 root root 1321 Jul 22 16:41 cloud-init
-rw------- 1 root root 43279 Nov 27 2014 console.log
-rw-r--r-- 1 root root 232552 Nov 27 2014 container.log
-rw-r--r-- 1 root root 111 Jul 22 16:41 lxc.conf

These will help us work out what went wrong with the container creation.

Jason Hobbs (jason-hobbs) wrote :

We'll have to make code changes to pick up those logs as it's happening in an automated environment. Is /var/lib/juju/containers just conf and log files, no images or anything like that?

Cheryl Jennings (cherylj) wrote :

Jason, yes it's just conf and log files.

Jason Hobbs (jason-hobbs) wrote :

Ok - attached the entire /var/lib/juju/containers folder from a system where this failed.

Jason Hobbs (jason-hobbs) wrote :

BTW in that attachment the failure was with machine 4/lxc/1.

Cheryl Jennings (cherylj) wrote :

Thanks Jason. Could you also attach /usr/share/lxc/config/ubuntu-cloud.common.conf from machine 4?

Changed in juju-core:
assignee: nobody → Cheryl Jennings (cherylj)
Cheryl Jennings (cherylj) wrote :

Looking through the logs, I see that lxc/1 was picked to be created first, but creating the template container failed. The queued-up lxc/0 then tries to start and sees that the template container doesn't exist. Juju then tries to create the template again and does so successfully. So it was in fact the creation of the juju-trusty-lxc-template container that failed on the first attempt, and it succeeded on the second attempt.

Saw a similar error message reported by someone else earlier this year where Tycho had helped out, so I pinged Tycho to see if he could take a look in this case too.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25-alpha1 → 1.25-beta1
Cheryl Jennings (cherylj) wrote :

Pinged Tycho again to see if he could take a look.

Cheryl Jennings (cherylj) wrote :

From Tycho: it looks like this might be a version mismatch; what versions of the liblxc1 and lxc-templates packages do you have installed?

Could you update the bug with the versions of liblxc1 and lxc-templates packages you have installed?
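A version check like the one requested here can be sketched in Go. This is only an illustration, not part of juju: it assumes the tab-separated "package, version" lines that `dpkg-query -W liblxc1 lxc-templates` prints by default, and the helper names `parseVersions` and `mismatched` are hypothetical.

```go
package main

import (
	"bufio"
	"strings"
)

// parseVersions turns "package<TAB>version" lines (the assumed default
// output of `dpkg-query -W`) into a map from package name to version.
// An architecture qualifier such as ":ppc64el" is stripped from the name.
func parseVersions(out string) map[string]string {
	versions := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or unexpected lines
		}
		name := strings.SplitN(fields[0], ":", 2)[0]
		versions[name] = fields[1]
	}
	return versions
}

// mismatched reports whether the named packages are not all installed at
// the same version. A package missing from the map counts as a mismatch.
func mismatched(versions map[string]string, pkgs ...string) bool {
	if len(pkgs) == 0 {
		return false
	}
	want, ok := versions[pkgs[0]]
	if !ok {
		return true
	}
	for _, p := range pkgs[1:] {
		if v, ok := versions[p]; !ok || v != want {
			return true
		}
	}
	return false
}
```

Feeding it the output captured while an upgrade is half applied (for example liblxc1 already at 1.1.2-0ubuntu3.1~cloud0 but lxc-templates still at 1.0.7-0ubuntu0.2) would flag the mismatch Tycho suspected.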

Jason Hobbs (jason-hobbs) wrote :
Jason Hobbs (jason-hobbs) wrote :
Jason Hobbs (jason-hobbs) wrote :

It's hard to say - it looks like lxc and those other packages are being updated right around the time the error happened. I attached /var/log/apt/history.log and machine-5.log from a failure.

Cheryl Jennings (cherylj) wrote :

Yeah, I can see that the attempt to create the container happens at the same time that the lxc packages are being updated (I'm assuming this is done by an install hook for whatever service is running on the machine).

lxc-create:
2015-08-20 11:59:30 TRACE golxc.run.lxc-create golxc.go:448 run: lxc-create [-n juju-trusty-lxc-template -t ubuntu-cloud -f /var/lib/juju/containers/juju-trusty-lxc-template/lxc.conf -- --debug --userdata /var/lib/juju/containers/juju-trusty-lxc-template/cloud-init --hostid juju-trusty-lxc-template -r trusty -T https://10.245.0.183:17070/environment/d48157e2-9592-4054-878a-

Update happens:
Start-Date: 2015-08-20 12:00:32
Commandline: apt-get --assume-yes --option Dpkg::Options::=--force-confnew --option Dpkg::Options::=--force-confdef dist-upgrade
Install: ieee-data:ppc64el (20131224.1, automatic), python-pyasn1:ppc64el (0.1.7-1ubuntu2, automatic)
Upgrade: lxc:ppc64el (1.0.7-0ubuntu0.2, 1.1.2-0ubuntu3.1~cloud0), python-urllib3:ppc64el (1.7.1-1ubuntu3, 1.9.1-3~cloud0), librbd1:ppc64el (0.80.10-0ubuntu0.14.04.1, 0.94.2-0ubuntu0.15.04.1~cloud0), librados2:ppc64el (0.80.10-0ubuntu0.14.04.1, 0.94.2-0ubuntu0.15.04.1~cloud0), python-netaddr:ppc64el (0.7.10-1ubuntu1.1, 0.7.12-2~cloud0), python-requests-whl:ppc64el (2.2.1-1ubuntu0.3, 2.4.3-6~cloud0), python3-lxc:ppc64el (1.0.7-0ubuntu0.2, 1.1.2-0ubuntu3.1~cloud0), liblxc1:ppc64el (1.0.7-0ubuntu0.2, 1.1.2-0ubuntu3.1~cloud0), python-six:ppc64el (1.5.2-1ubuntu1, 1.9.0-1~cloud0), python-six-whl:ppc64el (1.5.2-1ubuntu1, 1.9.0-1~cloud0), lxc-templates:ppc64el (1.0.7-0ubuntu0.2, 1.1.2-0ubuntu3.1~cloud0), python-netifaces:ppc64el (0.8-3build1, 0.10.4-0.1~cloud0), python-requests:ppc64el (2.2.1-1ubuntu0.3, 2.4.3-6~cloud0), qemu-utils:ppc64el (2.0.0+dfsg-2ubuntu1.16, 2.2+dfsg-5expubuntu9.3~cloud0), python-urllib3-whl:ppc64el (1.7.1-1ubuntu3, 1.9.1-3~cloud0)
End-Date: 2015-08-20 12:00:38

lxc-create fails:
2015-08-20 12:00:54 TRACE golxc.run.lxc-create golxc.go:458 run failed output: ...
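The overlap in this timeline can be checked mechanically. The Go sketch below is illustrative only: the `window` type and the `parseWindow` and `overlaps` helpers are hypothetical names, and the timestamp layout is assumed from the log lines quoted above.

```go
package main

import "time"

// layout matches timestamps as they appear in the quoted logs,
// e.g. "2015-08-20 11:59:30".
const layout = "2006-01-02 15:04:05"

// window is a closed time interval [Start, End].
type window struct {
	Start, End time.Time
}

// parseWindow builds a window from two timestamp strings.
func parseWindow(start, end string) (window, error) {
	s, err := time.Parse(layout, start)
	if err != nil {
		return window{}, err
	}
	e, err := time.Parse(layout, end)
	if err != nil {
		return window{}, err
	}
	return window{Start: s, End: e}, nil
}

// overlaps reports whether two windows share at least one instant.
func overlaps(a, b window) bool {
	return !a.End.Before(b.Start) && !b.End.Before(a.Start)
}
```

With the lxc-create run taken as 11:59:30 to 12:00:54 and the dist-upgrade as 12:00:32 to 12:00:38, `overlaps` reports that the upgrade ran entirely inside the lxc-create call.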

Jason Hobbs (jason-hobbs) wrote :

Yeah - the swift storage unit on the system is the one doing that update. Attached the log for it.

Cheryl Jennings (cherylj) wrote :

A few ideas were raised on how to address this issue, but I think our best bet here is to implement some retry logic when creating the containers.

Changed in juju-core:
status: Triaged → In Progress
Cheryl Jennings (cherylj) wrote :

Review request: http://reviews.vapour.ws/r/2729/

The proposed code change retries the lxc-create (and also lxc-clone) call 3 times, with a 10 second delay between attempts. While this doesn't "fix" the issue where lxc-create fails during an apt-get upgrade, it will allow the deployment of the container to recover from this temporary error condition.

This can be treated as a temporary error since we know that subsequent calls to lxc-create after the apt-get upgrade completes will succeed. The error log for this bug shows that a subsequent attempt to create the template container succeeded (see comment #12).

With regard to preventing the error from happening in the first place, the basic message from the lxc team about this was "don't create containers while doing an upgrade". The way we can guarantee that upgrades don't happen while we're creating containers is to hold the apt-cache lock during the creation. We could end up holding this lock for an arbitrary amount of time during the container creation, so I'd rather not do that. This is why I opted for retrying the container creation instead.

Changed in juju-core:
milestone: 1.25-beta1 → 1.26-alpha1
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
no longer affects: lxc