LXC on Maas: cannot get tools from machine for lxc container

Bug #1351368 reported by David Britton
This bug affects 2 people
Affects    Status   Importance  Assigned to  Milestone
juju-core  Triaged  High        Unassigned

Bug Description

I'm not sure what distinguishes this situation. It resembles many of the other LXC bugs that have been filed where instances stay in pending indefinitely, but the way it got there is unusual:

2014-08-01 15:10:27 ERROR juju.provisioner container_initialisation.go:154 cannot get tools from machine for lxc container
2014-08-01 15:10:27 ERROR juju.provisioner container_initialisation.go:95 starting container provisioner for lxc: initialising container infrastructure on host machine: no matching tools available
2014-08-01 15:10:27 ERROR juju.worker runner.go:218 exited "0-container-watcher": initialising container infrastructure on host machine: no matching tools available

Then at the end of the log (not sure if this is important):

2014-08-01 15:11:59 ERROR juju.state.apiserver debuglog.go:101 debug-log handler error: write tcp 172.16.1.98:49085: broken pipe

I'll attach the full machine-0.log for details.

The end result is that the LXC containers do not come up; even the lxc package doesn't get installed.

Environment: trusty w/ trusty containers, juju 1.20.2 pre-release package.

David Britton (dpb) wrote :
David Britton (dpb)
description: updated
Adam Collard (adam-collard) wrote :

Did you perhaps race the release of 1.20.3 here? I'm guessing now that's "out" the tools for 1.20.2 are gone?

David Britton (dpb) wrote : Re: [Bug 1351368] Re: LXC on Maas: cannot get tools from machine for lxc container

On Fri, Aug 01, 2014 at 04:58:39PM -0000, Adam Collard wrote:
> Did you perhaps race the release of 1.20.3 here? I'm guessing now that's
> "out" the tools for 1.20.2 are gone?
>

Very possibly. That would make this a minor bug: just a request for a
nicer error message.

--
David Britton <email address hidden>

Curtis Hovey (sinzui) wrote :

Can this issue be retested with 1.20.3?

Changed in juju-core:
status: New → Incomplete
importance: Undecided → High
milestone: none → 1.20.3
David Britton (dpb) wrote :

On Mon, Aug 04, 2014 at 02:43:24PM -0000, Curtis Hovey wrote:
> Can this issue be retested with 1.20.3?

Hi Curtis, I'm not sure I can retest it... My best root-cause guess was a
timing error while the tools in the S3 bucket were being updated. When I
try this URL now, it works fine.

This feels to me more like a unit test that is missing in Juju.

I do think it should remain a bug, since the reported error is rather
misleading (not a release-critical one, of course). I think the real bug
here is that there are no retries and no way to learn about this error
other than following the machine-0.log file. From a juju status
perspective, the instance just stays in "pending" forever.

--
David Britton <email address hidden>

Ian Booth (wallyworld) wrote :

We need to work on better reporting of errors from commands inside cloud-init, e.g. bug 1350008.

Changed in juju-core:
milestone: 1.20.3 → none
David Britton (dpb) wrote :
David Britton (dpb) wrote :

Added a listing of the problem. I think I can hit it pretty consistently now. The machine starts up fine, the tools are there, etc. But when we go to start LXCs, the agent thinks there are no tools:

    if m.doc.Tools == nil {
        return nil, errors.NotFoundf("agent tools for machine %v", m)
    }

Could be a race of some kind, I suppose. I don't know much about it yet due to the lack of logging.

I have a test that fires up a bunch of machines in our MAAS lab and then starts lxcs on those machines. It seems to hit it pretty frequently so far.

As I can repeat this pretty easily, I don't think it should be incomplete.

Changed in juju-core:
status: Incomplete → New
David Britton (dpb) wrote :

Another log snip: maybe I'm reading too much into this, but it looks like the lxc error comes before the machine reports as "started"?

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
tags: added: lxc
David Britton (dpb) wrote :

Deployer file I use to demonstrate the issue: http://paste.ubuntu.com/8087092/

David Britton (dpb) wrote :
Ryan Harper (raharper)
tags: added: oil
Ryan Harper (raharper) wrote :

We're still seeing this on 1.20.5.

https://bugs.launchpad.net/juju-core/+bug/1359800

We also use the -to placement for containers. It doesn't hit 100% of the time, but it hits pretty frequently.
