OS-deployer job fails to complete

Bug #1494356 reported by Martin Packman on 2015-09-10
This bug affects 1 person
Affects: juju-core
Importance: Critical
Assigned to: Ian Booth

Bug Description

The OpenStack deployer bundle test is failing in CI on 1.25 and master.

Last successful run on master:

<http://reports.vapour.ws/releases/3038/job/OS-deployer/attempt/243>

Timeout on master:

<http://reports.vapour.ws/releases/3042/job/OS-deployer/attempt/256>

Timeout on 1.25:

<http://reports.vapour.ws/releases/3040/job/OS-deployer/attempt/250>

The symptom seems to be that we cannot create lxc containers, but there is no obvious smoking gun in the logs.

Cheryl Jennings (cherylj) wrote :

Looked at the logs for machine-0, and saw that the containers were started, but then never connected to the API. Can you get the container logs from machine-0 in /var/lib/juju/containers/juju-machine-0-lxc-0/*? (or tell me how to access this environment?)

Also, is there a /var/log/juju/machine-0-lxc-0.log on machine-0?

Changed in juju-core:
assignee: nobody → Cheryl Jennings (cherylj)
Martin Packman (gz) wrote :

If the lxc container logs are created, they are included. Compare the earlier run from the same revision on 1.25:

<http://reports.vapour.ws/releases/3040/job/OS-deployer/attempt/248>

I'm still trying to eliminate the possibility something is borked with our maas, but the other maas jobs are passing and we had a pass with this bundle on 1.24 after the 1.25 failure.

Martin Packman (gz) wrote :

Sorry, the /var/lib/juju/containers/* logs are not included. I can add them in. The lxc machine logs, however, are included if the machine agents come up.

Cheryl Jennings (cherylj) wrote :

Any luck getting the /var/lib/juju/containers/* logs?

Dimiter Naydenov (dimitern) wrote :

Especially in cases like this, where the container started but couldn't connect to the API and rsyslog, some other files would be useful to include in the artifacts: the run-time config /var/lib/lxc/<container>/config and the cloud-init files, possibly even rootfs/var/log/cloud* if available.
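As a rough sketch of what collecting those files could look like, the snippet below builds glob patterns for one container. The function names are hypothetical, and placing rootfs under /var/lib/lxc/<container>/ is an assumption; only the paths themselves come from this thread.

```python
import glob


def container_artifact_patterns(container,
                                lxc_root="/var/lib/lxc",
                                juju_root="/var/lib/juju/containers"):
    """Return glob patterns for the per-container files mentioned above.

    Assumption: rootfs lives under <lxc_root>/<container>/rootfs.
    """
    return [
        f"{juju_root}/{container}/*",                     # juju container logs
        f"{lxc_root}/{container}/config",                 # run-time lxc config
        f"{lxc_root}/{container}/rootfs/var/log/cloud*",  # cloud-init logs
    ]


def collect_existing(patterns):
    """Expand each pattern and return the paths that exist on this host."""
    found = []
    for pattern in patterns:
        found.extend(sorted(glob.glob(pattern)))
    return found
```

For example, collect_existing(container_artifact_patterns("juju-machine-0-lxc-0")) would list whichever of those files are present on machine-0; patterns that match nothing are simply skipped.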

Martin Packman (gz) wrote :

This run in Jenkins has the lxc logs:

<http://juju-ci.vapour.ws/job/OS-deployer/262/>

Of interest: some lxc containers did come up, others are still pending, and the job was still doing work when it timed out. Also, one of the charms complains due to a rename. I've updated the bundle we're using, and requeued a run with a longer timeout to see if that's more informative.

Historically, this job takes about 35 minutes to complete. The retests with master were still slowly progressing at 45 minutes. I am not sure whether our maas box is running slower or juju is running slower.

Cheryl Jennings (cherylj) wrote :

The link in comment #6 doesn't work for me. Is it correct?

Martin Packman (gz) wrote :

You need to log in. There are subsequent passing runs that will also be a useful comparison.

Juju-CI verified that this issue is Fix Released in juju-core 1.25:
    http://reports.vapour.ws/releases/3049

Martin Packman (gz) wrote :

Okay, this bug is actually super simple, and not lxc related. Look at any of the unit logs, for instance in:

<http://juju-ci.vapour.ws/job/OS-deployer/272/artifact/artifacts/>

machine-6/unit-nova-cloud-controller-0-2015-09-12T01-46-41.259.log

We're getting the line:

2015-09-12 01:46:40 DEBUG juju.worker.uniter.remotestate watcher.go:393 update status timer triggered

16721 times. That exact line, timestamp included. Since the timestamp only has one-second resolution, that means we're logging an update status line ~2**14 times per second. So, performance problem indeed.
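A count like this can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes every log line starts with a 19-character "YYYY-MM-DD HH:MM:SS" timestamp, as the line quoted above does.

```python
from collections import Counter


def count_repeated_lines(lines):
    """Count how many times each exact log line appears."""
    return Counter(line.rstrip("\n") for line in lines)


def lines_per_second(lines):
    """Count log lines per second, keyed by the leading timestamp.

    Assumption: each line begins with 'YYYY-MM-DD HH:MM:SS' (19 chars).
    """
    return Counter(line[:19] for line in lines)


def worst_offender(lines):
    """Return (line, count) for the most frequently repeated line."""
    counts = count_repeated_lines(lines)
    return counts.most_common(1)[0] if counts else None
```

Run over the lines of a unit log (for instance, the result of readlines() on the file), worst_offender would point straight at the "update status timer triggered" line if it dominates the file.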

Martin Packman (gz) on 2015-09-12
no longer affects: juju-core/1.25
Tim Penhey (thumper) on 2015-09-14
Changed in juju-core:
assignee: Cheryl Jennings (cherylj) → Tim Penhey (thumper)
status: Triaged → In Progress
Tim Penhey (thumper) on 2015-09-14
Changed in juju-core:
assignee: Tim Penhey (thumper) → Ian Booth (wallyworld)
Ian Booth (wallyworld) on 2015-09-14
Changed in juju-core:
milestone: none → 1.26-alpha1
Ian Booth (wallyworld) on 2015-09-14
Changed in juju-core:
status: In Progress → Fix Committed
Tim Penhey (thumper) on 2015-09-15
Changed in juju-core:
status: Fix Committed → Fix Released