OS-deployer job fails to complete

Bug #1494356 reported by Martin Packman
This bug affects 1 person
Affects     Status        Importance  Assigned to  Milestone
juju-core   Fix Released  Critical    Ian Booth

Bug Description

The OpenStack deployer bundle test is failing in CI on 1.25 and master.

Last successful run on master:

<http://reports.vapour.ws/releases/3038/job/OS-deployer/attempt/243>

Timeout on master:

<http://reports.vapour.ws/releases/3042/job/OS-deployer/attempt/256>

Timeout on 1.25:

<http://reports.vapour.ws/releases/3040/job/OS-deployer/attempt/250>

The symptom seems to be that we cannot create LXC containers, but there is no obvious smoking gun in the logs.

Cheryl Jennings (cherylj) wrote :

Looked at the logs for machine-0 and saw that the containers were started but never connected to the API. Can you get the container logs from machine-0 in /var/lib/juju/containers/juju-machine-0-lxc-0/* (or tell me how to access this environment)?

Also, is there a /var/log/juju/machine-0-lxc-0.log on machine-0?

Changed in juju-core:
assignee: nobody → Cheryl Jennings (cherylj)
Martin Packman (gz) wrote :

If the LXC container logs are created, they are included; compare the earlier run from the same revision on 1.25:

<http://reports.vapour.ws/releases/3040/job/OS-deployer/attempt/248>

I'm still trying to rule out the possibility that something is borked with our MAAS, but the other MAAS jobs are passing, and we had a pass with this bundle on 1.24 after the 1.25 failure.

Martin Packman (gz) wrote :

Sorry, the /var/lib/juju/containers/* logs are not included. I can add them in, but the LXC machine logs are included if the machine agents come up.

Cheryl Jennings (cherylj) wrote :

Any luck getting the /var/lib/juju/containers/* logs?

Dimiter Naydenov (dimitern) wrote :

Especially in cases like this, where the container started but couldn't connect to the API and rsyslog, some other files are useful to include in the artifacts: the run-time config /var/lib/lxc/<container>/config, the cloud-init files, and possibly even rootfs/var/log/cloud* if available.
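To make that artifact list concrete, here is a minimal Go sketch (Go because juju itself is written in it) that expands the paths mentioned in this thread for one container: /var/lib/juju/containers/<name>/* and /var/log/juju/<agent>.log from the earlier comments, plus the LXC config and rootfs/var/log/cloud* from this one. The helpers `artifactPatterns` and `collect` are hypothetical, not part of the CI scripts. The container and agent names are assumed to follow the juju-machine-0-lxc-0 / machine-0-lxc-0 pattern quoted above, and since the thread doesn't say where the cloud-init files live, they are left out.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// artifactPatterns returns glob patterns for the container-related files
// discussed in this bug. root lets you point at the host filesystem ("/")
// or a copy of it. containerName is the LXC container name on the host
// (e.g. "juju-machine-0-lxc-0") and agentName is the agent's log name
// (e.g. "machine-0-lxc-0"); both naming conventions are assumptions taken
// from the paths quoted in the bug thread.
func artifactPatterns(root, containerName, agentName string) []string {
	lxcDir := filepath.Join(root, "var/lib/lxc", containerName)
	return []string{
		// Juju's per-container directory on the host.
		filepath.Join(root, "var/lib/juju/containers", containerName, "*"),
		// The container's machine agent log, if it exists on the host.
		filepath.Join(root, "var/log/juju", agentName+".log"),
		// LXC run-time config for the container.
		filepath.Join(lxcDir, "config"),
		// cloud-init logs inside the container's root filesystem, if any.
		filepath.Join(lxcDir, "rootfs/var/log/cloud*"),
	}
}

// collect expands each pattern and returns the files that actually exist.
// filepath.Glob only returns an error for a malformed pattern; patterns
// that match nothing simply contribute no files.
func collect(patterns []string) ([]string, error) {
	var found []string
	for _, p := range patterns {
		matches, err := filepath.Glob(p)
		if err != nil {
			return nil, err
		}
		found = append(found, matches...)
	}
	return found, nil
}

func main() {
	patterns := artifactPatterns("/", "juju-machine-0-lxc-0", "machine-0-lxc-0")
	files, err := collect(patterns)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Missing files are skipped rather than treated as errors, which matches the "if available" caveat: a container that never booted far enough may have no cloud-init logs at all.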

Martin Packman (gz) wrote :

This run in Jenkins has the LXC logs:

<http://juju-ci.vapour.ws/job/OS-deployer/262/>

Of interest: some LXC containers did come up, others were still pending, and the job was still doing work when it timed out. Also, one of the charms complains about a rename. I've updated the bundle we're using and requeued a run with a longer timeout to see if that's more informative.

Historically, this job takes about 35 minutes to complete. The retests with master have been slowly progressing, taking around 45 minutes. I am not sure whether our MAAS box or Juju is running slower.

Cheryl Jennings (cherylj) wrote :

The link in comment #6 doesn't work for me. Is it correct?

Martin Packman (gz) wrote :

Log in. There are subsequent passing runs that will also be a useful comparison.

Canonical Juju QA Bot (juju-qa-bot) wrote : Fix Released in juju-core 1.25

Juju-CI verified that this issue is Fix Released in juju-core 1.25:
    http://reports.vapour.ws/releases/3049

Martin Packman (gz) wrote :

Okay, this bug is actually super simple, and not LXC-related. Look at any of the unit logs, for instance in:

<http://juju-ci.vapour.ws/job/OS-deployer/272/artifact/artifacts/>

machine-6/unit-nova-cloud-controller-0-2015-09-12T01-46-41.259.log

We're getting the line:

2015-09-12 01:46:40 DEBUG juju.worker.uniter.remotestate watcher.go:393 update status timer triggered

16721 times. That exact line, timestamp and all, meaning we're logging an update status line roughly 2**14 times per second. So, a performance problem indeed.

Martin Packman (gz)
no longer affects: juju-core/1.25
Tim Penhey (thumper)
Changed in juju-core:
assignee: Cheryl Jennings (cherylj) → Tim Penhey (thumper)
status: Triaged → In Progress
Tim Penhey (thumper)
Changed in juju-core:
assignee: Tim Penhey (thumper) → Ian Booth (wallyworld)
Ian Booth (wallyworld)
Changed in juju-core:
milestone: none → 1.26-alpha1
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Tim Penhey (thumper)
Changed in juju-core:
status: Fix Committed → Fix Released