OS-deployer job fails to complete
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | Fix Released | Critical | Ian Booth (wallyworld) | 1.26-alpha1 |
Bug Description
The OpenStack deployer bundle test is failing in CI on 1.25 and master.
Last successful run on master:
<http://
Timeout on master:
<http://
Timeout on 1.25:
<http://
The symptom seems to be that we cannot create lxc containers, but there is no obvious smoking gun in the logs.
| Cheryl Jennings (cherylj) wrote: | #1 |
Looked at the logs for machine-0, and saw that the containers were started, but then never connected to the API. Can you get the container logs from machine-0 in /var/lib/juju/containers/juju-machine-0-lxc-0/*? (Or tell me how to access this environment?)
Also, is there a /var/log/juju/machine-0-lxc-0.log on machine-0?
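To make that request concrete, here is a minimal sketch (not from the thread) of pulling those files with `juju scp`; the local destination directory is arbitrary, and the container name is just the one cited above:

```python
#!/usr/bin/env python
"""Minimal sketch: fetch the container logs asked for in comment #1.

Assumes the `juju` CLI is on PATH and talking to the affected
environment; the remote paths are the ones named in the comment.
"""
import os
import subprocess

dest = "./container-logs"  # arbitrary local destination
if not os.path.isdir(dest):
    os.makedirs(dest)

remote_paths = [
    "0:/var/lib/juju/containers/juju-machine-0-lxc-0/*",
    "0:/var/log/juju/machine-0-lxc-0.log",
]

for path in remote_paths:
    # `juju scp` copies from a machine in the environment; the glob
    # in the first path is expanded on machine 0.
    subprocess.call(["juju", "scp", path, dest])
```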
| Changed in juju-core: | |
| assignee: | nobody → Cheryl Jennings (cherylj) |
| Martin Packman (gz) wrote: | #2 |
If the lxc container logs are created, they are included; compare the earlier run from the same revision on 1.25:
<http://
I'm still trying to eliminate the possibility that something is borked with our maas, but the other maas jobs are passing, and we had a pass with this bundle on 1.24 after the 1.25 failure.
| Martin Packman (gz) wrote: | #3 |
Sorry, /var/lib/
| Cheryl Jennings (cherylj) wrote: | #4 |
Any luck getting the /var/lib/
| Dimiter Naydenov (dimitern) wrote: | #5 |
Especially in these cases where the container started but couldn't connect to the API and rsyslog, some other files can be useful to include in the artifacts: the run-time config /var/lib/
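As an illustration only (the paths in the comment above are truncated), here is a small sketch of bundling such per-container files into one tarball for the CI artifacts; the globs are guesses at the standard Juju 1.x container layout and should be treated as placeholders:

```python
#!/usr/bin/env python
"""Sketch: collect extra per-container files into one artifact tarball.

The file list is an assumption -- the paths in comment #5 are truncated --
so the globs below are placeholders for the run-time config and friends.
"""
import glob
import tarfile

patterns = [
    "/var/lib/juju/containers/juju-machine-0-lxc-*/lxc.conf",     # run-time config (assumed name)
    "/var/lib/juju/containers/juju-machine-0-lxc-*/cloud-init",   # cloud-init user data (assumed name)
    "/var/lib/juju/containers/juju-machine-0-lxc-*/console.log",  # console output (assumed name)
]

with tarfile.open("container-artifacts.tar.gz", "w:gz") as tar:
    for pattern in patterns:
        for path in glob.glob(pattern):
            tar.add(path)  # missing files simply don't match the glob
```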
| Martin Packman (gz) wrote: | #6 |
This run in jenkins has the lxc logs:
<http://
Of interest: some lxc containers did come up, others are still pending, and the job was still doing work when it was timed out. Also, one of the charms complains due to a rename. I've updated the bundle we're using, and requeued a run with a longer timeout to see if that's more informative.
Historically, this job takes about 35 minutes to complete. The retests with master have been slowly progressing, at around 45 minutes. I am not sure if our maas box is running slower or juju is running slower.
| Cheryl Jennings (cherylj) wrote: | #7 |
The link in comment #6 doesn't work for me. Is it correct?
| Martin Packman (gz) wrote: | #8 |
Log in. There are subsequent passing runs that will also be a useful comparison.
Juju-CI verified that this issue is Fix Released in juju-core 1.25:
http://
| Martin Packman (gz) wrote: | #10 |
Okay, this bug is actually super simple, and not lxc related. Look at any of the unit logs, for instance in:
<http://
machine-
We're getting the line:
2015-09-12 01:46:40 DEBUG juju.worker.
16721 times. That exact line, timestamp included, so all within a single second. Meaning we're logging an update status line ~2**14 times per second. So, performance problem indeed.
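For anyone wanting to reproduce a count like that from a downloaded unit log, here is a hedged sketch; the log file name is a placeholder:

```python
#!/usr/bin/env python
"""Sketch: count exact duplicate lines in a unit log.

Because the log timestamps have one-second granularity, a line that
repeats verbatim (timestamp included) n times was emitted n times
within a single second -- here 16721, i.e. roughly 2**14 per second.
"""
from collections import Counter

counts = Counter()
with open("machine-0.log") as log:  # placeholder path
    for line in log:
        counts[line.rstrip("\n")] += 1

# Show the worst offenders.
for line, n in counts.most_common(3):
    print("%6d  %s" % (n, line[:100]))
```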
| no longer affects: | juju-core/1.25 |
| Changed in juju-core: | |
| assignee: | Cheryl Jennings (cherylj) → Tim Penhey (thumper) |
| status: | Triaged → In Progress |
| Changed in juju-core: | |
| assignee: | Tim Penhey (thumper) → Ian Booth (wallyworld) |
| Changed in juju-core: | |
| milestone: | none → 1.26-alpha1 |
| Changed in juju-core: | |
| status: | In Progress → Fix Committed |
| Changed in juju-core: | |
| status: | Fix Committed → Fix Released |