Juju cannot create vivid containers
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | Fix Released | High | Cheryl Jennings | |
| juju-core 1.23 | | Medium | Unassigned | |
| juju-core 1.24 | | High | Cheryl Jennings | |
| openstack-installer | Confirmed | High | Unassigned | |
Bug Description
Per bug 1436415 and bug 1441319, juju 1.23 on vivid cannot do a trivial deployment because it cannot clone templates to make the containers.
```
machines:
  "0":
    agent-state: started
    agent-version: 1.23-beta4.1
    dns-name: localhost
    instance-id: localhost
    series: vivid
    state-
  "1":
    agent-state: pending
    instance-id: pending
    series: vivid
  "2":
    agent-state: pending
    instance-id: pending
    series: vivid
```
All the logs for several tries are available for local-deploy-
http://
^ The console log and the machine logs are present.
Note: the test was made non-voting because vivid became very unstable when it switched to systemd, but Ubuntu will require this test to pass before accepting juju into Vivid. So Core or Ubuntu need to solve whatever is broken. Vivid's lxc has received updates, so we know there were bugs in lxc that needed fixing.
But as of a few commits ago, master (1.24) can! It has passed twice without human intervention. 1.23 has some of the commits just added to master, but not all. A possible fix may be in:
Commit d9fe120 Merge pull request #2033 from tasdomas/
Commit cd3494e Merge pull request #2016 from cherylj/env-users …
ADDENDUM:
We now understand that Juju cannot create vivid containers because an upstart script is used to shut the container down. We don't know why master passed; human intervention could have been a factor. Once a template container is made, juju can make additional vivid containers.
Juju can deploy trusty containers (for trusty charms). The principal scenario for developing and testing charms works.
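One way to see the mismatch (a hedged illustration; the container name is a placeholder): vivid boots systemd as PID 1, so upstart's initctl is simply absent and an upstart shutdown job can never run:
```
$ sudo lxc-attach -n <container> -- readlink /proc/1/exe
/lib/systemd/systemd
$ sudo lxc-attach -n <container> -- which initctl
# no output: upstart is not installed on vivid
```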
| Changed in juju-core: | |
| assignee: | nobody → Cheryl Jennings (cherylj) |
| Curtis Hovey (sinzui) wrote : | #1 |
| Cheryl Jennings (cherylj) wrote : | #2 |
I know for certain that my commit (#2016) wouldn't affect this problem, and I don't believe #2033 would either. Going to take a look at the lxc log to see if I can get any more information.
| Curtis Hovey (sinzui) wrote : | #3 |
I have good news. beta4 mostly works on vivid. It cannot complete vivid template creation or destroy-environment, but it can deploy trusty and precise charms.
The BAD:
1. beta4 says '"juju-
2. Destroy environment cannot shut down mongod, preventing any subsequent bootstraps (without human intervention):
ERROR while stopping mongod: fork/exec /sbin/initctl: no such file or directory
The UGLY:
1. A human can complete what juju doesn't and get a working setup:
```
sudo lxc-stop juju-vivid-
juju destroy-environment local
sudo killall -ABRT mongod
juju bootstrap
juju deploy <charm>
```
2. After running juju destroy-
```
sudo killall -ABRT mongod
```
The GOOD:
1. By setting "default-series: trusty" (or "precise") in environments.yaml, users can avoid the ambiguous situations where juju tries to deploy a vivid charm; see the sketch below. Users would need to type "vivid" explicitly, and presumably know they will need some workarounds. Trusty charms will just work, but destroy-environment is still broken.
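A sketch of that environments.yaml stanza for the local provider (only default-series matters here; the rest are the usual local-provider keys):
```
environments:
  local:
    type: local
    default-series: trusty
```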
| Curtis Hovey (sinzui) wrote : | #4 |
I have updated CI to NOT delete the template containers, contriving deployments so that we can see the destroy-environment issue.
| Curtis Hovey (sinzui) wrote : | #5 |
Tim, using a newly built 1.23, and I, using the last built 1.23, cannot reproduce the destroy-environment error that I saw earlier today. Maybe the machine was dirty from earlier testing, such as trying to deploy vivid charms. 1.23 is able to do this:
```
juju init
juju switch local
juju bootstrap
juju deploy ubuntu
juju status
juju destroy-environment
```
The ubuntu charm is implicitly trusty, and juju can create trusty template containers. This case matches what Ubuntu will test.
The only remaining issue that must be solved soon is ensuring juju can create vivid template containers.
| description: | updated |
| summary: | 1.23 cannot deploy on vivid, but master can → Juju cannot create vivid containers |
| Changed in juju-core: | |
| milestone: | 1.23.0 → 1.24-alpha1 |
| John A Meinel (jameinel) wrote : | #6 |
We have seen this in the wild on Trusty when a container hangs trying to shutdown. We think that case was because of I/O blocking preventing the container from shutting down for more than 5 minutes.
There are other possibilities, since we're using the init system (upstart vs systemd) to issue the cleanup and shutdown commands, rather than just running them at the end of cloud-init. I'm not sure why we need a service to shut down cleanly.
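For contrast, a hedged sketch of the "just run it at the end of cloud-init" alternative John mentions (illustrative only, not juju's actual config); the one-minute delay gives cloud-init time to finish before the halt lands:
```
#cloud-config
runcmd:
  - [ shutdown, -h, "+1", "template customization complete" ]
```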
| Changed in juju-core: | |
| milestone: | 1.24-alpha1 → none |
| milestone: | none → 1.24.0 |
| Changed in juju-core: | |
| milestone: | 1.24.0 → 1.25.0 |
| no longer affects: | juju-core/1.23 |
| Adam Stokes (adam-stokes) wrote : | #7 |
Here is my status output:
| Changed in cloud-installer: | |
| status: | New → Confirmed |
| importance: | Undecided → High |
| tags: | added: cloud-installer |
| Adam Stokes (adam-stokes) wrote : | #8 |
Just FYI, we're blocked on this for getting nclxd support added into our installer.
Thanks!
| Cheryl Jennings (cherylj) wrote : | #9 |
Attempting to recreate locally.
| Cheryl Jennings (cherylj) wrote : | #10 |
Was able to recreate, and I think this is due to problems in the cloud-init script for vivid. I see this error in the console.log for the vivid container:
```
[ 4429.946677] cloud-init[7627]: + /bin/systemctl link /var/lib/
[ 4429.950535] cloud-init[7627]: Created symlink from /etc/systemd/
[ 4430.388097] cloud-init[7627]: + /bin/systemctl daemon-reload
[ 4430.723843] cloud-init[7627]: + /bin/systemctl enable /var/lib/
[ 4431.081232] cloud-init[7627]: The unit files have no [Install] section. They are not meant to be enabled
[ 4431.081770] cloud-init[7627]: using systemctl.
[ 4431.084314] cloud-init[7627]: Possible reasons for having this kind of units are:
[ 4431.084638] cloud-init[7627]: 1) A unit may be statically enabled by being symlinked from another unit's
[ 4431.084958] cloud-init[7627]: .wants/ or .requires/ directory.
[ 4431.085327] cloud-init[7627]: 2) A unit's purpose may be to act as a helper for some other unit which has
[ 4431.085642] cloud-init[7627]: a requirement dependency on it.
[ 4431.086912] cloud-init[7627]: 3) A unit may be started when needed via activation (socket, path, timer,
[ 4431.087249] cloud-init[7627]: D-Bus, udev, scripted systemctl call, ...).
```
Going to investigate what this unit file should look like.
| Cheryl Jennings (cherylj) wrote : | #11 |
Worked with Eric Snow and added the [Install] section (even though this is a transient config), which eliminated the above error. Even with that change, the container still did not stop. Looking at the container itself, I see that the juju-template-
```
May 15 19:29:36 juju-vivid-
May 15 19:29:36 juju-vivid-
May 15 19:29:36 juju-vivid-
May 15 19:29:36 juju-vivid-
May 15 19:29:36 juju-vivid-
```
A quick Google search suggests that we may need to add a dependency on cloud-final.
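If that is right, the fix would be an ordering dependency in the unit file, something like the following (a sketch only; the unit and its name are hypothetical):
```
[Unit]
After=cloud-final.service
```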
| Cheryl Jennings (cherylj) wrote : | #12 |
I'm quite certain I have a fix for this now, but I want to check with ericsnow to do some sanity checking before committing.
The short version is that we need to alter our cloud-config to properly use systemd to halt the container once cloud-init completes.
There were a couple of issues with the current config, but basically the unit file and its handling should change to:
1 - Include an [Install] section, even if this is a transient service.
2 - Not specify "Conflicts" when we want to stop after some other service completes. Using "Conflicts" will actually kill the service we're trying to run after if we start while it is still running.
3 - Change the "After" to be the cloud-config.
4 - After we enable the juju-template-
After making the modifications above, I was able to deploy a trivial vivid charm successfully. I will get the code changes in tomorrow once I've chatted with Eric.
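A minimal sketch of a unit along those lines; the name juju-template-shutdown.service and the details are illustrative assumptions, not juju's actual file:
```
# juju-template-shutdown.service -- hypothetical name, illustrating the
# four points above; not juju's actual unit file.
[Unit]
Description=Power off the template container once cloud-init completes
# Point 3: order after cloud-config. Point 2: no Conflicts=, which would
# kill the unit we want to run after if we start while it is running.
After=cloud-config.service

[Service]
Type=oneshot
# Disable this unit before halting, so containers cloned from the
# template do not power off on their first boot.
ExecStart=/bin/systemctl disable juju-template-shutdown.service
ExecStart=/sbin/shutdown -h now

# Point 1: include an [Install] section so "systemctl enable" succeeds,
# even though the service is transient.
[Install]
WantedBy=multi-user.target
```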
Basically, the issue is that we want to use systemd to start a service that will halt the container once cloud-init completes. As part of this service, we want it to remove itself such that other containers started from the template we're creating don't just halt once they start up.
| Cheryl Jennings (cherylj) wrote : | #13 |
Gah, didn't mean to include that last paragraph in the previous comment...
| Eric Snow (ericsnowcurrently) wrote : | #14 |
@Cheryl, the solution you've outlined sounds correct. It may be worth asking smoser about it.
| Eric Snow (ericsnowcurrently) wrote : | #15 |
For the record, using the init system to reboot at the end of cloud-init feels like a hack to me. It is certainly fragile, as this bug attests. However, it may still be the best solution, and I don't have any better alternatives to offer.
| Cheryl Jennings (cherylj) wrote : | #16 |
Turns out all this hand-waving with systemd may be unnecessary. Cloud-config has an option to halt a system after cloud-init completes. Going to give that a try and see if it works for our use case.
| Cheryl Jennings (cherylj) wrote : | #17 |
Using power_state in the cloud-config did power off the system as expected. Testing this change with precise and trusty.
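For reference, a minimal cloud-config sketch of that option (values here are illustrative, not juju's exact config):
```
#cloud-config
# cloud-init's power_state module runs after the rest of cloud-init
# completes, powering the machine off without any helper service.
power_state:
  mode: poweroff
  message: template customization complete
  timeout: 30
```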
| Cheryl Jennings (cherylj) wrote : | #18 |
Sent an email to smoser to find out if we'll ever be in a situation where the cloud-init version we're using doesn't support power_state, and if we'll be able to determine that when we're generating our cloud-config.
| Cheryl Jennings (cherylj) wrote : | #19 |
After talking with thumper, I'll be making the changes needed to get the systemd logic working properly. The power_state option in cloud-config is not guaranteed to be present in all cases, and we'd have to inject os/version logic to determine whether it is present, which would need updating every time a new os/version needed lxc support in juju.
| Cheryl Jennings (cherylj) wrote : | #20 |
I have a patch up for review for 1.24: http://
| Changed in juju-core: | |
| status: | Triaged → In Progress |
| Curtis Hovey (sinzui) wrote : | #21 |
We do not intend to backport the fix to 1.23.4 because 1.24.0 is scheduled to enter proposed this week. If there is a future change in plans, we will need to backport.
| Changed in juju-core: | |
| status: | In Progress → Fix Committed |
| Changed in juju-core: | |
| status: | Fix Committed → Fix Released |

Attached is /var/lib/juju/containers/juju-vivid-lxc-template/container.log from the machine. It is 120M uncompressed.