destroy-env leaves one lxc which remains pending on redeploy
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | | Medium | Wayne Witzel III | |
Bug Description
I'm repeatedly deploying a stack with 8 machines and around 50% of the time (small sample) one unit remains pending indefinitely:
$ juju status sca-dp-fe
environment: local
machines:
"5":
agent-state: pending
instance-id: michael-
series: trusty
hardware: arch=amd64
services:
sca-dp-fe:
charm: local:trusty/
exposed: false
units:
sca-dp-fe/0:
machine: "5"
I had been thinking I was running out of space, but checked and lxc reports the container as started. Checking the machine log for that machine shows that the machine is unable to connect to the api server due to the juju-generated CA being rejected, while all other machines obviously don't have this problem:
http://
$ apt-cache policy juju-core
juju-core:
Installed: 1.23.2-
Candidate: 1.23.2-
Destroying the environment (needing --force) and rebootstrapping is currently the only way I know to get around this :/
| description: | updated |
| tags: | added: lxc |
| tags: | added: deploy |
| tags: | added: reliability repeatability |
| Wayne Witzel III (wwitzel3) wrote : | #1 |
| Changed in juju-core: | |
| assignee: | nobody → Wayne Witzel III (wwitzel3) |
| Wayne Witzel III (wwitzel3) wrote : | #2 |
After a dozen or so attempts on my own, I could not replicate this behaviour. Having access to the script you are using to create the environment would be helpful. Also, is there a pattern to which machines become unavailable? Is it always number 5? I took some time to look through the LXC Go code, but nothing stood out in the way it creates machines that might suggest the keys would be out of sync after a bulk create.
| Changed in juju-core: | |
| status: | New → Incomplete |
| Michael Nelson (michael.nelson) wrote : | #3 |
Thanks Wayne. I didn't notice whether it was always machine 5, but I did notice that it was a different service each time. I'll keep a lookout.
The mojo spec script I'm using is private, but you should have access to see it at:
lp:~canonical-sysadmins/canonical-mojo-specs/mojo-u1-software-center-agent
You can see the output of a run on PS4 (ie. not the local provider) at:
https:/
| Michael Nelson (michael.nelson) wrote : | #4 |
OK, I looked more closely at this today, and the issue is that destroy-environment isn't, in this case, destroying all the LXC containers, and no feedback is given that this is the case. When I rebootstrap and redeploy, the existing container is re-used... which probably explains the invalid juju-generated CA (left over from the previous deploy).
Details: http://
Let me know if there are specific logs that would be useful when destroying the env.
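Since destroy-environment gives no feedback when a container survives, one way to check before rebootstrapping is to scan the `lxc-ls -f` listing for containers that still carry the environment's name prefix. The following is a minimal sketch, not part of juju: the function names, the `example-local-machine-` prefix and the sample listing are hypothetical, and it assumes the `NAME STATE IPV4 IPV6 AUTOSTART` column layout that `lxc-ls -f` prints.

```python
def parse_lxc_ls(output):
    """Parse the text of `lxc-ls -f` into a list of (name, state) tuples."""
    containers = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            # Blank lines and a single-token separator row carry no container.
            continue
        if fields[0] == "NAME":
            # Header row.
            continue
        if all(set(field) == {"-"} for field in fields):
            # Separator row made only of dashes.
            continue
        containers.append((fields[0], fields[1]))
    return containers


def leftover_containers(output, prefix):
    """Return the (name, state) pairs whose name starts with `prefix`."""
    return [(name, state) for name, state in parse_lxc_ls(output)
            if name.startswith(prefix)]


# Hypothetical listing; the container name prefix is an assumption.
SAMPLE = """\
NAME                     STATE    IPV4  IPV6  AUTOSTART
-------------------------------------------------------
example-local-machine-4  RUNNING  -     -     YES
sca-trusty               STOPPED  -     -     NO
"""

if __name__ == "__main__":
    for name, state in leftover_containers(SAMPLE, "example-local-machine-"):
        print(name, state)
```

On a real host the input would be the captured output of `sudo lxc-ls -f`, run after destroy-environment and before the next bootstrap. Any container the sketch reports can then be inspected and removed by hand.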
| summary: | - One unit remains pending with local provider |
| | + destroy-env leaves one lxc which remains pending on redeploy |
| Curtis Hovey (sinzui) wrote : | #5 |
Does
juju destroy-env --force
clean up the bad container?
| tags: | added: destroy-environment removed: deploy |
| tags: | added: local-provider |
| Changed in juju-core: | |
| status: | Incomplete → Triaged |
| Michael Nelson (michael.nelson) wrote : Re: [Bug 1453644] Re: destroy-env leaves one lxc which remains pending on redeploy | #6 |
On Tue, May 12, 2015 at 11:06 PM, Curtis Hovey <email address hidden> wrote:
> Does
> juju destroy-env --force
> clean up the bad container?
No - I think the paste above shows... without --force
destroy-environment fails, with --force it is leaving the one
container (I haven't tried re-running destroy-environment --force - I
assume it won't work as it's no longer bootstrapped). If there are
specific logs I can grab for you next time I destroy-
me know (I'll check anyway, of course)
| Michael Nelson (michael.nelson) wrote : | #7 |
More info from machine-0.log as I destroyed the environment this time (which also answers a previous question: No - it's not always machine-5 - this time it was 4 which wasn't destroyed after --force):
http://
As expected, re-running destroy-environment --force just complains that the environment isn't found.
$ juju destroy-environment local --force
ERROR cannot read environment info: environment "local" not found
| Changed in juju-core: | |
| status: | Triaged → In Progress |
| Changed in juju-core: | |
| assignee: | Wayne Witzel III (wwitzel3) → nobody |
| status: | In Progress → Triaged |
| assignee: | nobody → Wayne Witzel III (wwitzel3) |
| Michael Nelson (michael.nelson) wrote : | #8 |
Let me know if the In Progress -> Triaged is due to a lack of info - as I can reproduce this.
It happened again today, and this time even --force failed, although it did provide extra info:
$ juju destroy-environment local -y && juju bootstrap
ERROR failed to destroy environment "local"
If the environment is unusable, then you may run
juju destroy-environment --force
to forcefully destroy the environment. Upon doing so, review
your environment provider console for any resources that need
to be cleaned up. Using force will also by-pass destroy-envrionment block.
ERROR environment destruction failed: destroying environment: connection is shut down
$ juju destroy-environment local --force
ERROR while stopping machine agent: exec ["stop" "--system" "juju-agent-
dev-trusty# ~/configs/
$ sudo lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
-------
click-appstore-
click-index.trusty STOPPED - - NO
juju-precise-
juju-trusty-
michael-
mojo-u1-
mojo-u1-
sca-trusty STOPPED - - NO
| Wayne Witzel III (wwitzel3) wrote : | #9 |
I just wasn't actively working on it; I have enough info to investigate. Thanks Michael.
| Changed in juju-core: | |
| status: | Triaged → In Progress |
| Changed in juju-core: | |
| status: | In Progress → Triaged |
| Changed in juju-core: | |
| importance: | Undecided → Medium |
| Changed in juju-core: | |
| status: | Triaged → Fix Released |


Are you using a script to deploy? If so, please include a copy of the script.