juju-core

destroy-env leaves one lxc which remains pending on redeploy

Bug #1453644 reported by Michael Nelson on 2015-05-11

This bug affects 1 person

Affects     Status         Importance   Assigned to        Milestone
juju-core   Fix Released   Medium       Wayne Witzel III

Bug Description

I'm repeatedly deploying a stack with 8 machines and around 50% of the time (small sample) one unit remains pending indefinitely:

$ juju status sca-dp-fe
environment: local
machines:
  "5":
    agent-state: pending
    instance-id: michael-local-machine-5
    series: trusty
    hardware: arch=amd64
services:
  sca-dp-fe:
    charm: local:trusty/apache2-0
    exposed: false
    units:
      sca-dp-fe/0:
        agent-state: allocating
        machine: "5"
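
A stuck machine like the one above can be spotted mechanically. What follows is a minimal sketch, assuming the status text has the same layout as the output shown (a top-level `machines:` section whose entries are indented two spaces and each carry an `agent-state:` line). The function name `pending_machines` is hypothetical and not part of juju; it only parses text that has already been captured from `juju status`.

```python
import re


def pending_machines(status_text):
    """Return the ids of machines whose agent-state is 'pending'.

    Assumes the indentation-based layout shown in the status output above;
    this is a line scanner, not a general YAML parser.
    """
    pending = []
    in_machines = False
    current = None
    for line in status_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        stripped = line.strip()
        if indent == 0:
            # A new top-level section starts; only "machines:" is of interest.
            in_machines = stripped == "machines:"
            current = None
            continue
        if not in_machines:
            continue
        key = re.match(r'^"?([^":]+)"?:$', stripped)
        if indent == 2 and key:
            # A machine entry such as  "5":
            current = key.group(1)
        elif current is not None and stripped == "agent-state: pending":
            pending.append(current)
    return pending


sample = """environment: local
machines:
  "5":
    agent-state: pending
    instance-id: michael-local-machine-5
services:
  sca-dp-fe:
    units:
      sca-dp-fe/0:
        agent-state: allocating
        machine: "5"
"""
print(pending_machines(sample))  # prints ['5']
```

The unit's `agent-state: allocating` line is ignored on purpose, since it sits under `services:` rather than `machines:`.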

I had been thinking I was running out of space, but I checked and lxc reports the container as started. The machine log for that machine shows that it is unable to connect to the API server because the juju-generated CA is rejected; none of the other machines have this problem:

http://paste.ubuntu.com/11071952/

$ apt-cache policy juju-core
juju-core:
  Installed: 1.23.2-0ubuntu1~14.04.1~juju1
  Candidate: 1.23.2-0ubuntu1~14.04.1~juju1

Destroying the environment (which needs --force) and rebootstrapping is currently the only way I know to get around this :/

Michael Nelson (michael.nelson) on 2015-05-11

description:	updated

Curtis Hovey (sinzui) on 2015-05-11

tags:	added: lxc
tags:	added: deploy

Curtis Hovey (sinzui) on 2015-05-11

tags:	added: reliability repeatability

Wayne Witzel III (wwitzel3) wrote on 2015-05-11:

Are you using a script to deploy? If so, please include a copy of the script.

Changed in juju-core:
assignee:	nobody → Wayne Witzel III (wwitzel3)

Wayne Witzel III (wwitzel3) wrote on 2015-05-11:

After a dozen or so attempts on my own, I could not replicate this behaviour. Having access to the script you are using to create the environment would be helpful. Also, is there a pattern to which machines become unavailable? Is it always number 5? I took some time to look through the LXC Go code, but nothing in the way it creates machines stood out as something that would leave the keys out of sync after a bulk create.

Changed in juju-core:
status:	New → Incomplete

Michael Nelson (michael.nelson) wrote on 2015-05-12:

Thanks Wayne. I didn't notice whether it was always machine 5, but I did notice that it was a different service each time. I'll keep a lookout.

The mojo spec script I'm using is private, but you should have access to see it at:

lp:~canonical-sysadmins/canonical-mojo-specs/mojo-u1-software-center-agent

You can see the output of a run on PS4 (i.e. not the local provider) at:

https://ci.admin.canonical.com/job/live-u1-software-center-agent/503/console

Michael Nelson (michael.nelson) wrote on 2015-05-12:

OK, I looked more closely at this today, and the issue is that destroy-environment isn't, in this case, destroying all the LXC containers, and it gives no feedback that this is the case. When I rebootstrap and redeploy, the existing unit is re-used, which probably explains the invalid juju-generated CA (it is from the previous deploy).

Details: http://paste.ubuntu.com/11089694/

Let me know if there are specific logs that would be useful when destroying the env.

summary:

- One unit remains pending with local provider
+ destroy-env leaves one lxc which remains pending on redeploy

Curtis Hovey (sinzui) wrote on 2015-05-12:

Does
juju destroy-env --force
clean up the bad container?

tags:	added: destroy-environment removed: deploy
tags:	added: local-provider
Changed in juju-core:
status:	Incomplete → Triaged

Michael Nelson (michael.nelson) wrote on 2015-05-12: Re: [Bug 1453644] Re: destroy-env leaves one lxc which remains pending on redeploy

On Tue, May 12, 2015 at 11:06 PM, Curtis Hovey <email address hidden> wrote:
> Does
> juju destroy-env --force
> clean up the bad container?

No - I think the paste above shows... without --force
destroy-environment fails, with --force it is leaving the one
container (I haven't tried re-running destroy-environment --force - I
assume it won't work as it's no longer bootstrapped). If there are
specific logs I can grab for you next time I destroy-environment, let
me know (I'll check anyway, of course)

Michael Nelson (michael.nelson) wrote on 2015-05-14:

More info from machine-0.log as I destroyed the environment this time (which also answers a previous question: No - it's not always machine-5 - this time it was 4 which wasn't destroyed after --force):

http://paste.ubuntu.com/11121652/

As expected, re-running destroy-environment --force just complains that the environment isn't found.
$ juju destroy-environment local --force
ERROR cannot read environment info: environment "local" not found

Wayne Witzel III (wwitzel3) on 2015-05-18

Changed in juju-core:
status:	Triaged → In Progress

Wayne Witzel III (wwitzel3) on 2015-05-19

Changed in juju-core:
assignee:	Wayne Witzel III (wwitzel3) → nobody
status:	In Progress → Triaged
assignee:	nobody → Wayne Witzel III (wwitzel3)

Michael Nelson (michael.nelson) wrote on 2015-05-21:

Let me know if the In Progress -> Triaged is due to a lack of info - as I can reproduce this.

It happened again today, and this time even --force failed, though it provided extra info:

$ juju destroy-environment local -y && juju bootstrap
ERROR failed to destroy environment "local"

If the environment is unusable, then you may run

juju destroy-environment --force

to forcefully destroy the environment. Upon doing so, review
your environment provider console for any resources that need
to be cleaned up. Using force will also by-pass destroy-envrionment block.

ERROR environment destruction failed: destroying environment: connection is shut down

$ juju destroy-environment local --force
ERROR while stopping machine agent: exec ["stop" "--system" "juju-agent-michael-local"]: exit status 1 (stop: Method "Get" with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't exist)
dev-trusty# ~/configs/mojo-u1-software-center-agent
$ sudo lxc-ls -f
NAME                                  STATE    IPV4  IPV6  AUTOSTART
--------------------------------------------------------------------
click-appstore-api.trusty             STOPPED  -     -     NO
click-index.trusty                    STOPPED  -     -     NO
juju-precise-lxc-template             STOPPED  -     -     NO
juju-trusty-lxc-template              STOPPED  -     -     NO
michael-local-machine-3               STOPPED  -     -     YES
mojo-u1-clickappstore-web.trusty      STOPPED  -     -     NO
mojo-u1-software-center-agent.trusty  STOPPED  -     -     NO
sca-trusty                            STOPPED  -     -     NO
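
Since destroy-environment gives no feedback about containers it leaves behind, a listing like the one above can be checked mechanically before re-bootstrapping. This is a minimal sketch, assuming leftover juju containers are named `<namespace>-machine-<N>`, as in the instance-id `michael-local-machine-5` from the bug description. `leftover_containers` is a hypothetical helper, not part of juju or LXC, and it only parses text already captured from `sudo lxc-ls -f`; it does not stop or destroy anything.

```python
import re


def leftover_containers(lxc_ls_output, namespace):
    """Return (name, state) pairs for containers named <namespace>-machine-<N>.

    `namespace` is assumed to be the local environment prefix,
    e.g. "michael-local" in this report.
    """
    pattern = re.compile(r"^" + re.escape(namespace) + r"-machine-\d+$")
    found = []
    for line in lxc_ls_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            # Skips blank lines and the dashed separator row.
            continue
        name, state = fields[0], fields[1]
        if pattern.match(name):
            found.append((name, state))
    return found


listing = """NAME STATE IPV4 IPV6 AUTOSTART
--------------------------------------------------------------------
juju-precise-lxc-template STOPPED - - NO
juju-trusty-lxc-template STOPPED - - NO
michael-local-machine-3 STOPPED - - YES
sca-trusty STOPPED - - NO
"""
print(leftover_containers(listing, "michael-local"))
# prints [('michael-local-machine-3', 'STOPPED')]
```

The juju template containers and unrelated containers are deliberately not matched, so only machines from the destroyed environment are reported.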

Revision history for this message

Wayne Witzel III (wwitzel3) wrote on 2015-05-21:

I just wasn't actively working it, I have enough info to investigate. Thanks Michael.

Changed in juju-core:
status:	Triaged → In Progress

Wayne Witzel III (wwitzel3) on 2015-07-31

Changed in juju-core:
status:	In Progress → Triaged

Aaron Bentley (abentley) on 2015-09-09

Changed in juju-core:
importance:	Undecided → Medium

Curtis Hovey (sinzui) on 2016-04-24

Changed in juju-core:
status:	Triaged → Fix Released
