destroy-env leaves one lxc which remains pending on redeploy

Bug #1453644 reported by Michael Nelson
This bug affects 1 person
Affects: juju-core
Status: Fix Released
Importance: Medium
Assigned to: Wayne Witzel III
Milestone: (none listed)

Bug Description

I'm repeatedly deploying a stack with 8 machines and around 50% of the time (small sample) one unit remains pending indefinitely:

$ juju status sca-dp-fe
environment: local
machines:
  "5":
    agent-state: pending
    instance-id: michael-local-machine-5
    series: trusty
    hardware: arch=amd64
services:
  sca-dp-fe:
    charm: local:trusty/apache2-0
    exposed: false
    units:
      sca-dp-fe/0:
        agent-state: allocating
        machine: "5"
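
A check like the one above can be scripted when deploying repeatedly. The following is only a minimal sketch, assuming the status is fetched with `juju status --format=json` and that the JSON uses the same keys as the YAML shown here (`machines`, `agent-state`). The helper name `pending_machines` and the second machine in the sample are hypothetical.

```python
import json


def pending_machines(status):
    """Return the IDs of machines whose agent-state is 'pending'."""
    machines = status.get("machines") or {}
    return sorted(
        machine_id
        for machine_id, info in machines.items()
        if info.get("agent-state") == "pending"
    )


# Sample shaped like the status output above. Machine "1" is a made-up
# healthy machine added only for contrast.
sample = json.loads("""
{
  "machines": {
    "1": {"agent-state": "started", "instance-id": "michael-local-machine-1"},
    "5": {"agent-state": "pending", "instance-id": "michael-local-machine-5"}
  }
}
""")

print(pending_machines(sample))
```

In practice the dictionary would come from parsing the command's output rather than from a literal.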

I had been thinking I was running out of space, but I checked and lxc reports the container as started. The machine log for that machine shows that it is unable to connect to the API server because the juju-generated CA is rejected, while none of the other machines have this problem:

http://paste.ubuntu.com/11071952/

$ apt-cache policy juju-core
juju-core:
  Installed: 1.23.2-0ubuntu1~14.04.1~juju1
  Candidate: 1.23.2-0ubuntu1~14.04.1~juju1

Destroying the environment (needing --force) and rebootstrapping is currently the only way I know to get around this :/

description: updated
Curtis Hovey (sinzui)
tags: added: lxc
tags: added: deploy
Curtis Hovey (sinzui)
tags: added: reliability repeatability
Wayne Witzel III (wwitzel3) wrote :

Are you using a script to deploy? If so, please include a copy of the script.

Changed in juju-core:
assignee: nobody → Wayne Witzel III (wwitzel3)
Wayne Witzel III (wwitzel3) wrote :

After a dozen or so attempts on my own, I could not replicate this behaviour. Having access to the script you are using to create the environment would be helpful. Also, is there a pattern to which machines become unavailable? Is it always number 5? I took some time to look through the LXC Go code, but nothing stood out in the way it creates machines that might suggest the keys would be out of sync after a bulk create.

Changed in juju-core:
status: New → Incomplete
Michael Nelson (michael.nelson) wrote :

Thanks Wayne. I didn't notice whether it was always machine 5, but I did notice that it was a different service each time. I'll keep a lookout.

The mojo spec script I'm using is private, but you should have access to see it at:

lp:~canonical-sysadmins/canonical-mojo-specs/mojo-u1-software-center-agent

You can see the output of a run on PS4 (i.e. not the local provider) at:

https://ci.admin.canonical.com/job/live-u1-software-center-agent/503/console

Michael Nelson (michael.nelson) wrote :

OK, I looked more closely at this today, and the issue is that destroy-environment isn't, in this case, destroying all the LXC containers, and no feedback is given that this is the case. When I rebootstrap and redeploy, the existing container is re-used... which probably explains the invalid juju-generated CA (from the previous deploy).

Details: http://paste.ubuntu.com/11089694/

Let me know if there are specific logs that would be useful when destroying the env.

summary: - One unit remains pending with local provider
+ destroy-env leaves one lxc which remains pending on redeploy
Curtis Hovey (sinzui) wrote :

Does
    juju destroy-env --force
clean up the bad container?

tags: added: destroy-environment
removed: deploy
tags: added: local-provider
Changed in juju-core:
status: Incomplete → Triaged
Michael Nelson (michael.nelson) wrote : Re: [Bug 1453644] Re: destroy-env leaves one lxc which remains pending on redeploy

On Tue, May 12, 2015 at 11:06 PM, Curtis Hovey <email address hidden> wrote:
> Does
> juju destroy-env --force
> clean up the bad container?

No. I think the paste above shows that without --force,
destroy-environment fails, and with --force it leaves the one
container behind. (I haven't tried re-running destroy-environment
--force; I assume it won't work as the environment is no longer
bootstrapped.) If there are specific logs I can grab for you next time
I destroy-environment, let me know (I'll check anyway, of course).

Michael Nelson (michael.nelson) wrote :

More info from machine-0.log as I destroyed the environment this time (which also answers a previous question: No - it's not always machine-5 - this time it was 4 which wasn't destroyed after --force):

http://paste.ubuntu.com/11121652/

As expected, re-running destroy-environment --force just complains that the environment isn't found.
$ juju destroy-environment local --force
ERROR cannot read environment info: environment "local" not found
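
Since juju no longer knows about the environment at this point, the leftover container has to be dealt with through the LXC tools directly. The following is a rough sketch, not a juju-provided procedure: it only builds the `lxc-stop` and `lxc-destroy` command lines for one container so they can be reviewed before being run by hand. The helper name is hypothetical, and the container name `michael-local-machine-4` is inferred from the naming seen elsewhere in this report.

```python
def manual_cleanup_commands(container_name):
    """Build (but do not run) the commands to stop and destroy one container."""
    return [
        # lxc-stop reports an error if the container is already stopped;
        # that is harmless here.
        ["sudo", "lxc-stop", "-n", container_name],
        ["sudo", "lxc-destroy", "-n", container_name],
    ]


# Assumed name of the container left behind for machine 4.
for command in manual_cleanup_commands("michael-local-machine-4"):
    print(" ".join(command))
```

Printing the commands instead of executing them keeps the destructive step a deliberate, manual one.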

Changed in juju-core:
status: Triaged → In Progress
Changed in juju-core:
assignee: Wayne Witzel III (wwitzel3) → nobody
status: In Progress → Triaged
assignee: nobody → Wayne Witzel III (wwitzel3)
Michael Nelson (michael.nelson) wrote :

Let me know if the In Progress -> Triaged is due to a lack of info - as I can reproduce this.

It happened again today, and this time even --force failed, though it provided extra info:

$ juju destroy-environment local -y && juju bootstrap
ERROR failed to destroy environment "local"

If the environment is unusable, then you may run

    juju destroy-environment --force

to forcefully destroy the environment. Upon doing so, review
your environment provider console for any resources that need
to be cleaned up. Using force will also by-pass destroy-envrionment block.

ERROR environment destruction failed: destroying environment: connection is shut down

$ juju destroy-environment local --force
ERROR while stopping machine agent: exec ["stop" "--system" "juju-agent-michael-local"]: exit status 1 (stop: Method "Get" with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't exist)
dev-trusty# ~/configs/mojo-u1-software-center-agent
$ sudo lxc-ls -f
NAME                                  STATE    IPV4  IPV6  AUTOSTART
--------------------------------------------------------------------
click-appstore-api.trusty             STOPPED  -     -     NO
click-index.trusty                    STOPPED  -     -     NO
juju-precise-lxc-template             STOPPED  -     -     NO
juju-trusty-lxc-template              STOPPED  -     -     NO
michael-local-machine-3               STOPPED  -     -     YES
mojo-u1-clickappstore-web.trusty      STOPPED  -     -     NO
mojo-u1-software-center-agent.trusty  STOPPED  -     -     NO
sca-trusty                            STOPPED  -     -     NO
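
Because destroy-environment gives no feedback about containers it leaves behind, a listing like this can be checked automatically before rebootstrapping. The following is a minimal sketch, assuming local-provider containers are named `<user>-<environment>-machine-<N>` as they appear in this report. The function name and the `namespace` parameter are hypothetical, and the sample is an abridged copy of the listing above.

```python
import re


def leftover_juju_containers(lxc_ls_output, namespace):
    """Return container names in `lxc-ls -f` output that look like juju machines."""
    pattern = re.compile(r"^" + re.escape(namespace) + r"-machine-\d+$")
    names = []
    for line in lxc_ls_output.splitlines():
        fields = line.split()
        # The first column is the container name; the header and separator
        # rows never match the pattern.
        if fields and pattern.match(fields[0]):
            names.append(fields[0])
    return names


sample = """\
NAME STATE IPV4 IPV6 AUTOSTART
--------------------------------------------------------------------
juju-trusty-lxc-template STOPPED - - NO
michael-local-machine-3 STOPPED - - YES
sca-trusty STOPPED - - NO
"""

print(leftover_juju_containers(sample, "michael-local"))
```

The juju template containers and unrelated containers are ignored deliberately, since only the per-machine containers should disappear when the environment is destroyed.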

Wayne Witzel III (wwitzel3) wrote :

I just wasn't actively working on it; I have enough info to investigate. Thanks Michael.

Changed in juju-core:
status: Triaged → In Progress
Changed in juju-core:
status: In Progress → Triaged
Aaron Bentley (abentley)
Changed in juju-core:
importance: Undecided → Medium
Curtis Hovey (sinzui)
Changed in juju-core:
status: Triaged → Fix Released