Juju agent hangs if debug-hooks session goes away

Bug #1479194 reported by Adam Israel on 2015-07-29
This bug affects 2 people
Affects: juju
Importance: High
Assigned to: Unassigned

Bug Description

Juju 1.24.3.1
Local provider, running inside a Vagrant VM

I ran into a possible race condition while debugging a charm: I accidentally closed the window running the debug-hooks session rather than logging out of the hook context. I've been able to reproduce this scenario consistently.

Summary:

If the window running a debug-hooks session is closed (without logging out of the session), the unit's agent state never leaves pending. Attempting to destroy the unit via `juju destroy-unit` leaves behind an environment where the unit is still running but stuck in a hook execution context.

Using `juju deployer -T` to terminate all machines leaves juju in a broken state, with the unit still running.

Running `juju destroy-environment local --force` resolves the juju state, but resets everything.

Steps to reproduce:

Open two terminal windows.

In terminal 1:

$ juju bootstrap
$ mkdir -p ~/charms/trusty
$ cd ~/charms/trusty
$ charm-create -t bash dummy
$ juju deploy --repository=$HOME/charms local:trusty/dummy

Switch to terminal 2:

$ juju debug-hooks dummy/0
(retry as necessary until the unit has an IP assigned and you enter the tmux session)

Close the terminal 2 window. Don't log out of the session; close the window only.

Back in terminal 1, `juju destroy-unit` has no effect on the dummy/0 unit:

$ juju destroy-unit dummy/0 --debug
2015-07-29 03:39:49 INFO juju.cmd supercommand.go:37 running juju [1.24.3-trusty-amd64 gc]
2015-07-29 03:39:49 DEBUG juju.api api.go:168 trying cached API connection settings - endpoints [localhost:17070 10.0.2.15:17070 10.0.3.1:17070 172.16.250.15:17070]
2015-07-29 03:39:49 INFO juju.api api.go:280 connecting to API addresses: [localhost:17070 10.0.2.15:17070 10.0.3.1:17070 172.16.250.15:17070]
2015-07-29 03:39:49 INFO juju.api apiclient.go:331 dialing "wss://localhost:17070/environment/0b3e6af8-dd03-4747-8bd6-96b71b81ce6f/api"
2015-07-29 03:39:49 INFO juju.api apiclient.go:263 connection established to "wss://localhost:17070/environment/0b3e6af8-dd03-4747-8bd6-96b71b81ce6f/api"
2015-07-29 03:39:49 DEBUG juju.api api.go:492 API hostnames unchanged - not resolving
2015-07-29 03:39:49 DEBUG juju.api api.go:522 cacheChangedAPIInfo: serverUUID="0b3e6af8-dd03-4747-8bd6-96b71b81ce6f"
2015-07-29 03:39:49 INFO juju.cmd supercommand.go:436 command finished

$ juju status dummy/0
environment: local
machines:
  "1":
    agent-state: started
    agent-version: 1.24.3.1
    dns-name: 10.0.3.146
    instance-id: vagrant-local-machine-1
    series: trusty
    hardware: arch=amd64
services:
  dummy:
    charm: local:trusty/dummy-1
    exposed: false
    service-status:
      current: maintenance
      message: installing charm software
      since: 29 Jul 2015 03:37:54Z
    relations:
      peer-relation:
      - dummy
    units:
      dummy/0:
        workload-status:
          current: maintenance
          message: installing charm software
          since: 29 Jul 2015 03:37:54Z
        agent-status:
          current: executing
          message: running install hook
          since: 29 Jul 2015 03:37:54Z
          version: 1.24.3.1
        agent-state: pending
        agent-version: 1.24.3.1
        life: dying
        machine: "1"
        public-address: 10.0.3.146
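
The stuck state shows up in the status output above as a unit with `agent-state: pending` and `life: dying` at the same time. A minimal sketch for spotting such units, assuming the 1.24-style YAML layout shown here; `stuck_units` is a hypothetical helper, not a juju command, and the matching is a rough heuristic rather than a real YAML parse:

```shell
#!/bin/sh
# Print the names of units that are both dying and still pending.
# Reads `juju status` output on stdin (layout as in this report).
stuck_units() {
    awk '
        # Print the current unit if it matched both conditions.
        function report() {
            if (unit != "" && pending && dying) print unit
        }
        # A unit header is an indented "name/N:" line, e.g. "dummy/0:".
        /^ +[^ ]+\/[0-9]+:$/ {
            report()
            unit = $1
            sub(/:$/, "", unit)
            pending = 0
            dying = 0
            next
        }
        # Only consider keys that appear after a unit header.
        unit != "" && /^ +agent-state: pending$/ { pending = 1 }
        unit != "" && /^ +life: dying$/ { dying = 1 }
        END { report() }
    '
}
```

Run as `juju status | stuck_units`; for the output in this report it would print `dummy/0`.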

Running `juju deployer -T` instead of `juju destroy-environment local` leaves dummy/0 running but in an error state that `juju resolved` does not resolve. The state persisted across a reboot; when I was finally able to destroy the environment, `juju bootstrap` failed:

$ juju bootstrap
Bootstrap failed, cleaning up the environment.
ERROR there was an issue examining the environment: cannot use 37017 as state port, already in use

In order to work around this, I had to:

$ sudo service juju-agent-vagrant-local stop
juju-agent-vagrant-local stop/waiting
$ sudo service juju-db-vagrant-local stop
juju-db-vagrant-local stop/waiting
$ sudo service juju-gui stop
stop: Unknown instance:

$ juju destroy-environment local --force
$ juju bootstrap

Alternatively, don't use `juju deployer -T`; run `juju destroy-environment local --force` instead.

At this point, the only way I've found to back out of this is to destroy the environment.
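
The workaround above could be wrapped in a small script. This is only a sketch built from the commands in this report: the service names (`juju-agent-vagrant-local`, `juju-db-vagrant-local`) are the ones from this Vagrant setup and will differ on other machines, and `run`, `reset_local_env` and the `DRY_RUN` variable are hypothetical names added for illustration.

```shell
#!/bin/sh
# Sketch: reset a wedged local environment with the commands from this report.
# DRY_RUN=1 prints each command instead of executing it (illustrative knob).

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reset_local_env() {
    env_name="${1:-local}"
    # Stop the leftover agent and database so the state port is released.
    # A failed stop (for example, service not running) is not treated as fatal.
    run sudo service juju-agent-vagrant-local stop || true
    run sudo service juju-db-vagrant-local stop || true
    # Tear down the environment, then bootstrap a fresh one.
    run juju destroy-environment "$env_name" --force || return 1
    run juju bootstrap
}
```

Calling `DRY_RUN=1; reset_local_env local` first shows the four commands without touching the environment. As noted above, this resets everything in the environment.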

Curtis Hovey (sinzui) on 2015-07-29
tags: added: debug-hooks
tags: added: destroy-machine
Changed in juju-core:
status: New → Triaged
importance: Undecided → Medium
Changed in juju-core:
milestone: none → 1.26-alpha1
tags: added: bug-squad
Curtis Hovey (sinzui) on 2015-11-03
Changed in juju-core:
milestone: 1.26-alpha1 → 1.26-alpha2
Changed in juju-core:
importance: Medium → High
Curtis Hovey (sinzui) on 2015-11-03
tags: added: charmers
Changed in juju-core:
milestone: 1.26-alpha2 → 1.26-beta1
Changed in juju-core:
milestone: 1.26-beta1 → 2.0-alpha2
Changed in juju-core:
milestone: 2.0-alpha2 → 2.0-alpha3
Changed in juju-core:
milestone: 2.0-alpha3 → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → 2.1.0
affects: juju-core → juju
Changed in juju:
milestone: 2.1.0 → none
milestone: none → 2.1.0
Anastasia (anastasia-macmood) wrote :

Removing 2.1 milestone as we will not be addressing this issue in 2.1.

Changed in juju:
milestone: 2.1.0 → none