Juju agent hangs if debug-hooks session goes away
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Unassigned |
Bug Description
Juju 1.24.3.1
Local provider, running inside a Vagrant VM
I ran into a possible race condition while debugging a charm: I accidentally closed the window running the debug-hooks session rather than logging out of the hook context. I've been able to reproduce this scenario consistently.
Summary:
If the window running a debug-hooks session is closed (without logging out of the session), the unit's agent state never leaves pending. Attempting to destroy the unit via `juju destroy-unit` leaves behind an environment where the unit is still running but stuck in a hook execution context.
Using `juju deployer -T` to terminate all machines leaves juju in a broken state, with the unit still running.
Running `juju destroy-environment local --force` resolves the juju state, but resets everything.
Steps to reproduce:
Open two terminal windows.
In terminal 1:
$ juju bootstrap
$ mkdir -p ~/charms/trusty
$ cd ~/charms/trusty
$ charm-create -t bash dummy
$ juju deploy --repository=
switch to terminal 2
$ juju debug-hooks dummy/0
(retry as necessary until the unit has an IP assigned and you enter the tmux session)
Close the terminal 2 window; don't log out of the session -- close the window only.
Back in terminal 1, running `juju destroy-unit` on the dummy/0 unit has no effect:
$ juju destroy-unit dummy/0 --debug
2015-07-29 03:39:49 INFO juju.cmd supercommand.go:37 running juju [1.24.3-
2015-07-29 03:39:49 DEBUG juju.api api.go:168 trying cached API connection settings - endpoints [localhost:17070 10.0.2.15:17070 10.0.3.1:17070 172.16.
2015-07-29 03:39:49 INFO juju.api api.go:280 connecting to API addresses: [localhost:17070 10.0.2.15:17070 10.0.3.1:17070 172.16.
2015-07-29 03:39:49 INFO juju.api apiclient.go:331 dialing "wss://
2015-07-29 03:39:49 INFO juju.api apiclient.go:263 connection established to "wss://
2015-07-29 03:39:49 DEBUG juju.api api.go:492 API hostnames unchanged - not resolving
2015-07-29 03:39:49 DEBUG juju.api api.go:522 cacheChangedAPI
2015-07-29 03:39:49 INFO juju.cmd supercommand.go:436 command finished
$ juju status dummy/0
environment: local
machines:
"1":
agent-state: started
agent-version: 1.24.3.1
dns-name: 10.0.3.146
instance-id: vagrant-
series: trusty
hardware: arch=amd64
services:
dummy:
charm: local:trusty/
exposed: false
service-status:
current: maintenance
message: installing charm software
since: 29 Jul 2015 03:37:54Z
relations:
peer-
- dummy
units:
dummy/0:
current: maintenance
message: installing charm software
since: 29 Jul 2015 03:37:54Z
current: executing
message: running install hook
since: 29 Jul 2015 03:37:54Z
version: 1.24.3.1
life: dying
machine: "1"
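The stuck state shown above can be detected from a script. The following is a minimal sketch, not part of the original report: `unit_is_dying` is a hypothetical helper name, and it assumes the status text contains a `life: dying` line as in the output above.

```shell
# Hypothetical helper: succeeds if the status text on stdin reports a dying unit.
# Assumes the "life: dying" line format shown in the status output above.
unit_is_dying() {
    grep -q 'life: dying'
}
```

It could be used as `juju status dummy/0 | unit_is_dying && echo "dummy/0 is still dying"`.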
Running `juju deployer -T` instead of `juju destroy-environment local` leaves dummy/0 running but in an error state that `juju resolved` would not resolve. The state persisted across a reboot; when I was finally able to destroy the environment, `juju bootstrap` failed:
$ juju bootstrap
Bootstrap failed, cleaning up the environment.
ERROR there was an issue examining the environment: cannot use 37017 as state port, already in use
In order to work around this, I had to:
$ sudo service juju-agent-
juju-agent-
$ sudo service juju-db-
juju-db-
$ sudo service juju-gui stop
stop: Unknown instance:
$ juju destroy-environment local --force
$ juju bootstrap
Alternatively, don't use `juju deployer -T`; run `juju destroy-environment local --force` instead.
At this point, the only way I've found to back out of this is to destroy the environment.
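Before retrying `juju bootstrap`, it may help to confirm that nothing is still listening on the state port named in the error (37017). This is a rough sketch under assumptions: `port_listed` is a hypothetical helper, and it expects a socket listing on stdin in which each local address appears as `address:port` followed by whitespace, as `ss -ltn` typically prints.

```shell
# Hypothetical helper: succeeds if the socket listing on stdin contains the given port.
# Assumes entries look like "address:port" followed by whitespace or end of line.
port_listed() {
    grep -Eq ":$1([[:space:]]|\$)"
}
```

Assuming `ss` is installed, `ss -ltn | port_listed 37017 && echo "state port still in use"` would report a leftover listener.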
tags: | added: debug-hooks |
tags: | added: destroy-machine |
Changed in juju-core: | |
status: | New → Triaged |
importance: | Undecided → Medium |
Changed in juju-core: | |
milestone: | none → 1.26-alpha1 |
tags: | added: bug-squad |
Changed in juju-core: | |
milestone: | 1.26-alpha1 → 1.26-alpha2 |
Changed in juju-core: | |
importance: | Medium → High |
tags: | added: charmers |
Changed in juju-core: | |
milestone: | 1.26-alpha2 → 1.26-beta1 |
Changed in juju-core: | |
milestone: | 1.26-beta1 → 2.0-alpha2 |
Changed in juju-core: | |
milestone: | 2.0-alpha2 → 2.0-alpha3 |
Changed in juju-core: | |
milestone: | 2.0-alpha3 → 2.0-beta4 |
Changed in juju-core: | |
milestone: | 2.0-beta4 → 2.1.0 |
affects: | juju-core → juju |
Changed in juju: | |
milestone: | 2.1.0 → none |
milestone: | none → 2.1.0 |
Changed in juju: | |
status: | Fix Committed → Fix Released |
Removing 2.1 milestone as we will not be addressing this issue in 2.1.