Juju agent hangs if debug-hooks session goes away

Bug #1479194 reported by Adam Israel
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: 2.6-rc2

Bug Description

Juju 1.24.3.1
Local provider, running inside a Vagrant VM

I ran into a possible race condition while debugging a charm and accidentally closing the window with the debug-hooks session, rather than logging out of the hook context. I've been able to consistently reproduce this scenario.

Summary:

If the window running a debug-hooks session is closed (without logging out of the session), the unit's agent state will never leave pending. Attempting to destroy the unit via `juju destroy-unit` leaves behind an environment where the unit is still running but stuck in a hook execution context.

Using `juju deployer -T` to terminate all machines leaves juju in a broken state, with the unit still running.

Running `juju destroy-environment local --force` resolves the juju state, but resets everything.
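For reference, one rough way to confirm what the agent is blocked on is to look at the unit's machine directly. This is only a sketch: the machine number ("1") is taken from the status output below, and it assumes debug-hooks behaves as it did in 1.24, i.e. the unit agent waits on a tmux session it started on the machine:

$ juju ssh 1
$ sudo tmux ls           # the orphaned debug-hooks session should still be listed
$ ps aux | grep [t]mux   # the tmux server keeping the hook context open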

Steps to reproduce:

Open two terminal windows.

In terminal 1:

$ juju bootstrap
$ mkdir -p ~/charms/trusty
$ cd ~/charms/trusty
$ charm-create -t bash dummy
$ juju deploy --repository=$HOME/charms local:trusty/dummy

switch to terminal 2

$ juju debug-hooks dummy/0
(retry as necessary until the unit has an IP assigned and you enter the tmux session)

close the terminal 2 window; don't log out of the session -- close the window only
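(To script this step instead of closing a window by hand, killing the client process from a third terminal should produce the same abrupt disconnect; the pkill pattern below is just a convenience and assumes no other matching process is running:)

$ pkill -f "juju debug-hooks dummy/0"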

Back in terminal 1, the dummy/0 unit is stuck; `juju destroy-unit` has no effect:

$ juju destroy-unit dummy/0 --debug
2015-07-29 03:39:49 INFO juju.cmd supercommand.go:37 running juju [1.24.3-trusty-amd64 gc]
2015-07-29 03:39:49 DEBUG juju.api api.go:168 trying cached API connection settings - endpoints [localhost:17070 10.0.2.15:17070 10.0.3.1:17070 172.16.250.15:17070]
2015-07-29 03:39:49 INFO juju.api api.go:280 connecting to API addresses: [localhost:17070 10.0.2.15:17070 10.0.3.1:17070 172.16.250.15:17070]
2015-07-29 03:39:49 INFO juju.api apiclient.go:331 dialing "wss://localhost:17070/environment/0b3e6af8-dd03-4747-8bd6-96b71b81ce6f/api"
2015-07-29 03:39:49 INFO juju.api apiclient.go:263 connection established to "wss://localhost:17070/environment/0b3e6af8-dd03-4747-8bd6-96b71b81ce6f/api"
2015-07-29 03:39:49 DEBUG juju.api api.go:492 API hostnames unchanged - not resolving
2015-07-29 03:39:49 DEBUG juju.api api.go:522 cacheChangedAPIInfo: serverUUID="0b3e6af8-dd03-4747-8bd6-96b71b81ce6f"
2015-07-29 03:39:49 INFO juju.cmd supercommand.go:436 command finished

$ juju status dummy/0
environment: local
machines:
  "1":
    agent-state: started
    agent-version: 1.24.3.1
    dns-name: 10.0.3.146
    instance-id: vagrant-local-machine-1
    series: trusty
    hardware: arch=amd64
services:
  dummy:
    charm: local:trusty/dummy-1
    exposed: false
    service-status:
      current: maintenance
      message: installing charm software
      since: 29 Jul 2015 03:37:54Z
    relations:
      peer-relation:
      - dummy
    units:
      dummy/0:
        workload-status:
          current: maintenance
          message: installing charm software
          since: 29 Jul 2015 03:37:54Z
        agent-status:
          current: executing
          message: running install hook
          since: 29 Jul 2015 03:37:54Z
          version: 1.24.3.1
        agent-state: pending
        agent-version: 1.24.3.1
        life: dying
        machine: "1"
        public-address: 10.0.3.146

Running `juju deployer -T` instead of `juju destroy-environment local` leaves dummy/0 running, but in an error state that `juju resolved` would not clear. The broken state persisted across a reboot; when I was finally able to destroy the environment, `juju bootstrap` failed:

$ juju bootstrap
Bootstrap failed, cleaning up the environment.
ERROR there was an issue examining the environment: cannot use 37017 as state port, already in use
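A quick way to see what is still listening on the state port (37017 is the mongod port the error refers to), assuming lsof and net-tools are installed:

$ sudo lsof -i :37017
$ sudo netstat -lntp | grep 37017

In this scenario that is presumably the leftover juju-db-vagrant-local mongod, hence the service stops below.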

In order to work around this, I had to:

$ sudo service juju-agent-vagrant-local stop
juju-agent-vagrant-local stop/waiting
$ sudo service juju-db-vagrant-local stop
juju-db-vagrant-local stop/waiting
$ sudo service juju-gui stop
stop: Unknown instance:

$ juju destroy-environment local --force
$ juju bootstrap

Alternatively, don't use `juju deployer -T`; run `juju destroy-environment local --force` instead.

At this point, the only way I've found to back out of this is to destroy the environment.

Curtis Hovey (sinzui)
tags: added: debug-hooks
tags: added: destroy-machine
Changed in juju-core:
status: New → Triaged
importance: Undecided → Medium
Changed in juju-core:
milestone: none → 1.26-alpha1
tags: added: bug-squad
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.26-alpha1 → 1.26-alpha2
Changed in juju-core:
importance: Medium → High
Curtis Hovey (sinzui)
tags: added: charmers
Changed in juju-core:
milestone: 1.26-alpha2 → 1.26-beta1
Changed in juju-core:
milestone: 1.26-beta1 → 2.0-alpha2
Changed in juju-core:
milestone: 2.0-alpha2 → 2.0-alpha3
Changed in juju-core:
milestone: 2.0-alpha3 → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → 2.1.0
affects: juju-core → juju
Changed in juju:
milestone: 2.1.0 → none
milestone: none → 2.1.0
Revision history for this message
Anastasia (anastasia-macmood) wrote :

Removing 2.1 milestone as we will not be addressing this issue in 2.1.

Changed in juju:
milestone: 2.1.0 → none
Revision history for this message
Anastasia (anastasia-macmood) wrote :

This is a fairly old report and I can no longer reproduce it with Juju 2.6-rc2.
The problem must have been fixed as a drive-by or as a fix for another bug. I will mark this as Fix Committed for 2.6-rc2, but it is possible that the problem was addressed earlier.

With Juju 2.6, even when I close the console window with the tmux session without properly terminating it, I can 'juju remove-unit' without any problems.

However, if 'remove-unit' is not enough, Juju 2.6 introduces a '--force' option.
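So on 2.6 the recovery that was missing in the original report should look roughly like this (unit name reused from the reproduction steps above; a sketch only, based on this comment):

$ juju remove-unit dummy/0
$ juju remove-unit dummy/0 --force   # only if the unit stays wedged in the hook context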

Changed in juju:
milestone: none → 2.6-rc2
status: Triaged → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released