Cannot destroy service when install hook failed

Bug #1219902 reported by Kurt
This bug affects 18 people
Affects          Status        Importance  Assigned to   Milestone
juju-core        Won't Fix     High        Unassigned
juju-core 1.25   Won't Fix     Undecided   Unassigned
Docs             Fix Released  High        Marco Ceppi

Bug Description

So far this has only happened when deploying multiple services to a single node (with maas). When I see that a service such as rabbitmq-server fails to deploy successfully and go to destroy the service, the command appears to return with no errors, but the service is not removed. juju status shows the service is still on the host and has a status showing "life: dying". This never goes away.

EDIT: This was also seen deploying to canonistack with one service per node.

EDIT: @deej: With juju1.14.1-precise-amd64 if a service agent is stuck in the pending state, for example due to an error in installing the charm, that service is essentially unkillable. The service agent clearly appears to know it should be dying but never actually gets to the point where it can die.

kurt@maas-cntrl:~/deployment$ juju destroy-unit rabbitmq-server/0 --debug
2013-09-02 16:09:01 DEBUG juju.environs.maas environprovider.go:27 opening environment "maas".
2013-09-02 16:09:01 DEBUG juju state.go:138 waiting for DNS name(s) of state server instances [/MAAS/api/1.0/nodes/node-9c81fa82-1182-11e3-820f-000c2969475a/]
2013-09-02 16:09:01 INFO juju open.go:69 state: opening state; mongo addresses: ["ptqkd.master:37017"]; entity ""
2013-09-02 16:09:01 INFO juju open.go:107 state: connection established
2013-09-02 16:09:01 INFO juju supercommand.go:237 command finished
kurt@maas-cntrl:~/deployment$ juju -v status
2013-09-02 16:19:24 INFO juju open.go:69 state: opening state; mongo addresses: ["ptqkd.master:37017"]; entity ""
2013-09-02 16:19:24 INFO juju open.go:107 state: connection established
machines:
  "0":
    agent-state: started
    agent-version: 1.12.0
    dns-name: ptqkd.master
    instance-id: /MAAS/api/1.0/nodes/node-9c81fa82-1182-11e3-820f-000c2969475a/
    series: precise
services:
  juju-gui:
    charm: cs:precise/juju-gui-76
    exposed: true
    units:
      juju-gui/0:
        agent-state: started
        agent-version: 1.12.0
        machine: "0"
        public-address: ptqkd.master
  mysql:
    charm: cs:precise/mysql-27
    exposed: false
    relations:
      cluster:
      - mysql
    units:
      mysql/0:
        agent-state: error
        agent-state-info: 'hook failed: "config-changed"'
        agent-version: 1.12.0
        machine: "0"
        public-address: ptqkd.master
  rabbitmq-server:
    charm: cs:precise/rabbitmq-server-14
    exposed: false
    life: dying
    units:
      rabbitmq-server/0:
        agent-state: error
        agent-state-info: 'hook failed: "install"'
        agent-version: 1.12.0
        life: dying
        machine: "0"
        public-address: ptqkd.master
2013-09-02 16:19:24 INFO juju supercommand.go:237 command finished

Revision history for this message
Curtis Hovey (sinzui) wrote :

From @Aaron's duplicate

If the initial install failed, destroy-service does not work, but it also does not report that it isn't working.

$ juju destroy-service jenkins
$ juju status
...
services:
  jenkins:
    charm: local:precise/jenkins-2
    exposed: true
    life: dying
    units:
      jenkins/0:
        agent-state: error
        agent-state-info: 'hook failed: "install"'
        agent-version: 1.12.0
        life: dying
        machine: "1"
        public-address: 10.55.32.32

This went on until I destroyed the environment.

Later on, I determined that I could work around the issue by running "juju resolved" first, before "juju destroy-service".
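The workaround reported here can be sketched as the following command sequence (a sketch against juju 1.x; the service and unit names are taken from this report, and exact behavior may vary by version):

```shell
# Mark the failed install hook as resolved so the unit can leave the
# error state; by default this does not re-run the failed hook
# (pass --retry to re-run it instead).
juju resolved rabbitmq-server/0

# With the unit out of the error state, the pending "dying" lifecycle
# can now be processed and the destroy request takes effect.
juju destroy-service rabbitmq-server

# Confirm the service eventually disappears from status.
juju status
```

The key point is ordering: `resolved` must come before (or after re-issuing) `destroy-service`, because a unit stuck in an error state never processes its lifecycle changes.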

summary: - Cannot destroy service in 1.12
+ Cannot destroy service when install hook failed
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
description: updated
Revision history for this message
Curtis Hovey (sinzui) wrote :

This might be treated as a dupe of bug 1089291.

Curtis Hovey (sinzui)
tags: added: canonical-webops-juju
description: updated
Curtis Hovey (sinzui)
tags: added: canonical-webops
removed: canonical-webops-juju
Curtis Hovey (sinzui)
tags: added: destroy-service
Revision history for this message
Jason Robinson (jaywink) wrote :

Deployed the nfs charm to the local provider and it failed to install. Tried destroy-service and now it's just stuck in "dying".

nfs:
    charm: cs:precise/nfs-3
    exposed: false
    life: dying
    units:
      nfs/0:
        agent-state: error
        agent-state-info: 'hook failed: "install"'
        agent-version: 1.16.0.1
        life: dying
        machine: "3"
        public-address: 10.0.3.115

juju --version
1.16.3-saucy-amd64

Revision history for this message
Jason Robinson (jaywink) wrote :

$ juju destroy-service nfs --debug

2013-11-15 12:47:16 DEBUG juju.environs open.go:75 ConfigForName found bootstrap config map[string]interface {}{"admin-secret":"c9755317b2afb5bb90e35b7175298d1b", "agent-version":"1.16.0", "api-port":17070, "shared-storage-port":8041, "state-port":37017, "tools-url":"", "type":"local", "bootstrap-ip":"10.0.3.1", "ca-cert":"-----BEGIN CERTIFICATE-----\nMIICWTCCAcSgAwIBAgIBADALBgkqhkiG9w0BAQUwQzENMAsGA1UEChMEanVqdTEy\nMDAGA1UEAwwpanVqdS1nZW5lcmF0ZWQgQ0EgZm9yIGVudmlyb25tZW50ICJsb2Nh\nbCIwHhcNMTMxMTE1MTAwOTI0WhcNMjMxMTE1MTAxNDI0WjBDMQ0wCwYDVQQKEwRq\ndWp1MTIwMAYDVQQDDClqdWp1LWdlbmVyYXRlZCBDQSBmb3IgZW52aXJvbm1lbnQg\nImxvY2FsIjCBnTALBgkqhkiG9w0BAQEDgY0AMIGJAoGBANj/Cz3lvet8+2JhwP3I\nRXSicHH8nMOSsU+SDPpISFIozpxvuIoQwh+Ql3lSRB5tS3xCnLrSy0jIteHk0YqH\nj3HtJ1++xGVbTUliCfNogggFKw8jxej/wuzaN4nNmdG+jqaPB/lMEJH6x6h8nCAG\nd8FH7GneyB1hsZPaSmlevnabAgMBAAGjYzBhMA4GA1UdDwEB/wQEAwIApDAPBgNV\nHRMBAf8EBTADAQH/MB0GA1UdDgQWBBSwnwwsTc2rAyj7t+wbUp7hr2m69DAfBgNV\nHSMEGDAWgBSwnwwsTc2rAyj7t+wbUp7hr2m69DALBgkqhkiG9w0BAQUDgYEAqlKr\n+PtcuN/mEa4oKcHUIJpzkFWuYkJxMHklCN25yq/JRWmzb0sqlmfnL2i8MLkLyQL2\n/QDZAOYseYKrHl1BXXQa6BuEVWnkhc9GreI/xgIOKCxvPYIW4rQui2GG4sOoVpXj\naCLfy0wFlQjVtOna6tzh0T2mkgoNygFcatM3vUE=\n-----END CERTIFICATE-----\n", "firewall-mode":"instance", "image-metadata-url":"", "name":"local", "network-bridge":"lxcbr0", "authorized-keys":"<snip>", "ca-private-key":"-----BEGIN RSA PRIVATE 
KEY-----\nMIICWwIBAAKBgQDY/ws95b3rfPtiYcD9yEV0onBx/JzDkrFPkgz6SEhSKM6cb7iK\nEMIfkJd5UkQebUt8Qpy60stIyLXh5NGKh49x7SdfvsRlW01JYgnzaIIIBSsPI8Xo\n/8Ls2jeJzZnRvo6mjwf5TBCR+seofJwgBnfBR+xp3sgdYbGT2kppXr52mwIDAQAB\nAoGAb+MR6NiNFN9cv12oJTMenQUk9aFLM4xv4JduGZ8rqzFfV0pD3OzHK39imyvP\n8+BnC6tNJQeLyfuzzMYvRT6gtyo507Zc/JMxyxUvkh9OG5FLtNWPy2k/vIC3tte1\nRyYviSvriRx0eK5+lVw2j+szNnrj5SJZvv/BOOhhLlcfAAECQQD3V+pSxpT/r3Fq\nylgroRgRtWYgIdgdf5kWvT8xU3Tjhly5I2eLq48t+diM3mnTvfUyzMi0Bytq8ds2\nbCf+NfQBAkEA4Jc7XBt9FXJ2dVl9jKJuQjFACPBsForMfMdyQE3i81hqHMQq5zNl\ncbf3AZcTCIUtrDJLyU7AzQzeJKPmxMu6mwJAZACek8MIQOwtHfEbfuBN+/LsjHdC\nioKpYaE4KHhGnSsY2B2xYq4FYKBQZnwSK3L07QoQ54CylDWe3L0T2lr0AQJAEPB3\nIrBUE90JQDqatJO/uzBZwxLXJDAd0j98x4rYVkBR6I+HKN8AhL46XB1X6ymYU0eL\n3cgZ1J1m196g1jEDhwJANG24lwBktoufFQOg0LKzTJYYa83nm3k+koxcuYK6x8Qx\nJhPGApAnH1BM8hHSK2lLaZsLtFf0Gg7GDx/1Edhwvg==\n-----END RSA PRIVATE KEY-----\n", "logging-config":"<root>=DEBUG", "default-series":"precise", "development":false, "root-dir":"/home/jaywink/.juju/local", "ssl-hostname-verification":true, "storage-port":8040}
2013-11-15 12:47:16 INFO juju.provider.local environprovider.go:32 opening environment "local"
2013-11-15 12:47:16 DEBUG juju state.go:160 waiting for DNS name(s) of state server instances [localhost]
2013-11-15 12:47:16 INFO juju.state open.go:68 opening state; mongo addresses: ["10.0.3.1:37017"]; entity ""
2013-11-15 12:47:16 INFO juju.state open.go:106 connection established
2013-11-15 12:47:16 INFO juju supercommand.go:286 command finished

Revision history for this message
Jason Robinson (jaywink) wrote :

It seems that running

$ juju resolved nfs/0

helped to clear the situation.

Revision history for this message
Jason Robinson (jaywink) wrote :

I didn't know that 'resolved' is needed when an install hook fails.

A friendly hint from destroy-service when it thinks an error needs resolving first would be fantastic (think of git's friendliness). As it is, it silently does its job, but the service never actually dies because the unit is in an error state.

Revision history for this message
Nicola Larosa (teknico) wrote :

I've seen the same behavior, also when deploying multiple services in LXC containers to a single node with MAAS.

The "juju resolved" workaround is now documented at https://juju.ubuntu.com/docs/charms-destroy.html#caveats , and works for me.

Ryan Finnie (fo0bar)
tags: added: canonical-is
Changed in juju-core:
importance: High → Medium
Revision history for this message
Merlijn Sebrechts (merlijn-sebrechts) wrote :

I'd like to point out that the `juju resolved` workaround is not ideal for us. We deploy services that manage other clouds. When installation fails and we issue `juju resolved`, the service will start to manage the cloud environment as if nothing went wrong. This will fail in very unpredictable and possibly destructive ways.

For the moment, the only thing we can do is `juju destroy-service` and `juju destroy-machine --force`. This works when the service is deployed inside a container, but we can't use this method when the service is on a physical machine, or we'd lose the physical machine...
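The force-removal path described above looks roughly like this (a sketch only: "my-service" and the machine id "3/lxc/0" are illustrative placeholders, not names from this report):

```shell
# Ask juju to remove the service; the unit will hang in "dying"
# because its install hook failed and the error state blocks the
# lifecycle from progressing.
juju destroy-service my-service

# Forcibly remove the machine hosting the unit. This is tolerable
# for an LXC container, but on a physical machine it sacrifices
# the whole machine.
juju destroy-machine --force 3/lxc/0
```

This avoids `juju resolved`, which in this deployment scenario would let a half-installed service start managing external resources as if its install had succeeded.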

Destroy-service waiting for the stop hook is a very good default. Having a way to tell Juju "skip the queue and remove this service NOW" would be really helpful for us...

Changed in juju-core:
importance: Medium → High
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Merlijn

If this failure still exists, please provide newer logs to ensure that we can address it as part of 2.0 work.

Since there is a workaround for previous versions, we will not be addressing this for 1.25.

Changed in juju-core:
status: Triaged → Incomplete
Changed in juju-core:
status: Incomplete → Won't Fix