remove-unit and destroy-service don't work when agent state is error

Bug #1173224 reported by Andreas Hasenack
This bug report is a duplicate of: Bug #1089289: destroy-unit --force.
This bug affects 2 people
Affects     Status   Importance   Assigned to   Milestone
juju-core   New      Undecided    Unassigned

Bug Description

In this scenario, the unit's agent-state is error:
$ juju status
machines:
  "0":
    agent-state: started
    agent-version: 1.10.0.1
    dns-name: 10.55.63.219
    instance-id: c05e2388-fceb-4910-8a0a-d2d06574fb1a
    series: precise
  "2":
    agent-state: started
    agent-version: 1.10.0.1
    dns-name: 10.55.63.220
    instance-id: 44095fed-e1b5-4e90-b208-3c81594428e8
    series: precise
services:
  cinder:
    charm: local:precise/cinder-26
    exposed: false
    relations:
      cluster:
      - cinder
    units:
      cinder/0:
        agent-state: error
        agent-state-info: 'hook failed: "install"'
        agent-version: 1.10.0.1
        machine: "2"
        public-address: 10.55.63.220

I then try, in turn, to destroy the service and to remove the unit, but neither works. Nor can I terminate the machine:
$ juju destroy-service cinder

$ juju terminate-machine 2
error: no machines were destroyed: machine 2 has unit "cinder/0" assigned

$ juju remove-unit cinder/0

$ juju status
machines:
  "0":
    agent-state: started
    agent-version: 1.10.0.1
    dns-name: 10.55.63.219
    instance-id: c05e2388-fceb-4910-8a0a-d2d06574fb1a
    series: precise
  "2":
    agent-state: started
    agent-version: 1.10.0.1
    dns-name: 10.55.63.220
    instance-id: 44095fed-e1b5-4e90-b208-3c81594428e8
    series: precise
services:
  cinder:
    charm: local:precise/cinder-26
    exposed: false
    life: dying
    units:
      cinder/0:
        agent-state: error
        agent-state-info: 'hook failed: "install"'
        agent-version: 1.10.0.1
        life: dying
        machine: "2"
        public-address: 10.55.63.220

As far as I can see, the cinder service and the cinder/0 unit are stuck.

William Reade (fwereade) wrote :

The current behaviour is in fact as intended: error states are meant to prevent a unit from doing anything until a human has solved the problem that juju considers intractable. This is done with `juju resolved`, which indicates to juju that you have yourself completed the task that juju failed to do. Running it without fixing anything would of course be a bare-faced lie, and wouldn't help the next hook's chances of success much, but by repeatedly resolving errors without looking you can assist a dying unit to its eventual suicide.
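
The workaround described above can be sketched as a loop. This is only a rough sketch, assuming `juju resolved <unit>` and `juju status` behave as shown in this report and that the unit is already dying (after destroy-service or remove-unit). The helper names `run_juju`, `unit_present` and `resolve_until_gone`, the `run` parameter, the attempt limit and the delay are all hypothetical, and the status check is a naive text match, not a YAML parse.

```python
import subprocess
import time


def run_juju(*args):
    """Run a juju subcommand and return its stdout (assumes `juju` is on PATH)."""
    result = subprocess.run(["juju", *args], capture_output=True, text=True)
    return result.stdout


def unit_present(status_text, unit):
    """Naive check: is there a status line that consists of '<unit>:'?"""
    return any(line.strip() == unit + ":" for line in status_text.splitlines())


def resolve_until_gone(unit, run=run_juju, max_attempts=20, delay=5.0):
    """Mark errors on a dying unit as resolved until it leaves `juju status`.

    Returns the number of `juju resolved` calls made, or -1 if the unit is
    still present after max_attempts calls.
    """
    for attempt in range(max_attempts):
        if not unit_present(run("status"), unit):
            return attempt
        # Claims the failed hook was fixed by hand, whether or not it was.
        run("resolved", unit)
        time.sleep(delay)
    return -1 if unit_present(run("status"), unit) else max_attempts
```

Something like `resolve_until_gone("cinder/0")` would then keep resolving until the unit disappears from the status output. The `run` parameter exists only so the loop can be exercised without a live environment.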

In this specific case -- a failure on install -- I think it would be reasonable for the unit to be removed directly when it was destroyed; and it is probably reasonable to do so at any point up to the successful completion of the start hook; but once that's run, we really ought to be running a stop hook before shutting the unit down. And once it's joined relations the question is harder still; so we err on the side of safety, and ask for interventions whenever we're unsure. So I have two proposals to address the near and far terms:

1) destroy-unit on a unit that has not run its "start" hook should remove the unit directly regardless of error state.

2) destroy-unit --force on a unit that has run its "start" hook should cause it to run all hooks necessary for it to disengage, but to ignore error states and continue blindly on through "stop" to death.

Would either, or both, address your needs?
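
The two proposals, together with the current behaviour, amount to a small decision table. The following is a minimal sketch of that table, not juju's implementation: the `Unit` model, the `Action` names and the `destroy_unit_action` function are hypothetical and exist only to make the rules concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REMOVE_DIRECTLY = "remove the unit without running further hooks"
    RUN_HOOKS_IGNORING_ERRORS = "run departure hooks through stop, ignoring errors"
    WAIT_FOR_RESOLVED = "stay in error until a human runs `juju resolved`"
    RUN_HOOKS_NORMALLY = "run departure hooks through stop as usual"


@dataclass
class Unit:
    # Hypothetical model; not juju's actual state representation.
    name: str
    start_hook_ran: bool
    in_error: bool


def destroy_unit_action(unit, force=False):
    """Pick what destroy-unit would do under proposals (1) and (2)."""
    # Proposal 1: before "start" has run, the error state does not matter.
    if not unit.start_hook_ran:
        return Action.REMOVE_DIRECTLY
    # Proposal 2: --force still disengages fully but steps over hook errors.
    if force:
        return Action.RUN_HOOKS_IGNORING_ERRORS
    # Behaviour described in this thread as current and intended.
    if unit.in_error:
        return Action.WAIT_FOR_RESOLVED
    return Action.RUN_HOOKS_NORMALLY
```

For the unit in this report, `destroy_unit_action(Unit("cinder/0", start_hook_ran=False, in_error=True))` falls under proposal (1), since the install hook failed before start could run.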

Andreas Hasenack (ahasenack) wrote :

Both options sound good to me. I think you mean remove-unit, though, or are you talking about a new command called destroy-unit?

William Reade (fwereade) wrote :

I actually mean destroy-unit, which is the new canonical name that is aliased by remove-unit. (1) and (2) have been split out into lp:1176740 and (pre-existing) lp:1089289 (with further notes to reflect this). Marking duplicate of the latter.
