juju-core

remove-unit and destroy-service don't work when agent state is error

Bug #1173224 reported by Andreas Hasenack on 2013-04-26

This bug report is a duplicate of: Bug #1089289: destroy-unit --force.

This bug affects 2 people

Affects     Status   Importance   Assigned to   Milestone
juju-core   New      Undecided    Unassigned

Bug Description

In this scenario, the unit's agent-state is error:
$ juju status
machines:
  "0":
    agent-state: started
    agent-version: 1.10.0.1
    dns-name: 10.55.63.219
    instance-id: c05e2388-fceb-4910-8a0a-d2d06574fb1a
    series: precise
  "2":
    agent-state: started
    agent-version: 1.10.0.1
    dns-name: 10.55.63.220
    instance-id: 44095fed-e1b5-4e90-b208-3c81594428e8
    series: precise
services:
  cinder:
    charm: local:precise/cinder-26
    exposed: false
    relations:
      cluster:
      - cinder
    units:
      cinder/0:
        agent-state: error
        agent-state-info: 'hook failed: "install"'
        agent-version: 1.10.0.1
        machine: "2"
        public-address: 10.55.63.220

I then tried both destroying the service and removing the unit, but neither worked. I can't terminate the machine either:
$ juju destroy-service cinder

$ juju terminate-machine 2
error: no machines were destroyed: machine 2 has unit "cinder/0" assigned

$ juju remove-unit cinder/0

$ juju status
machines:
  "0":
    agent-state: started
    agent-version: 1.10.0.1
    dns-name: 10.55.63.219
    instance-id: c05e2388-fceb-4910-8a0a-d2d06574fb1a
    series: precise
  "2":
    agent-state: started
    agent-version: 1.10.0.1
    dns-name: 10.55.63.220
    instance-id: 44095fed-e1b5-4e90-b208-3c81594428e8
    series: precise
services:
  cinder:
    charm: local:precise/cinder-26
    exposed: false
    life: dying
    units:
      cinder/0:
        agent-state: error
        agent-state-info: 'hook failed: "install"'
        agent-version: 1.10.0.1
        life: dying
        machine: "2"
        public-address: 10.55.63.220

As far as I can see, the cinder service and the cinder/0 unit are stuck.

William Reade (fwereade) wrote on 2013-04-28:

The current behaviour is in fact as intended: error states prevent a unit from doing anything until a human has solved the problem that juju considers intractable. You signal that with `juju resolved`, which tells juju that you have yourself completed the task that juju failed to do. Running it without fixing anything would of course be a bare-faced lie, and wouldn't help the next hook's chances of success much, but by repeatedly resolving errors without looking you can assist a dying unit to its eventual suicide.
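
The "resolve repeatedly until the unit is gone" workaround described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names (`unit_present`, `resolve_until_gone`), the attempt limit, and the delay are all made up for illustration, and the check for whether the unit still exists is a naive text match on `juju status` output rather than real YAML parsing. The only juju commands it runs are `juju status` and `juju resolved <unit>`.

```python
import subprocess
import time


def unit_present(status_text, unit):
    """Return True if `unit` appears as a key line (e.g. 'cinder/0:') in status output.

    Naive text match, an assumption of this sketch; it does not parse the YAML.
    """
    return any(line.strip() == unit + ":" for line in status_text.splitlines())


def resolve_until_gone(unit, run=None, max_attempts=10, delay=5.0):
    """Call `juju resolved <unit>` until the unit no longer shows up in status.

    `run` takes an argument list and returns the command's stdout; it is
    injectable so the loop can be exercised without a live environment.
    Returns the number of `juju resolved` calls made, or raises RuntimeError
    if the unit is still present after `max_attempts` calls.
    """
    if run is None:
        # Default runner shells out to the juju client. Exit codes are
        # ignored: `juju resolved` may fail while no hook error is pending.
        def run(args):
            return subprocess.run(args, capture_output=True, text=True).stdout

    for attempt in range(max_attempts):
        if not unit_present(run(["juju", "status"]), unit):
            return attempt
        run(["juju", "resolved", unit])
        time.sleep(delay)  # give the unit agent time to run the next hook

    if not unit_present(run(["juju", "status"]), unit):
        return max_attempts
    raise RuntimeError("%s still present after %d attempts" % (unit, max_attempts))
```

For the unit in this report the call would be `resolve_until_gone("cinder/0")`. Note that this blindly skips every failed hook, so it is only appropriate when the unit is being thrown away anyway.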

In this specific case -- a failure on install -- I think it would be reasonable for the unit to be removed directly when it was destroyed; and it is probably reasonable to do so at any point up to the successful completion of the start hook; but once that's run, we really ought to be running a stop hook before shutting the unit down. And once it's joined relations the question is harder still; so we err on the side of safety, and ask for interventions whenever we're unsure. So I have two proposals to address the near and far terms:

1) destroy-unit on a unit that has not run its "start" hook should remove the unit directly regardless of error state.

2) destroy-unit --force on a unit that has run its "start" hook should cause it to run all hooks necessary for it to disengage, but to ignore error states and continue blindly on through "stop" to death.

Would either, or both, address your needs?
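
To make the two proposals concrete, here is a small decision-table sketch of how destroy-unit might behave if both were adopted. It is not juju-core code: the `Unit` fields, the `Action` names, and the `destroy_unit` function are hypothetical, and the non-forced branches simply restate the current behaviour described in this comment.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REMOVE_DIRECTLY = "remove the unit without running further hooks"
    RUN_HOOKS_IGNORING_ERRORS = "run departure hooks through stop, ignoring errors"
    RUN_HOOKS_NORMALLY = "run departure hooks through stop"
    WAIT_FOR_RESOLVED = "stay dying until a human runs juju resolved"


@dataclass
class Unit:
    name: str
    started: bool   # has the "start" hook completed successfully?
    in_error: bool  # is the agent-state currently "error"?


def destroy_unit(unit, force=False):
    """Pick what should happen to a unit being destroyed (hypothetical model)."""
    if not unit.started:
        # Proposal 1: nothing to stop yet, so remove regardless of error state.
        return Action.REMOVE_DIRECTLY
    if force:
        # Proposal 2: disengage via hooks but carry on past any hook failure.
        return Action.RUN_HOOKS_IGNORING_ERRORS
    if unit.in_error:
        # Current behaviour: an error blocks progress until it is resolved.
        return Action.WAIT_FOR_RESOLVED
    return Action.RUN_HOOKS_NORMALLY
```

Under this model the reported case, `Unit("cinder/0", started=False, in_error=True)`, maps to `Action.REMOVE_DIRECTLY` with or without `force`.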

Andreas Hasenack (ahasenack) wrote on 2013-04-29:

Both options sound good to me. I think you mean remove-unit, though, or are you talking about a new command called destroy-unit?

William Reade (fwereade) wrote on 2013-05-06:

I actually mean destroy-unit, which is the new canonical name that is aliased by remove-unit. (1) and (2) have been split out into lp:1176740 and (pre-existing) lp:1089289 (with further notes to reflect this). Marking duplicate of the latter.
