Destroying a service in error state fails silently

Bug #1168154 reported by Madison Scott-Clary
This bug affects 3 people
Affects    Status   Importance  Assigned to  Milestone
juju-core  Triaged  Low         Unassigned

Bug Description

If a service fails to deploy, it cannot be destroyed.

To reproduce:
1. Bootstrap
2. Deploy buildbot-master
3. Watch debug-log for install hook failure
4. Destroy buildbot-master

This should really be handled with a --force flag for destroy-service, as in lp:1183309 et al, but it could possibly be mitigated as described in the comments.

Roger Peppe (rogpeppe)
Changed in juju-core:
status: New → Confirmed
importance: Undecided → High
William Reade (fwereade) wrote :

Sounds to me like "working as intended". You need to resolve the install error (`juju resolved buildbot-master/0`) to unblock the unit and allow it to complete its lifecycle. Am I missing something?

There is a case to be made that we could somewhat extend the range of situations in which unit.Destroy directly removes the unit, so that it's considered fair game up until the first moment the agent sets status -- is this what you're looking for?

I am much more concerned about the buildbot failure, though. Do you know anything about its cause?

Roger Peppe (rogpeppe) wrote :

> There is a case to be made that we could somewhat extend the
> range of situations in which unit.Destroy directly removes the unit,
> so that it's considered fair game up until the first moment the agent
> sets status -- is this what you're looking for?

That wouldn't fix this problem - the agent has set the status
but it's an error state, so presumably the uniter is refusing
to do anything until the error state is resolved.
And indeed, when I marked the unit as resolved, everything
proceeded as expected and the service disappeared.

Here's one possibility for what happens when the uniter
sees Dying:

- If we haven't successfully run the install and start hooks,
just die.
- Run the stop hook (even if we're in an error state);
if it fails, we leave the unit in the stop hook's error state,
otherwise we die.
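
The two rules above can be sketched as a small decision function. This is only a toy model in Python, not juju-core code (which is written in Go); `UnitState`, `on_dying`, and `run_stop_hook` are hypothetical names, and the first rule is read as "install and start have not both succeeded".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UnitState:
    installed: bool         # install hook completed successfully
    started: bool           # start hook completed successfully
    in_error: bool = False  # unit is currently in a hook error state


def on_dying(unit: UnitState, run_stop_hook: Callable[[], bool]) -> str:
    """Return the outcome when the (hypothetical) uniter sees Dying."""
    # Rule 1: install and start never both succeeded, so there is
    # nothing to stop; the unit can simply die.
    if not (unit.installed and unit.started):
        return "dead"
    # Rule 2: run the stop hook even if the unit is in an error state.
    if run_stop_hook():
        return "dead"
    # The stop hook failed: leave the unit in that hook's error state.
    unit.in_error = True
    return "error: stop hook failed"
```

Under this model, the unit from this report (install hook failed) falls under the first rule and dies without anyone having to run `juju resolved`.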

> I am much more concerned about the buildbot failure, though. Do you
> know anything about its cause?

It tried to run `apt-get install -y --force-yes python-shell-toolbox`, which failed (error: Unable to locate package python-shell-toolbox).

Gary Poster (gary) wrote :

This is a bad user experience from the GUI's perspective.

- In general, if I say "destroy this service," I expect the system to destroy the service and handle any related issues that can be handled mechanically.
- If, for some reason, the destruction can't happen because manual intervention is actually necessary somewhere, I'd expect the "destroy" call to fail with a helpful error message explaining what I need to do to make the destroy succeed.
- If, for some reason, that can't happen either, at the very least I'd expect a hand-holding notification in the GUI: for instance, a message that had to be acknowledged that explained that there was a problem, and what I manually needed to do to resolve it. I suppose the commandline equivalent would be to add something to the juju status output, though that seems problematic in a number of ways.
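
As an illustration of the second point, here is a minimal sketch of a destroy request that fails loudly and says what to do next. It is a toy model in Python, not juju's API; `DestroyBlocked`, `check_destroy`, and the `unit_errors` mapping are hypothetical, and the only real command referenced is `juju resolved`, mentioned earlier in this thread.

```python
class DestroyBlocked(Exception):
    """Raised when a destroy request needs manual intervention first."""


def check_destroy(service, unit_errors):
    """Raise DestroyBlocked if any unit of the service is in an error state.

    unit_errors maps a unit name to its error message, or None if healthy.
    """
    blocked = sorted(name for name, err in unit_errors.items() if err)
    if blocked:
        hints = ", ".join(f"`juju resolved {name}`" for name in blocked)
        raise DestroyBlocked(
            f"cannot destroy {service}: units in error state; run {hints} first"
        )
```

A caller such as the GUI could catch the exception and show its message instead of leaving the service half-destroyed with no feedback.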

Simply not finishing a partially completed destruction without any message at all is not a good approach for us.

Buildbot failure: the charm expects the pyjuju PPA to be installed, which is where python-shell-toolbox is found for precise. This is a charm failure. Unfortunately, the charm doesn't have a maintainer any more (my squad used to be the maintainer but we have no need for it any more, and AIUI buildbot is considered to be a very low priority target ATM).

Changed in juju-core:
status: Confirmed → In Progress
status: In Progress → Confirmed
William Reade (fwereade) wrote :

Importance acknowledged; I believe this will be addressed by lp:1183309 (the gui can just set the force flag by default).

We would in the meantime be able to mitigate the effects somewhat by extending the range of situations in which a unit can be removed directly (any time before its machine agent starts) and can be set directly to Dead (any time before the "start" hook completes successfully), but I'm not sure it's worth the effort vs a proper fix.
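
The mitigation described above amounts to a three-way choice at destroy time. A minimal sketch in Python, purely illustrative (juju-core itself is Go; `destroy_action` and its two boolean parameters are hypothetical names for the conditions named in the paragraph):

```python
def destroy_action(machine_agent_started: bool, start_hook_succeeded: bool) -> str:
    """Pick what destroying a unit would do under the proposed mitigation."""
    # Before the machine agent starts, nothing is running for the unit,
    # so it can be removed directly.
    if not machine_agent_started:
        return "remove"
    # Before the "start" hook has completed successfully, the unit can be
    # set directly to Dead.
    if not start_hook_succeeded:
        return "set-dead"
    # Otherwise fall back to the normal lifecycle: mark the unit Dying and
    # let its agent wind it down.
    return "set-dying"
```

For the unit in this report, the machine agent had started but the install hook failed, so the start hook never succeeded and the sketch returns "set-dead".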

Changed in juju-core:
status: Confirmed → Triaged
importance: High → Low
description: updated
Curtis Hovey (sinzui)
tags: added: destroy-service