service with no units stuck in lifecycle dying

Bug #1233457 reported by Kapil Thangavelu on 2013-10-01
This bug affects 4 people
Affects             Importance  Assigned to
juju-core           High        William Reade
  1.16              Critical    William Reade
juju-core (Ubuntu)  Undecided   Unassigned
  Saucy             Undecided   Unassigned

Bug Description

[Impact]
Services with no service units get stuck in the 'dying' state, preventing their removal from a deployment.

[Test Case]
juju deploy mysql
juju terminate-machine --force <machineid of mysql>
juju destroy-service mysql

[Regression Potential]
Part of the upstream-tested 1.16.6 release. The change looks limited to the impacted code path only.

[Original Report]
Report from the field: a service with no units (previously destroyed) is stuck in lifecycle dying. Per status snippet:

mysql:
  charm: local:precise/mysql-309
  exposed: false
  life: dying
  relations:
    cluster:
    - mysql
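
A rough way to spot candidates for this condition from parsed status output is sketched below. It assumes the status has already been loaded into a dict with a top-level "services" key and a per-service "units" key; the helper name stuck_services is hypothetical. A service that is dying with no units may simply be mid-teardown, so a hit only means "look closer".

```python
def stuck_services(status):
    """Return names of services that are dying and list no units."""
    stuck = []
    for name, info in status.get("services", {}).items():
        # A dying service with no units has nobody left to finish its teardown.
        if info.get("life") == "dying" and not info.get("units"):
            stuck.append(name)
    return sorted(stuck)


# The snippet from this report, expressed as a dict.
status = {
    "services": {
        "mysql": {
            "charm": "local:precise/mysql-309",
            "exposed": False,
            "life": "dying",
            "relations": {"cluster": ["mysql"]},
        }
    }
}

print(stuck_services(status))  # ['mysql']
```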

Kapil Thangavelu (hazmat) wrote :

Poking at the underlying mongodb shows that the mysql service still has an extant relation and no units. Per William on IRC:

<fwereade> hazmat, to me the really critical thing is that one of those units apparently managed to leave scope without updating the relation doc's unitcount

Kapil Thangavelu (hazmat) wrote :

<fwereade> hazmat, the service was kept alive by the relation, which was kept alive by its unit count, which implied there'd be a unit to do the final leavescope and set off the dominos to take down the relation and the service
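
The chain described above can be illustrated with a toy model. This is only a sketch of the reference-counting idea, not juju-core's actual state code: the Service and Relation classes, their fields, and the run helper are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    name: str
    unit_count: int = 0  # units recorded as being in scope
    dying: bool = False

    @property
    def removable(self):
        return self.dying and self.unit_count == 0


@dataclass
class Service:
    name: str
    units: set = field(default_factory=set)
    relations: dict = field(default_factory=dict)
    dying: bool = False

    def enter_scope(self, rel_name):
        self.relations[rel_name].unit_count += 1

    def leave_scope(self, rel_name):
        rel = self.relations[rel_name]
        rel.unit_count -= 1
        if rel.removable:  # the last leave-scope takes the relation down
            del self.relations[rel_name]

    def destroy(self):
        self.dying = True
        for rel in list(self.relations.values()):
            rel.dying = True
            if rel.removable:
                del self.relations[rel.name]

    @property
    def removable(self):
        return self.dying and not self.units and not self.relations


def run(leave_scope_first):
    svc = Service("mysql", units={"mysql/0"},
                  relations={"cluster": Relation("cluster")})
    svc.enter_scope("cluster")
    svc.destroy()
    if leave_scope_first:
        svc.leave_scope("cluster")
    svc.units.discard("mysql/0")  # the unit goes away either way
    return svc


healthy = run(leave_scope_first=True)
stuck = run(leave_scope_first=False)
print(healthy.removable, stuck.removable)  # True False
```

In the second run the unit count stays at 1 with no unit left to decrement it, so the relation and therefore the service can never be removed.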

John A Meinel (jameinel) on 2013-10-03
Changed in juju-core:
importance: Undecided → High
status: New → Triaged
Curtis Hovey (sinzui) on 2013-10-12
tags: added: destroy-service

I have an export of the mongodb for this environment, if anyone needs it
for additional analysis.

Curtis Hovey (sinzui) on 2013-10-17
tags: added: cts-cloud-review
removed: cts
Changed in juju-core:
milestone: none → 1.17.0
Curtis Hovey (sinzui) wrote :

This issue relates to bug 1205451. In this case, the machine terminated before the state server could tell the agent that it is dead. In the other bug, the machine terminated for other reasons. In both cases, the state server does not recognise that the agent and machine are gone, so it only needs to remove the record of the agent.

William Reade (fwereade) wrote :

I don't *think* it's related to lp:1205451 -- according to the transaction log captured by hazmat, mysql/0 never actually tried to leave relation scope for that relation... but *did* otherwise shut down cleanly. This points to a bug in the Uniter itself; still investigating.

Curtis Hovey (sinzui) on 2013-11-06
tags: added: state-server
William Reade (fwereade) wrote :

Root cause remains undetermined, but we can still ensure units are not removed without leaving all their relation scopes. Fix in progress.
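
One way to picture that guard is a removal path that leaves every remaining scope before deleting the unit. This is a minimal sketch with an in-memory stand-in for the state database; the State class and its method names are hypothetical and do not reflect the actual fix.

```python
class State:
    """Hypothetical in-memory stand-in for the state database."""

    def __init__(self):
        self.unit_scopes = {}      # unit name -> relations it is in scope for
        self.relation_counts = {}  # relation name -> recorded unit count

    def add_unit(self, unit):
        self.unit_scopes[unit] = set()

    def enter_scope(self, unit, relation):
        if relation not in self.unit_scopes[unit]:
            self.unit_scopes[unit].add(relation)
            self.relation_counts[relation] = (
                self.relation_counts.get(relation, 0) + 1
            )

    def leave_scope(self, unit, relation):
        if relation in self.unit_scopes[unit]:
            self.unit_scopes[unit].remove(relation)
            self.relation_counts[relation] -= 1

    def remove_unit(self, unit):
        # Guard: never drop a unit that is still counted by a relation.
        for relation in sorted(self.unit_scopes[unit]):
            self.leave_scope(unit, relation)
        del self.unit_scopes[unit]


state = State()
state.add_unit("mysql/0")
state.enter_scope("mysql/0", "cluster")
state.remove_unit("mysql/0")  # the unit never left scope on its own
print(state.relation_counts)  # {'cluster': 0}
```

With the guard in place the count reaches zero even when the unit shuts down without leaving scope itself, whatever the underlying cause turns out to be.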

Changed in juju-core:
assignee: nobody → William Reade (fwereade)
Curtis Hovey (sinzui) on 2013-11-07
Changed in juju-core:
status: Triaged → In Progress
William Reade (fwereade) on 2013-11-07
Changed in juju-core:
milestone: 1.17.0 → 2.0
Mark Ramm (mark-ramm) on 2013-11-08
Changed in juju-core:
importance: High → Critical
milestone: 2.0 → 1.17.0
Tim Penhey (thumper) on 2013-11-10
Changed in juju-core:
status: In Progress → Fix Committed
William Reade (fwereade) on 2013-11-13
Changed in juju-core:
milestone: 1.17.0 → 2.0
status: Fix Committed → In Progress
William Reade (fwereade) on 2013-11-14
Changed in juju-core:
milestone: 2.0 → 1.17.0
status: In Progress → Fix Committed
Curtis Hovey (sinzui) on 2013-11-20
Changed in juju-core:
importance: Critical → High
Curtis Hovey (sinzui) on 2013-12-20
Changed in juju-core:
status: Fix Committed → Fix Released
James Page (james-page) on 2014-02-07
Changed in juju-core (Ubuntu):
status: New → Fix Released
James Page (james-page) on 2014-03-25
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in juju-core (Ubuntu Saucy):
status: New → Confirmed
Rolf Leggewie (r0lf) wrote :

saucy has seen the end of its life and is no longer receiving any updates. Marking the saucy task for this ticket as "Won't Fix".

Changed in juju-core (Ubuntu Saucy):
status: Confirmed → Won't Fix