juju controller stuck in infinite loop during teardown - "lease manager stopped" errors

Bug #1591387 reported by Charles Butler
This bug affects 9 people
Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: William Reade

Bug Description

Juju version: 2.0-beta8-xenial-amd64
cloud: amazon

Performed: juju destroy-controller amazon -y --destroy-all-models

What I expected: The controller to terminate

What Happened: An infinite loop of attempted teardown.

Destroying controller "local.amazon"
Waiting for resources to be reclaimed
Waiting on 1 model, 1 machine, 4 services
Waiting on 1 model, 1 machine, 4 services
... indefinitely

Associated logs of the issue from model: 'controller', machine: 0 have been attached.

I'm unsure how to 'unstick' the controller from this state, aside from purging the controller configuration from the $JUJU_DIR and terminating the instances from my cloud control panel.

Cheryl Jennings (cherylj) wrote :

See the first comment in bug #1566426 for how to force down a controller.

I see a lot of "lease manager stopped" errors in the log, which I thought was fixed in beta8 via bug #1573136. Sounds like it's not :(

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0-beta10
summary: - juju controller stuck in infinite loop during teardown
+ juju controller stuck in infinite loop during teardown - "lease manager
+ stopped" errors
William Reade (fwereade) wrote :

A lot of the error spam is addressed in http://reviews.vapour.ws/r/5151/ -- which does fix a real issue, that a removed model's engine will not stop running -- but that doesn't help with the wedged cleanup. Still looking.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta10 → 2.0-beta11
William Reade (fwereade) wrote :

(alexis has useful repros *only* destroying a single model, and seeing the model "removed" but still holding resources; that narrows it down significantly, so it's almost certainly a state-only bug.)

William Reade (fwereade)
Changed in juju-core:
assignee: nobody → William Reade (fwereade)
status: Triaged → In Progress
tags: added: 2.0 destroy-controller
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta11 → 2.0-beta12
William Reade (fwereade) wrote :

Single-model case appears to be distinct (models *are* cleaned up, but the CLI command doesn't wait for them -- is this another bug?).

The only repro I observed live had a dying controller, and an untouched hosted model which *could* be cleaned up with an explicit destroy-model -- doing so unblocked destroy-controller. That strongly implies that the state cleanups aren't being run; and it turns out that the state-cleaner worker doesn't actually run for *all* of the time that a developer using the cleanup feature in state might expect.

That is addressed in http://reviews.vapour.ws/r/5200/ -- with which I haven't managed to repro, but I've also had a hard time triggering it myself beforehand. So, while it's a move in a sensible direction, there's only very weak and circumstantial evidence for it being an actual fix.

Would very much appreciate testing from someone who sees this bug regularly.

Changed in juju-core:
importance: High → Critical
Changed in juju-core:
status: In Progress → Fix Committed
Suchitra Venugopal (suchvenu) wrote :

Hi,
I am seeing a lot of similar messages in the debug-log continuously, even though the charm gets deployed properly.

machine-0: 2016-07-13 09:57:07 ERROR juju.worker.dependency engine.go:526 "log-forwarder" manifold worker returned unexpected error: creating log forwarding orchestrator: model-level log forwarding not supported
unit-ibm-db2-0: 2016-07-13 09:57:08 ERROR juju.worker.dependency engine.go:526 "leadership-tracker" manifold worker returned unexpected error: leadership failure: lease manager stopped
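
For anyone trying to gauge how often this error appears in a saved debug-log, a small Go sketch like the following could count the "lease manager stopped" lines per agent. It assumes every line has the same shape as the two quoted above (entity, date, time, level, module, file:line, message); `parseLine` and `countLeaseErrors` are hypothetical helper names, not part of Juju.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// logLine holds the fields of one debug-log line, assuming the layout
// "<entity>: <date> <time> <LEVEL> <module> <file:line> <message>".
type logLine struct {
	Entity, Level, Module, Message string
}

var linePattern = regexp.MustCompile(
	`^(\S+): (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) (\S+) (\S+:\d+) (.*)$`)

// parseLine splits a line into its fields; ok is false if the line does not
// match the assumed layout.
func parseLine(s string) (line logLine, ok bool) {
	m := linePattern.FindStringSubmatch(strings.TrimSpace(s))
	if m == nil {
		return logLine{}, false
	}
	return logLine{Entity: m[1], Level: m[3], Module: m[4], Message: m[6]}, true
}

// countLeaseErrors returns, per entity, the number of ERROR lines whose
// message mentions "lease manager stopped".
func countLeaseErrors(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, s := range lines {
		l, ok := parseLine(s)
		if !ok || l.Level != "ERROR" {
			continue
		}
		if strings.Contains(l.Message, "lease manager stopped") {
			counts[l.Entity]++
		}
	}
	return counts
}

func main() {
	lines := []string{
		`machine-0: 2016-07-13 09:57:07 ERROR juju.worker.dependency engine.go:526 "log-forwarder" manifold worker returned unexpected error: creating log forwarding orchestrator: model-level log forwarding not supported`,
		`unit-ibm-db2-0: 2016-07-13 09:57:08 ERROR juju.worker.dependency engine.go:526 "leadership-tracker" manifold worker returned unexpected error: leadership failure: lease manager stopped`,
	}
	for entity, n := range countLeaseErrors(lines) {
		fmt.Printf("%s: %d\n", entity, n)
	}
}
```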

Thanks & Regards,
 Suchitra Venugopal

Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta12 → none
milestone: none → 2.0-beta12