timeouts when destroying models

Bug #1884557 reported by Tom Haddon
This bug affects 4 people
Affects: Canonical Juju
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

Since upgrading our CI controller to 2.8.0 we're seeing lots of models stuck in "destroying" state:

Destroying Model charm-testing-cassandra-bionic-137
Destroying model
Waiting for model to be removed, 3 machine(s), 2 application(s).................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
......................................................................ERROR timeout after 30m0s timeout

The models appear to have no machines, cores or units associated with them any more, but are still marked as "destroying". Interestingly, when re-running "juju destroy-model" on such a model, it begins with "Waiting for model to be removed, 3 machine(s), 2 application(s)" again, even though the output from "juju models" shows 0 machines, cores and units.

The client is 2.8.0-xenial-amd64.


Tom Haddon (mthaddon) wrote :

It seems to work okay with --force, fwiw:

$ juju destroy-model --force charm-testing-cassandra-bionic-137
WARNING! This command will destroy the "charm-testing-cassandra-bionic-137" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
Destroying model
Waiting for model to be removed, 3 machine(s), 2 application(s).....
Waiting for model to be removed, 3 machine(s).....
Waiting for model to be removed, 2 machine(s).....
Waiting for model to be removed, 1 machine(s).....
Waiting for model to be removed....
Model destroyed.
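The two transcripts can be told apart mechanically, which may help a CI pipeline like the one described here flag stuck teardowns early. In the stuck run, the counts in the "Waiting for model to be removed" line never go down before the timeout. In the --force run, they drop to zero. The following is a minimal Python sketch that assumes the progress-line format shown above stays stable. The function names are illustrative and not part of Juju.

```python
import re
from typing import Dict, Iterable, List

# Progress lines as printed in the transcripts above, e.g.
# "Waiting for model to be removed, 3 machine(s), 2 application(s)....."
PROGRESS_PREFIX = "Waiting for model to be removed"
COUNT_RE = re.compile(r"(\d+) (machine|application)\(s\)")


def progress_snapshots(output: Iterable[str]) -> List[Dict[str, int]]:
    """Extract the remaining machine/application counts from each progress line.

    Assumption: a kind that is not mentioned on a line has a count of 0,
    as in the --force transcript where "application(s)" drops out.
    """
    snapshots = []
    for line in output:
        if not line.startswith(PROGRESS_PREFIX):
            continue  # skip dot-only lines, prompts, ERROR lines, etc.
        counts = {"machine": 0, "application": 0}
        for number, kind in COUNT_RE.findall(line):
            counts[kind] = int(number)
        snapshots.append(counts)
    return snapshots


def made_progress(output: Iterable[str]) -> bool:
    """True if the total number of remaining entities ever went down."""
    totals = [sum(s.values()) for s in progress_snapshots(output)]
    if not totals:
        return False
    return any(later < totals[0] for later in totals[1:])
```

For the first transcript, this yields a single snapshot of 3 machines and 2 applications, so `made_progress` is False. For the --force transcript, the totals fall to 0, so it is True.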

Pen Gale (pengale) wrote :

@mthaddon: do you have a crashdump from one of the failed teardowns? Does it always happen in a model containing Cassandra-related charms?

Changed in juju:
status: New → Incomplete
Tom Haddon (mthaddon) wrote :

It doesn't just happen with models containing Cassandra-related charms; it happens with pretty much every model attached to this controller.

I've uploaded a crashdump to somewhere you have access to, and will let you know outside of this bug in case it contains sensitive information.

Changed in juju:
status: Incomplete → New
Ian Booth (wallyworld) wrote :

When this happens, what we really need is the output of juju dump-model (both of the model being destroyed and the controller model, with any secrets redacted).
That will show what is stuck in the dying state and hence holding up graceful removal of the model.
The controller logs around the time of destroy would also help.

Can we get an example of that info so we can start diagnosing what is happening? Using --force bypasses Juju's expectation that stuff will shut down cleanly, and it eventually just removes model entities regardless.

There are usually two root causes: storage not being detached or volumes not being removed, or units not leaving scope because their relation-departed/relation-broken hooks have not completed successfully.
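The two root causes above can be thought of as a checklist that a graceful removal must clear before the model can go away. Using --force skips that checklist. The following toy Python model illustrates only that idea. `Unit`, `Volume`, `DyingModel` and `removal_blockers` are hypothetical names and do not reflect how Juju tracks these entities internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    name: str
    # Toy assumption: a unit leaves a relation's scope only after its
    # relation-departed/relation-broken hooks complete; relations still
    # listed here are ones whose hooks have not completed successfully.
    relations_in_scope: List[str] = field(default_factory=list)


@dataclass
class Volume:
    name: str
    detached: bool = False
    removed: bool = False


@dataclass
class DyingModel:
    units: List[Unit] = field(default_factory=list)
    volumes: List[Volume] = field(default_factory=list)


def removal_blockers(model: DyingModel) -> List[str]:
    """List what would hold up a graceful (non --force) removal in this toy model."""
    blockers = []
    for unit in model.units:
        if unit.relations_in_scope:
            blockers.append(
                f"unit {unit.name} still in scope of relation(s): "
                f"{', '.join(unit.relations_in_scope)}"
            )
    for vol in model.volumes:
        if not vol.detached:
            blockers.append(f"volume {vol.name} not detached")
        elif not vol.removed:
            blockers.append(f"volume {vol.name} not removed")
    return blockers
```

In this sketch, an empty result corresponds to a model that can be removed cleanly. Any entry corresponds to the kind of dying entity that `juju dump-model` output would be expected to reveal.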

tags: added: destroy-model
Haw Loeung (hloeung) wrote :

For Cassandra models, it's units not leaving scope because hooks have not completed successfully. See:

https://paste.ubuntu.com/p/nYHnGpX4sS/

Changed in juju:
status: New → Confirmed
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Undecided → Low
tags: added: expirebugs-bot