ERROR: model has been removed

Bug #1721786 reported by Cory Johns
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Critical
Anastasia

Bug Description

On the edge snap of Juju, with the LXD provider, when adding and removing models in parallel, the controller seems to get into a bad state where you cannot list models or even destroy the controller because of an error stating that a particular model "has been removed"

Example: https://pastebin.ubuntu.com/25685991/

Revision history for this message
Cory Johns (johnsca) wrote :

I started seeing this when running the integration tests in python-libjuju (tox -e integration), which runs a bunch of tests in parallel, most of which create their own unique model to run in which they remove when complete. One of the tests tests ControllerFacade.AllModels which sometimes returns this error. But we've also seen all tests pass (including that one) and it show up during tear down of the lxd controller. I also hit an instance where I ran the tests on a local lxd controller and the error persisted after the tests and through multiple interactions via the CLI, as shown in the pastebin, so it doesn't seem strictly like a timing issue.

tags: added: python-libjuju
Revision history for this message
Cory Johns (johnsca) wrote :

Here's my snap info:

name: juju
summary: "juju client"
publisher: canonical
contact: http://jujucharms.com
description: |
  Through the use of charms, juju provides you with shareable, re-usable, and
  repeatable expressions of devops best practices.
snap-id: e2CPHpB1fUxcKtCyJTsm5t3hN9axJ0yj
commands:
  - juju
tracking: edge
installed: 2.3-beta2+develop-ab12669 (2645) 52MB classic
refreshed: 2017-10-05 20:26:21 -0400 EDT
channels:
  stable: 2.2.4 (2529) 25MB classic
  candidate: 2.2.5+2.2-c1f373c (2623) 43MB classic
  beta: 2.3-beta1 (2640) 29MB classic
  edge: 2.3-beta2+develop-ab12669 (2645) 52MB classic

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.3-beta2
importance: Undecided → High
status: New → Triaged
Ian Booth (wallyworld)
Changed in juju:
importance: High → Critical
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Cory Johns (johnsca),
Do you have any logs that you could add to this report?

Changed in juju:
assignee: nobody → Anastasia (anastasia-macmood)
status: Triaged → In Progress
status: In Progress → Incomplete
Revision history for this message
Anastasia (anastasia-macmood) wrote :

The actual error is coming from https://github.com/juju/juju/blob/develop/state/pool.go#L71.

I am looking into figuring out how did we end up here...

Revision history for this message
Anastasia (anastasia-macmood) wrote :
Changed in juju:
assignee: Anastasia (anastasia-macmood) → nobody
Revision history for this message
Cory Johns (johnsca) wrote :

Here's a controller log with debug enabled where I reproduced the issue: http://paste.ubuntu.com/25720263/

I can reproduce it pretty much 100% of the time (only tested on LXD) using the branch for this PR and running `tox -e integration`: https://github.com/juju/python-libjuju/pull/171

Changed in juju:
status: Incomplete → Triaged
Changed in juju:
assignee: nobody → Anastasia (anastasia-macmood)
Revision history for this message
Anastasia (anastasia-macmood) wrote :

After further investigation, I can see that we definitely can end up in a twist.

It looks like we may not release a connection to a model in a timely manner after the model has been destroyed (pool item flagged as removed).

I have seen similar errors from other API calls and in the workers' code. It can actually be fairly frequent, depending on the model and how saturated it was - deployed charms, users, etc.
This is just a small sample of places that I have extracted both from the log output above as well as from my own testing:

* "model-upgrader" [DEBUG juju.worker.dependency engine.go:553 stack trace:
model XXX has been removed
github.com/juju/juju/worker/modelupgrader/worker.go:86:
github.com/juju/juju/worker/modelupgrader/manifold.go:66:]

* Controller[""].AllModels

* ModelManager[""].ModelStatus

* ModelManager[""].ListModels

I will work on tracing where we are not releasing model reference.
I will also investigate all callers of the pool.Get() and pool.GetModel() to consider how they should handle NotFound error.

Changed in juju:
status: Triaged → In Progress
Revision history for this message
Anastasia (anastasia-macmood) wrote :
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → In Progress
Revision history for this message
Anastasia (anastasia-macmood) wrote :

Part 2 - model destruction and backup: https://github.com/juju/juju/pull/7956

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.