dead model not removed, triggering issues with duplicate names

Bug #1740815 reported by Paul Gear
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Expired
High
Unassigned
2.3
Won't Fix
High
Unassigned

Bug Description

'juju status -m modelname --format=json' occasionally reports the message:

ERROR model "username/modelname" has been removed from the controller, run 'juju models' and switch to one of them. There are 999 accessible models on controller "controllername".

This occurs when running 'juju models', finding all models with valid life and state, then running 'juju status' on each of them per above. Running the command again shortly afterwards works fine.

This is occurring on 2.2.6 client with 2.2.8 agents, but was also seen before the upgrade to 2.2.8.

Andrew Wilkins (axwalk)
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.4-beta1
Paul Gear (paulgear)
description: updated
Paul Gear (paulgear)
tags: added: canonical-is
Revision history for this message
Andrew Wilkins (axwalk) wrote :

As far as I can tell, the only way we would get that error message is if:

(1) the model has been removed, or
(2) there was an error on the backend accessing the database for that model

Based on the bug description, we can rule out (1), because you state that a later call to "juju status" for the same model succeeds.

So (2) seems likely, but we would need much more detailed information to diagnose the error. That, and we're not being specific enough in our error checking, and considering any error to fetch model data from mongo as a "model not found" error: https://github.com/juju/juju/pull/8252.

For more detailed info, you can use TRACE level logging on the client side for the "juju status" commands:

    juju status --logging-config='<root>=TRACE' --show-log -m admin/foo

Changed in juju:
status: Triaged → Incomplete
assignee: nobody → Andrew Wilkins (axwalk)
Revision history for this message
Paul Gear (paulgear) wrote :

We have a 2nd model on the same controller exhibiting similar problems, but with a different message:

ERROR cannot determine model status: cannot obtain current model config: settings not found (not found)

juju list-models output & corresponding juju status error at https://pastebin.canonical.com/206539/

Revision history for this message
Paul Gear (paulgear) wrote :

@axwalk: Trace logs for juju status on both models uploaded to https://private-fileshare.canonical.com/~paulgear/lp1740815/ The first one is called tracelog.txt.gz, the 2nd is tracelog-settings-not-found.txt.gz. Please let us know if you want a separate bug about the 2nd symptom. It feels like the same problem, though, because it's intermittent.

Changed in juju:
status: Incomplete → New
Revision history for this message
Stuart Bishop (stub) wrote :

I just tripped over the same model on the production controller.

The AllModels API call returns this Model data, two models with the same name and owner but different UUIDs:

{Name:qa1-staging UUID:9e20b702-8762-4b46-81a8-0adaf5923ebc OwnerTag:user-stg-certification-django-lts}
{Name:qa1-staging UUID:3dcdb3d9-3d74-4684-86c9-3ecd9e52958e OwnerTag:user-stg-certification-django-lts}

Requesting a API DumpModels (simplified) for 3dcdb3d9-3d74-4684-86c9-3ecd9e52958e fails with the (probably broken) error message:

status for model: status data for e not found

'juju list-models' shows the qa1-staging model with UUID 9e20b702-8762-4b46-81a8-0adaf5923ebc. It does not list 3dcdb3d9-3d74-4684-86c9-3ecd9e52958e.

Perhaps intermittent means the tool is flipping between using a valid UUID and invalid UUID?

Revision history for this message
Joel Sing (jsing) wrote :

The problem appears to be as follows - the models collection definitely contains two models that have the same controller, same owner and same name. One of these has a life of 2 (dead), whereas the other has a life of 0 (alive):

https://pastebin.canonical.com/206592/

Having duplicate controller/owner/name seems less than ideal at any point in time, however if this is permitted then it seems various things are not filtering on life (which they would presumably need to do to avoid duplicate names appearing outside of the database).

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Thanks for the debugging everyone. I agree, having two models with the same owner and name seems very wrong. I thought we disallowed that - I'll look into it.

Revision history for this message
Tim Penhey (thumper) wrote :

We do disallow it except for this exact case.

As the model transitions from dying to dead, the constraint is that prevents another model of the same name is removed. This was done because the delay before the undertaker cleaned up the dead model caused failures in the solutions QA process.

The query that is listing the models needs to take this into account.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

@thumper, I'm not convinced there's any value in showing it to the user in its Dead state. Once it's Dead, it should be removed quickly thereafter. If it's not, there's nothing a user can do with it, and its presence doesn't block anything else.

One problem with surfacing them, at the moment, is that the ModelManager.ListModels API does not include Life. So the client code that maps model names to UUIDs doesn't/can't know which one to use.

IMO, we should just filter out Dead models from the ModelManager.ListModels API.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

... and after re-reading your comment, maybe we're in violent agreement? I read "query that is listing..." as the client code. But perhaps you meant the mongo query?

Revision history for this message
Joel Sing (jsing) wrote :

It's certainly not being "removed quickly thereafter" (i.e. it is still in the database today) - I suspect there's a secondary issue resulting in it not being cleaned up.

Re "caused failures in the solutions QA process" - does that mean that the ordering was changed purely to make the QA process work? If so, that also seems problematic...

Revision history for this message
Stuart Bishop (stub) wrote :

Using the Juju API, AllModels returns the dying model. DumpModels fails with a bogus error message when passed the dying model. And to retrieve the lifecycle state, you seem to need to call ModelInfo but this fails with a permission error when given the dying model. There seems to be a lot of code that doesn't cope with them correctly, so seems best to filter them out as soon as possible.

Changed in juju:
status: New → Triaged
Revision history for this message
Joel Sing (jsing) wrote :

The dead model still persists in mongo (as per #5) - there are no entries in db.cleanups and looking at the logs there is:

2018-01-02 04:41:51 ERROR juju.worker.cleaner cleaner.go:83 cannot cleanup state: codec.ReadHeader error: error receiving message: read tcp 10.25.2.110:50260->10.25.2.109:17070: read: connection reset by peer
2018-01-02 04:41:51 ERROR juju.worker.dependency engine.go:551 "state-cleaner" manifold worker returned unexpected error: codec.ReadHeader error: error receiving message: read tcp 10.25.2.110:50260->10.25.2.109:17070: read: connection reset by peer
2018-01-02 07:43:24 ERROR juju.worker.cleaner cleaner.go:83 cannot cleanup state: codec.ReadHeader error: error receiving message: read tcp 10.25.2.110:49816->10.25.2.109:17070: read: connection reset by peer
2018-01-02 07:43:24 ERROR juju.worker.cleaner cleaner.go:83 cannot cleanup state: codec.ReadHeader error: error receiving message: read tcp 10.25.2.110:50050->10.25.2.109:17070: read: connection reset by peer

The timestamps match the date (but not the time) that the model is marked as dead. I suspect that there may be an ordering issue here that has resulted in the clean up job being removed, even though the clean up failed since it lost the API connection.

summary: - ERROR model "modelname" has been removed from the controller, run 'juju
- models' and switch to one of them.
+ dead model not removed, triggering issues with duplicate names
Changed in juju:
assignee: Andrew Wilkins (axwalk) → nobody
Changed in juju:
milestone: 2.4-beta1 → none
Ian Booth (wallyworld)
tags: added: teardown
Revision history for this message
Anastasia (anastasia-macmood) wrote :

Since the reporting of this bug more than a year ago, there were a few changes both in the model destruction (from 2.6 onwards) and checking of model names/UUID from (2.5).

Could we please confirm, preferably with Juju 2.6, that this is still an issue?
If it is, could we please have a clearly reproducible scenario?

With newer model destruction logic, it will be less likely that models-scheduled-for-desctruction will be lingering in database and it will be harder to get Juju in this twist.

I am marking this report as Incomplete until we hear of a newer reproduction.

Changed in juju:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.