ERROR model <uuid> has been removed (cannot add-model immediately after delete-model)

Bug #1709324 reported by Ryan Beisner
This bug affects 3 people
Affects                      Status         Importance   Assigned to    Milestone
Canonical Juju               Fix Released   Medium       Tim Penhey
OpenStack Charm Test Infra   Fix Released   Critical     Ryan Beisner

Bug Description

With 2.2.2, cannot add-model immediately after delete-model.

--

After upgrading from Juju 2.2.1 to 2.2.2, we started having the following issue with iterative deploys using the Juju OpenStack Provider.

"ERROR model f66bb9f7-fef3-4f7b-82dd-c22a13388b8a has been removed"

+ juju switch auto-osci-sv00
auto-osci-sv00 (controller) (no change)
+ juju switch auto-osci-sv00:auto-osci-sv00
auto-osci-sv00 (controller) -> auto-osci-sv00:admin/auto-osci-sv00
+ juju set-model-constraints -m auto-osci-sv00 virt-type=kvm
ERROR model f66bb9f7-fef3-4f7b-82dd-c22a13388b8a has been removed

https://openstack-ci-reports.ubuntu.com/artifacts/test_charm_pipeline_amulet_full/openstack/charm-neutron-api/482995/7/365/consoleText.test_charm_single_1264.txt

Dmitrii Shcherbakov (dmitriis) wrote :
Ryan Beisner (1chb1n) wrote :

This has a high impact on OpenStack Charm CI because the juju command exits non-zero and causes failures unrelated to the charm code under test.

Changed in charm-test-infra:
status: New → Confirmed
importance: Undecided → Critical
Ryan Beisner (1chb1n) wrote :

I also destroyed the controllers and rebootstrapped with 2.2.2, and I still see the same behavior. That confirms the issue is not specific to an upgraded controller.

Heather Lanigan (hmlanigan) wrote :

Ignore comments #2 & #5. They are expected with destroy-model; however, IMO the output is poorly phrased.

Ian Booth (wallyworld)
Changed in juju:
assignee: nobody → Anastasia (anastasia-macmood)
Anastasia (anastasia-macmood) wrote :

This is occurring because the same model was managed via different clients.
This will be fixed as part of the work that will keep the client-side store more accurately up to date.

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
tags: added: usability
Ryan Beisner (1chb1n) wrote :

I'm not sure that Comment #7 is accurate. We destroyed controllers, rebootstrapped new ones, all with 2.2.2, and still experience the "ERROR model FOO has been removed" quite frequently.

no longer affects: charm-test-infra
Changed in charm-test-infra:
status: New → Confirmed
Ryan Beisner (1chb1n)
Changed in charm-test-infra:
importance: Undecided → High
Ryan Beisner (1chb1n)
tags: added: repeatability
Dmitrii Shcherbakov (dmitriis) wrote :

I can confirm that this issue is trivially reproducible on 2.2.2 with newly bootstrapped controllers (100% of cases with MAAS).

Locally on my laptop with the localhost cloud:

➜ ~ juju --version
2.2.2-zesty-amd64

➜ ~ juju bootstrap localhost && juju add-model test && juju destroy-model test
Creating Juju controller "localhost-localhost" on localhost/localhost
Looking for packaged Juju agent version 2.2.2 for amd64
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance(s) on localhost/localhost...
 - juju-ff41d1-0 (arch=amd64)
Fetching Juju GUI 2.8.0
Waiting for address
Attempting to connect to 10.122.52.138:22
Bootstrap agent now started
Contacting Juju controller at 10.122.52.138 to verify accessibility...
Bootstrap complete, "localhost-localhost" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added.
Added 'test' model on localhost/localhost with credential 'localhost' for user 'admin'

WARNING! This command will destroy the "test" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
Destroying model
Waiting on model to be removed...
Unable to get the model status from the API: model 054839fe-a304-4665-841d-688ee6c39697 has been removed.

tags: added: cpec
Ryan Beisner (1chb1n) wrote :

Moving to critical as it has become prevalent in CI runs with OpenStack Charms, preventing us from landing code for software engineers.

Changed in charm-test-infra:
importance: High → Critical
assignee: nobody → Ryan Beisner (1chb1n)
Ryan Beisner (1chb1n) wrote :

The essence of this bug is that with 2.2.2, one can no longer loop add-model and destroy-model.

Ryan Beisner (1chb1n) wrote :

Here is a reproducer (fails on the 2nd iteration consistently):

#### Commands:
for i in $(seq 1 100); do
    echo $i
    juju add-model auto-osci-sv01 serverstack
    # Deploy some things, or not, doesn't seem to affect
    # whether or not this bug is hit.
    juju destroy-model auto-osci-sv01 -y
done

#### Output:
==== 1
Added 'auto-osci-sv01' model on serverstack/serverstack with credential 'osci' for user 'admin'
Destroying model
Waiting on model to be removed...
Unable to get the model status from the API: model 6a712acf-5be2-4a60-8e4d-8e1e4b18a139 has been removed.
==== 2
ERROR failed to create new model: model "auto-osci-sv01" for admin already exists (already exists)

Ryan Beisner (1chb1n) wrote :

I'm not in favor of the following as a recommended solution, but this supports the idea that a new race exists in 2.2.2:

On my cloud, when I add a 10s sleep after the destroy, I can do 100 iterations of add-model/destroy-model successfully.
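
A bounded retry around add-model is one alternative to a fixed sleep. The following is only a minimal sketch, not a recommended fix: the helper name `retry`, the attempt count and the delay are assumptions made for illustration, and only the `juju add-model` and `juju destroy-model -y` invocations are taken from the reproducer above.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of juju): runs COMMAND until it exits 0
# or MAX_ATTEMPTS is reached. Returns 0 on success, 1 if every attempt failed.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt/$max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
        # Skip the sleep after the final attempt.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage in the reproducer loop (model and cloud names taken from the thread;
# the values 10 and 2 are arbitrary):
#   retry 10 2 juju add-model auto-osci-sv01 serverstack
#   juju destroy-model auto-osci-sv01 -y
```

Like the sleep, this only works around the race on the client side. It also retries when a model with that name genuinely exists, so the attempt count should stay small.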

description: updated
summary: - ERROR model <uuid> has been removed
+ ERROR model <uuid> has been removed (cannot add-model immediately after
+ delete-model)
Felipe Reyes (freyes)
tags: added: sts
John A Meinel (jameinel) wrote :

I see the "Unable to get the model status" line, but I checked and $? was still 0 for destroy-model. Are you sure this is the cause of nonzero exit?

Dmitrii Shcherbakov (dmitriis) wrote :

John, the exit code is 0 for me on a localhost controller:

juju add-model default && juju destroy-model default && echo $?
Added 'default' model on localhost/localhost with credential 'localhost' for user 'admin'
WARNING! This command will destroy the "default" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
Destroying model
Waiting on model to be removed...
Waiting on model to be removed...
Unable to get the model status from the API: model 8f623638-8d68-4626-8610-f5fd8b09d313 has been removed.
0

Good point.

But:

(openstack-client) ubuntu@maas:~/bundles⟫ juju add-model test
Uploading credential 'samaas/admin/samaas-admin' to controller
ERROR failed to create new model: model "test" for admin already exists (already exists)
(openstack-client) 1 ubuntu@maas:~/bundles⟫ juju destroy-model test
ERROR cannot read model info: model samaas:admin/test not found
(openstack-client) 1 ubuntu@maas:~/bundles⟫ juju add-model test
Uploading credential 'samaas/admin/samaas-admin' to controller
Added 'test' model with credential 'samaas-admin' for user 'admin'
(openstack-client) ubuntu@maas:~/bundles⟫ juju destroy-model test
WARNING! This command will destroy the "test" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
Destroying model
Waiting on model to be removed...
Waiting on model to be removed...
Waiting on model to be removed...
Unable to get the model status from the API: model 62ae489d-1e23-4533-8fa8-b7f9fc7f26d6 has been removed.
(openstack-client) ubuntu@maas:~/bundles⟫ juju add-model test
Uploading credential 'samaas/admin/samaas-admin' to controller
ERROR failed to create new model: model "test" for admin already exists (already exists)
(openstack-client) 1 ubuntu@maas:~/bundles⟫ juju destroy-model test
WARNING! This command will destroy the "test" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
WARNING cannot read current model: current model for controller samaas not found
ERROR cannot connect to API: model "test" has been removed from the controller, run 'juju models' and switch to one of them.
There are 2 accessible models on controller "samaas".
(openstack-client) 1 ubuntu@maas:~/bundles⟫ juju add-model test
Uploading credential 'samaas/admin/samaas-admin' to controller
Added 'test' model with credential 'samaas-admin' for user 'admin'
(openstack-client) ubuntu@maas:~/bundles⟫ juju destroy-model test
WARNING! This command will destroy the "test" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
Destroying model
Waiting on model to be removed...
Waiting on model to be removed...
Unable to get the model status from the API: model eebb491f-e394-4741-80de-4ac3ac14ee91 has been removed.
(openstack-client) ubuntu@maas:~/bundles⟫ juju add-model test
Uploading credential 'samaas/admin/samaas-admin' to controller
ERROR failed to create new model: model "test" for admin already exists (already exists)

I was not able to get exit code 1 on a localhost controller. On a MAAS (non-ha) controller the issue is intermittent.

It looks like a race condition.

Dmitrii Shcherbakov (dmitriis) wrote :

In other words: we get the message consistently but the exit code is sometimes 0 and sometimes 1.

This explains why not every CI run is a failure.
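
To tell the message apart from the exit status in CI logs, each juju call can be wrapped so that its status is printed whether or not it succeeds. This is a minimal sketch; `run_logged` is a hypothetical helper name, not part of juju. Note that a chain of the form `cmd && echo $?` can only ever print 0, because `echo` runs only when `cmd` has already succeeded.

```shell
#!/bin/sh
# run_logged COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, prints its exit status to stderr,
# and returns that same status to the caller.
run_logged() {
    "$@"
    status=$?
    echo "exit=$status cmd=$*" >&2
    return "$status"
}

# Usage (commands taken from the thread):
#   run_logged juju add-model test
#   run_logged juju destroy-model test -y
```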

Dmitrii Shcherbakov (dmitriis) wrote :

With Ryan's reproducer:

http://paste.ubuntu.com/25327081/

Dmitrii Shcherbakov (dmitriis) wrote :

If you do it on a localhost provider with Ryan's reproducer it is fast enough to not even return "Unable to get the model status from the API".

http://paste.ubuntu.com/25327122/

If you do a removal by hand afterwards, you get the message but not the non-zero exit code.

Tim Penhey (thumper) wrote :

I am not able to see where the non-zero exit code comes from.

The error being shown is not overly helpful, since the model having been removed is exactly what we are after. However, the last iteration through this code threw away some error information, so we are unable to determine the error type on the client at this stage.

Yes, it is a race condition: the client waits for the model to be dead, but this creates a race between the code removing all details of the model from the DB and the request to create a new model. I'll look at fixing the problem creating a new model, but not the error at this stage.
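
The race described above can be pictured with a toy state machine. This is a sketch under stated assumptions only: it does not reflect Juju's real implementation, and the function names and the three states are hypothetical, chosen to mirror the description (the client stops waiting once the model is dead, while the record is removed later).

```shell
#!/bin/sh
# Toy model of the race. NOT Juju's actual code; all names are hypothetical.
# state is "" (no record), "alive", or "dead" (destroyed, record not yet removed).
state=""

add_model() {
    # Creation fails while any record for the name still exists.
    if [ -n "$state" ]; then
        echo "ERROR model already exists" >&2
        return 1
    fi
    state="alive"
}

destroy_model() {
    # The client only waits until the model is dead, not until its record is gone.
    state="dead"
}

cleanup_records() {
    # Runs later on the server side and removes the remaining record.
    state=""
}
```

In this toy model, calling `add_model`, `destroy_model` and then `add_model` again fails with "already exists", while the same sequence with `cleanup_records` run before the second `add_model` succeeds. That matches the observation that a delay after destroy-model hides the problem.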

Tim Penhey (thumper) wrote :
Changed in juju:
assignee: Anastasia (anastasia-macmood) → Tim Penhey (thumper)
milestone: none → 2.2.3
status: Triaged → In Progress
Chris Gregan (cgregan)
tags: added: cdo-qa-blocker
Tim Penhey (thumper)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Ante Karamatić (ivoks)
tags: added: cpe-onsite
removed: cpec
James Page (james-page) wrote :

UOSCI is now on 2.2.4 - marking charm-test-infra bug as released.

Changed in charm-test-infra:
status: Confirmed → Fix Released