[azure] Model deletion fails with timeout

Bug #1977858 reported by Vladimir Grevtsev
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth

Bug Description

juju 2.9.31

expected result: the operator can remove the model without using the --force flag and without hitting a timeout
actual result: the Azure model cannot be deleted without the --force flag; a plain destroy-model times out after 30 minutes

=== steps to reproduce
juju bootstrap azure/westeurope --config resource-group-name=JujuController --config=logging-config="<root>=DEBUG" --no-default-model azure-controller
juju add-model workload --config="resource-group-name=JujuWorkload"

$ juju status
Model Controller Cloud/Region Version SLA Timestamp
workload azure-controller azure/westeurope 2.9.31 unsupported 12:08:48Z

App Version Status Scale Charm Channel Rev Exposed Message
hello-juju active 1 hello-juju stable 8 yes
postgresql 12.11 active 1 postgresql stable 239 no Live master (12.11)

Unit Workload Agent Machine Public address Ports Message
hello-juju/0* active idle 0 20.86.102.41 80/tcp
postgresql/0* active idle 1 20.229.87.153 5432/tcp Live master (12.11)

Machine State DNS Inst id Series AZ Message
0 started 20.86.102.41 machine-0 focal
1 started 20.229.87.153 machine-1 focal

$ juju destroy-model workload
WARNING! This command will destroy the "workload" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
Destroying model
Waiting for model to be removed, 2 machine(s), 2 application(s).................
................................
Waiting for model to be removed, 2 machine(s), 1 application(s)................
Waiting for model to be removed, 1 machine(s), 1 application(s)...
Waiting for model to be removed, 1 machine(s)...................................
........................
Waiting for model to be removed.................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
........
Because the destroy model operation did not finish, there may be cloud resources left behind.
Run 'destroy-model <model-name> --timeout=0 --force' to clean up the Juju model database records
even with potentially orphaned cloud resources.
ERROR timeout after 30m0s

$ juju destroy-model workload --debug
12:39:23 INFO juju.cmd supercommand.go:56 running juju [2.9.31 0f2ce8e528a67fa3f735dff39a1a68c44540bb97 gc go1.18.2]
12:39:23 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/19414/bin/juju", "destroy-model", "workload", "--debug"}
WARNING! This command will destroy the "workload" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
12:39:25 INFO juju.juju api.go:78 connecting to API addresses: [20.234.186.121:17070 192.168.16.4:17070]
12:39:25 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://20.234.186.121:17070/api"
12:39:25 INFO juju.api apiclient.go:688 connection established to "wss://20.234.186.121:17070/api"
12:39:25 INFO juju.juju api.go:330 API endpoints changed from [192.168.16.4:17070 20.234.186.121:17070] to [20.234.186.121:17070 192.168.16.4:17070]
12:39:25 INFO juju.juju api.go:78 connecting to API addresses: [20.234.186.121:17070 192.168.16.4:17070]
12:39:25 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://20.234.186.121:17070/model/a85d016f-951e-413c-8260-483d7ad62dd1/api"
12:39:25 INFO juju.api apiclient.go:688 connection established to "wss://20.234.186.121:17070/model/a85d016f-951e-413c-8260-483d7ad62dd1/api"
Destroying model
Waiting for model to be removed.................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
................................................................................
...............................................
Because the destroy model operation did not finish, there may be cloud resources left behind.
Run 'destroy-model <model-name> --timeout=0 --force' to clean up the Juju model database records
even with potentially orphaned cloud resources.
13:09:25 DEBUG juju.api monitor.go:35 RPC connection died
13:09:25 DEBUG juju.api monitor.go:35 RPC connection died
ERROR timeout after 30m0s
13:09:25 DEBUG cmd supercommand.go:537 error stack:
/build/snapcraft-juju-25888574271dd1b08771e6ebeeab8ad6/parts/juju/src/cmd/juju/model/destroy.go:467: timeout after 30m0s

$ juju status
Model Controller Cloud/Region Version SLA Timestamp Notes
workload azure-controller azure/westeurope 2.9.31 unsupported 13:33:17Z tearing down cloud environment

Model "admin/workload" is empty.

$ az resource list --resource-group JujuWorkload -o table
Name ResourceGroup Location Type Status
--------------------- --------------- ---------- --------------------------------------- --------
machine-1 JUJUWORKLOAD westeurope Microsoft.Compute/disks
juju-internal-nsg JujuWorkload westeurope Microsoft.Network/networkSecurityGroups
juju-internal-network JujuWorkload westeurope Microsoft.Network/virtualNetworks

# removing the model with --force does the trick, but the same resources are left behind in the resource group:

$ juju destroy-model workload --force
WARNING! This command will destroy the "workload" model.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
Destroying model
Waiting for model to be removed.....................................
Model destroyed.

$ az resource list --resource-group JujuWorkload -o table
Name ResourceGroup Location Type Status
--------------------- --------------- ---------- --------------------------------------- --------
machine-1 JUJUWORKLOAD westeurope Microsoft.Compute/disks
juju-internal-nsg JujuWorkload westeurope Microsoft.Network/networkSecurityGroups
juju-internal-network JujuWorkload westeurope Microsoft.Network/virtualNetworks
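
The leftover check above can also be scripted. The following is a minimal sketch, assuming the listing is captured with `-o json` instead of `-o table` and that each entry carries `name` and `type` fields; the helper name `leftover_resources` is hypothetical and the sample data simply mirrors the three rows shown above.

```python
import json


def leftover_resources(az_json: str) -> list[tuple[str, str]]:
    """Return sorted (name, type) pairs from `az resource list ... -o json` output."""
    resources = json.loads(az_json)
    return sorted((r["name"], r["type"]) for r in resources)


# Sample input mirroring the table above, trimmed to the two fields used here.
sample = json.dumps([
    {"name": "machine-1", "type": "Microsoft.Compute/disks"},
    {"name": "juju-internal-nsg", "type": "Microsoft.Network/networkSecurityGroups"},
    {"name": "juju-internal-network", "type": "Microsoft.Network/virtualNetworks"},
])

for name, resource_type in leftover_resources(sample):
    print(f"{resource_type}: {name}")
```

An empty list after destroy-model would indicate that nothing was orphaned in the resource group.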

description: updated
Ian Booth (wallyworld)
tags: added: teardown
Ian Booth (wallyworld) wrote :

It looks like a custom resource group is being used here rather than letting juju create the group.
For juju-created resource groups, we simply delete the group and everything in it goes away. For custom resource groups, we need to delete the model's resources one by one. So it seems there's an issue in that code path.
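
The difference between the two teardown paths described in this comment can be illustrated with a small in-memory sketch. This is not Juju's provider code: the `Resource` and `ResourceGroup` classes, the `destroy_model_resources` function, and the `MODEL_TAG` key are all hypothetical names chosen for illustration, and how Juju actually identifies a model's resources is an assumption here.

```python
from dataclasses import dataclass, field

MODEL_TAG = "model-uuid"  # hypothetical tag key, for illustration only


@dataclass
class Resource:
    name: str
    tags: dict = field(default_factory=dict)


@dataclass
class ResourceGroup:
    name: str
    created_by_juju: bool
    resources: list = field(default_factory=list)


def destroy_model_resources(groups: dict, group_name: str, model_uuid: str) -> list:
    """Remove a model's resources and return the names of what was deleted."""
    group = groups[group_name]
    if group.created_by_juju:
        # Managed group: dropping the group takes everything in it along.
        deleted = [r.name for r in group.resources]
        del groups[group_name]
        return deleted
    # Custom group: delete only what belongs to this model, and leave the
    # group itself and any unrelated resources in place.
    mine = [r for r in group.resources if r.tags.get(MODEL_TAG) == model_uuid]
    group.resources = [
        r for r in group.resources if r.tags.get(MODEL_TAG) != model_uuid
    ]
    return [r.name for r in mine]
```

In the custom-group path every individual delete has to succeed, which is why a single rejected request can leave the model stuck in teardown.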

Changed in juju:
milestone: none → 2.9.33
status: New → Triaged
importance: Undecided → High
tags: added: azure-provider
Ian Booth (wallyworld) wrote :

When juju iterates over the custom resource group to manually delete resources in that group which belong to the model being deleted, it will log as a warning any resources that were not removed. Can you look at the controller logs to see if there's a line like:

WARNING ... could not delete all Azure resources, remaining ...
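
A minimal sketch of how such a line could be searched for, assuming the controller log has been saved to a text file or captured as a list of lines. Only the fixed part of the message quoted above is matched; the elided parts are left alone, and the function name `find_teardown_warnings` is hypothetical.

```python
WARNING_TEXT = "could not delete all Azure resources"


def find_teardown_warnings(log_lines):
    """Yield (line_number, line) for each log line carrying the warning text."""
    for number, line in enumerate(log_lines, start=1):
        if "WARNING" in line and WARNING_TEXT in line:
            yield number, line.rstrip("\n")
```

Typical use would be `list(find_teardown_warnings(open(path)))`, where `path` points at wherever the controller log output was saved.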

Ian Booth (wallyworld) wrote :

I ran a quick test. It looks like there's an Azure API versioning issue: Azure is not accepting the delete requests.

ERROR juju.worker.dependency "undertaker" manifold worker returned unexpected error: cannot destroy cloud resources: deleting resources: error deleting resource "/subscriptions/2eebf55a-1e02-45d8-a299-02aed8aea00b/resourceGroups/Test/providers/Microsoft.Network/networkSecurityGroups/juju-internal-nsg": &errors.unformatter{message:"deleting resource \"/subscriptions/2eebf55a-1e02-45d8-a299-02aed8aea00b/resourceGroups/Test/providers/Microsoft.Network/networkSecurityGroups/juju-internal-nsg\": resources.Client#DeleteByID: Failure sending request: StatusCode=400 -- Original Error: Code=\"NoRegisteredProviderFound\" Message=\"No registered resource provider found for location 'australiasoutheast' and API version '2021-11-01' for type 'networkSecurityGroups'. The supported api-versions are '2014-12-01-preview, 2015-05-01-preview, 2015-06-15, 2016-03-30, 2016-06-01, 2016-07-01, 2016-08-01, 2016-09-01, 2016-10-01, 2016-11-01, 2016-12-01, 2017-03-01, 2017-04-01, 2017-06-01, 2017-08-01, 2017-09-01, 2017-10-01, 2017-11-01, 2018-01-01, 2018-02-01, 2018-03-01, 2018-04-01, 2018-05-01, 2018-06-01, 2018-07-01, 2018-08-01, 2018-10-01, 2018-11-01, 2018-12-01, 2019-02-01, 2019-04-01, 2019-06-01, 2019-07-01, 2019-08-01, 2019-09-01, 2019-11-01, 2019-12-01, 2020-01-01, 2020-03-01, 2020-04-01, 2020-05-01, 2020-06-01, 2020-07-01, 2020-08-01, 2020-11-01, 2021-01-01, 2021-02-01, 2021-03-01, 2021-04-01, 2021-05-01, 2021-06-01, 2021-08-01, 2021-12-01, 2022-01-01'. 
The supported locations are 'westus, eastus, northeurope, westeurope, eastasia, southeastasia, northcentralus, southcentralus, centralus, eastus2, japaneast, japanwest, brazilsouth, australiaeast, australiasoutheast, centralindia, southindia, westindia, canadacentral, canadaeast, westcentralus, westus2, ukwest, uksouth, koreacentral, koreasouth, francecentral, australiacentral, southafricanorth, uaenorth, switzerlandnorth, germanywestcentral, norwayeast, westus3, jioindiawest, swedencentral'.\"", cause:autorest.DetailedError{Original:(*azure.ServiceError)(0xc003749e00), PackageType:"resources.Client", Method:"DeleteByID", StatusCode:400, Message:"Failure sending request", ServiceError:[]uint8(nil), Response:(*http.Response)(0xc0037586c0)}, previous:autorest.DetailedError{Original:(*azure.ServiceError)(0xc003749e00), PackageType:"resources.Client", Method:"DeleteByID", StatusCode:400, Message:"Failure sending request", ServiceError:[]uint8(nil), Response:(*http.Response)(0xc0037586c0)}, file:"/home/ian/juju/go/src/juju/juju/provider/azure/environ.go", line:2204}
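
The error above is easier to read once the version list is pulled out of it. The sketch below, with hypothetical helper names, extracts the supported api-versions from a NoRegisteredProviderFound message and finds the newest one that is not later than the requested version. Whether falling back to an older version is an appropriate fix is an assumption; the thread does not say how the eventual change selects a version. The sample message is a shortened copy of the one in the log.

```python
import re


def supported_api_versions(message: str) -> list[str]:
    """Extract the api-versions list from a NoRegisteredProviderFound message."""
    match = re.search(r"supported api-versions are '([^']*)'", message)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",")]


def newest_not_after(versions, requested):
    """Return the newest supported version not later than requested, or None."""
    # Versions look like YYYY-MM-DD with an optional "-preview" suffix, so
    # comparing the ten-character date prefix as a string orders them by date.
    candidates = [v for v in versions if v[:10] <= requested[:10]]
    return max(candidates, key=lambda v: v[:10], default=None)


# Shortened copy of the message from the log above.
sample_message = (
    "No registered resource provider found for location 'australiasoutheast' "
    "and API version '2021-11-01' for type 'networkSecurityGroups'. "
    "The supported api-versions are '2014-12-01-preview, 2021-06-01, "
    "2021-08-01, 2021-12-01, 2022-01-01'."
)

versions = supported_api_versions(sample_message)
print("2021-11-01" in versions)                # False: the requested version is not offered
print(newest_not_after(versions, "2021-11-01"))  # 2021-08-01
```

Run against the full message, the same check shows that '2021-11-01' is absent from the list for networkSecurityGroups even though neighbouring versions such as '2021-08-01' and '2021-12-01' are present.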

Ian Booth (wallyworld) wrote :

And there are API incompatibilities. After updating the SDK to v65 and using compute API version "2021-12-01", VM creation fails. But getting to the newer APIs requires migrating to a brand new, slightly incompatible SDK.

Ian Booth (wallyworld) wrote :

I think this PR might fix the issue. I added a model using an existing resource group and was able to destroy the model successfully.

https://github.com/juju/juju/pull/14170

Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released