juju fails to destroy model

Bug #1876345 reported by Nicolas Bock
This bug affects 1 person
Affects: juju
Importance: High
Assigned to: Ian Booth

Bug Description

Running

juju destroy-model --force --destroy-storage ovn

eventually times out and fails to remove the model from `juju models`.

Nicolas Bock (nicolasbock) wrote :

$ juju show-model ovn
ovn:
  name: admin/ovn
  short-name: ovn
  model-uuid: 033ad9d9-06b2-48c6-8e9d-c97738584cdd
  model-type: iaas
  controller-uuid: de32fc72-f62c-4a22-8621-30a87fe21d77
  controller-name: nicolasbock
  is-controller: false
  owner: admin
  cloud: stsstack
  region: stsstack
  type: openstack
  life: dying
  status:
    current: destroying
    message: 'attempt 1 to destroy model failed (will retry): model not empty, found
      1 machine, 1 volume (model not empty)'
    since: 3 hours ago
  users:
    admin:
      display-name: admin
      access: admin
      last-connection: 3 hours ago
  sla: unsupported
  agent-version: 2.7.4
  credential:
    name: nicolasbock
    owner: admin
    cloud: stsstack
    validity-check: valid

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.8-rc1
status: New → Triaged
importance: Undecided → High
Ian Booth (wallyworld) wrote :

I had a look at the model dump and there's only one model in there, and it's not the ovn model with uuid 033ad9d9-06b2-48c6-8e9d-c97738584cdd.

There's a model with uuid b88a1f35-f041-4e1e-8e82-50e05bab02b6 which I'm guessing is the controller model.

Can you take another look at dump-db to ensure it's come off the right controller?
Maybe a dump-model on the ovn model could be useful as well / instead.

Nicolas Bock (nicolasbock) wrote :

I switched to the `ovn` model and ran `dump-db` and `dump-model`.

Tim Penhey (thumper)
Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Ian Booth (wallyworld) wrote :

The data seems to show that:

- juju machine 6 is the remaining machine in the model
- it is associated with nova compute instance ed21b162-7245-41fa-a0cf-bcc9644e68fe
- that instance is not found in openstack, and its last known status was BUILD, so it never started running

- juju machine 6 has a volume, volume attachment, and volume attachment plan
- the volume and attachment plans are dying, in the process of being removed, but this step is failing because listing attachments for the volume returns a 404, since the instance is not found

Ian Booth (wallyworld) wrote :

Can you also attach log files, from the controller and ovn models?

Nicolas Bock (nicolasbock) wrote :

You mean `juju debug-log --replay --model {ovn,controller}`?

Ian Booth (wallyworld) wrote :

Yes please
Or you can grab the log files directly from /var/log/juju

Nicolas Bock (nicolasbock) wrote :

Done

Ian Booth (wallyworld) wrote :

Thank you. I think we have enough now to put together a fix so that --force can handle these sorts of storage-related cleanup issues.

Nicolas Bock (nicolasbock) wrote :

Thanks!

Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Nicolas Bock (nicolasbock) wrote :

Hi Ian,

I finally figured out that I could install the juju snap from the candidate channel to test whether your fix works. Sorry this took so long.

I ran

juju destroy-model --destroy-storage --force ovn

with juju-2.8-rc2-bionic-amd64 and it still doesn't seem to work. Juju keeps printing '.' characters with no apparent progress, judging from `juju status --model ovn`.

Please let me know if there is any other information I can provide that might help here. I can also give you access to my bastion on STSStack so you can poke around yourself if that helps.

Thanks

Ian Booth (wallyworld) wrote :

Just to clarify, did you install the snap and bootstrap a brand new rc2 controller, or upgrade existing controllers and all models via upgrade-controller and upgrade-model?

If you're still seeing issues with controllers and models all running rc2, providing access would help a lot, as there may be another corner case that needs to be addressed. There are a lot of moving parts in cleanly destroying a model, and if one bit fails, it needs bespoke handling to unstick the process. Feel free to send me the details.

Nicolas Bock (nicolasbock) wrote :

Hi Ian,

I have run

juju destroy-model --force --destroy-storage ovn

again and pulled logs and db from the controller and ovn models.

Ian Booth (wallyworld) wrote :

The ovn model dump shows that for machine 6, there's:
- a dying volume
- a dying volume plan
- an alive volume attachment

Because the cloud instance for machine 6 is gone, the volume attachment cannot be cleanly removed to unblock the removal of the associated volume and plan.

The --force param is meant to deal with this by scheduling a job to forcibly remove these items from the juju model after 2 minutes of trying to do it nicely. However, there's an attribute on the machine called "force-destroyed" which is set to true the first time, to avoid scheduling the job more than once.

force-destroyed is currently true, which may be left over from previous attempts.

Can we try this:

Set force-destroyed back to false by logging into the model db with the mongo client and running

db.machines.update({"_id" : "033ad9d9-06b2-48c6-8e9d-c97738584cdd:6"}, {$set: {"force-destroyed": false}})

Confirm by looking at the result

db.machines.find({"_id" : "033ad9d9-06b2-48c6-8e9d-c97738584cdd:6"}).pretty()

Turn on extra logging

juju model-config -m controller logging-config="<root>=INFO;juju.state=DEBUG"

Try destroy-model --force again

If that doesn't work, reset the force-destroyed flag back to false and try destroy-machine 6 --force and see if that gets rid of the machine.

Once we see how the above works out, we can figure out if there's anything to fix.
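
To illustrate why a stale force-destroyed value matters, here is a toy Python model of a once-only guard flag. It is only a sketch of the behaviour described above, not juju's actual code: the `Machine` and `CleanupQueue` classes, the `request_force_destroy` function, and the job name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Toy stand-in for a machine document (not juju's real schema)."""
    machine_id: str
    force_destroyed: bool = False


@dataclass
class CleanupQueue:
    """Toy stand-in for the queue of pending cleanup jobs."""
    jobs: list = field(default_factory=list)


def request_force_destroy(machine, queue):
    """Queue a forced storage cleanup at most once per machine.

    Returns True if a job was queued, False if the guard flag was
    already set and the request was therefore skipped.
    """
    if machine.force_destroyed:
        # The flag says a job was already scheduled, so do nothing,
        # even if that earlier job never completed.
        return False
    machine.force_destroyed = True
    queue.jobs.append(("force-remove-storage", machine.machine_id))
    return True


# A flag left at true by an earlier attempt blocks rescheduling.
machine = Machine("6", force_destroyed=True)
queue = CleanupQueue()
first_try = request_force_destroy(machine, queue)

# Resetting the flag by hand lets the next request queue the job.
machine.force_destroyed = False
second_try = request_force_destroy(machine, queue)
```

In this toy model the first request is skipped and the second one queues the job, which is the effect the manual reset of force-destroyed is hoping for.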

Nicolas Bock (nicolasbock) wrote :

Hi Ian,

I am running `db.machines.update({"_id" : "033ad9d9-06b2-48c6-8e9d-c97738584cdd:6"}, {$set: {"force-destroyed": false}})` in the controller's DB, but `db.machines.find({"_id" : "033ad9d9-06b2-48c6-8e9d-c97738584cdd:6"}).pretty()` keeps showing `true` for that field. When I do the same for a fictitious field, e.g. `force-destroyed-x`, I can flip the value between `true` and `false`, which leads me to believe that I am using the command correctly.

But, I am probably missing something...

Ian Booth (wallyworld) wrote :

Turn on the extra logging first.

It may be that cleanup jobs are running and setting it back to true. You can check with

db.cleanups.find().pretty()

(The logging should also indicate whether any get run. We specifically want to see whether the forced storage cleanup gets run, so either find() or the logging should tell us.)

It would be interesting to see what cleanup jobs there are queued, so grab that before trying the next step.

Try deleting all the cleanup jobs

db.cleanups.deleteMany({})

Might need to check that none get queued again and run delete again.

Then try destroy-model --force
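
The "delete, check that none get queued again, delete again" loop could be scripted roughly as below. This is a sketch only: it assumes a pymongo-style collection object (with `delete_many` and `count_documents`) for the cleanups collection, and the `drain_cleanups` helper, its parameters, and the retry limit are hypothetical. Connecting and authenticating to the controller's mongo is left out.

```python
import time


def drain_cleanups(cleanups, max_rounds=5, settle_seconds=1.0):
    """Delete all cleanup docs until the collection stays empty.

    `cleanups` is assumed to be a pymongo-style collection.
    Returns the total number of documents deleted, or raises
    RuntimeError if jobs keep being re-queued after max_rounds.
    """
    total_deleted = 0
    for _ in range(max_rounds):
        result = cleanups.delete_many({})
        total_deleted += result.deleted_count
        # Give anything that re-queues jobs a moment to do so,
        # then check whether the collection is still empty.
        time.sleep(settle_seconds)
        if cleanups.count_documents({}) == 0:
            return total_deleted
    raise RuntimeError("cleanup jobs keep being re-queued")
```

Grab the output of db.cleanups.find().pretty() before running anything like this, since the deleted jobs cannot be inspected afterwards.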

Harry Pidcock (hpidcock)
Changed in juju:
status: Fix Committed → Fix Released
Ian Booth (wallyworld)
Changed in juju:
status: Fix Released → Triaged
milestone: 2.8-rc1 → 2.8.1
Tim Penhey (thumper)
Changed in juju:
status: Triaged → Incomplete
Changed in juju:
milestone: 2.8.1 → 2.8.2
Changed in juju:
milestone: 2.8.2 → 2.8.3
Changed in juju:
milestone: 2.8.4 → 2.8.5
Changed in juju:
milestone: 2.8.5 → 2.8.6
John A Meinel (jameinel)
Changed in juju:
milestone: 2.8.6 → 2.8-next
Changed in juju:
milestone: 2.8-next → none