Juju failing to remove unit due to attached storage stuck dying
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical Juju | Fix Released | High | Ian Booth | |
Bug Description
Hi,
Noticed a lot of noise about Juju failing to remove machines, with log messages such as this:
| 2021-11-15 00:22:15 WARNING juju.state cleanup.go:213 cleanup failed in model 5e38a904-
It looks like a volume was attached at some point, but it's now stuck being cleaned up:
| volumes:
|   "3":
|     provider-id: dd052012-
|     attachments:
|       machines:
|         "19":
|           device: vdb
|           read-only: false
|           life: alive
|     pool: cinder
|     size: 51200
|     persistent: true
|     life: dying
|     status:
|       current: attaching
|       message: |-
|         failed to list volume attachments
|         caused by: Resource at http://
|         caused by: request (http://
|       since: 03 Mar 2020 06:19:06Z
See https:/
(sorry, company private)
Not having a name associated with this storage means we can't try force-removing it with 'juju remove-storage'.
This is on a recently upgraded 2.9.18 controller (was 2.8.7).
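For reference, when storage does have a name, a forced removal would normally look something like the sketch below (the storage ID 'pgdata/0' is hypothetical; 'juju storage' lists the real ones):

```shell
# List storage to find the storage ID, if one exists.
juju storage --format yaml

# Force removal of a named storage instance (hypothetical ID).
juju remove-storage pgdata/0 --force
```

The problem reported here is that the stuck volume never shows up with such an ID, so this path isn't available.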
description: updated

Changed in juju:
- milestone: none → 2.9.20
- importance: Undecided → High
- status: New → Triaged

Changed in juju:
- status: In Progress → Fix Committed

Changed in juju:
- milestone: 2.9.20 → 2.9.21

Changed in juju:
- status: Fix Committed → Fix Released
What happens if you:
- set debug logging on the storage provisioner worker, i.e. add to logging-config:
  juju.worker.storageprovisioner=DEBUG
- juju remove-machine 19 --force
(it might take a few minutes to time out and run the force)
Can you do that and attach pastebins of juju dump-db:
- the cleanups collection
- machine 19 from the machines collection
Also the last 5 or 10 minutes of logs, i.e. starting from before the remove --force was run.
Then turn off the extra debugging.
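The steps above could be run as something like the following sketch (the "<root>=WARNING" base logging level and the output file name are assumptions; this needs a live controller, so treat it as illustrative rather than tested):

```shell
# Sketch only: assumes a bootstrapped controller and the machine from this report.
# 1. Enable debug logging for the storage provisioner worker.
juju model-config logging-config="<root>=WARNING;juju.worker.storageprovisioner=DEBUG"

# 2. Force-remove the stuck machine; the forced cleanup runs after the
#    normal removal attempt times out.
juju remove-machine 19 --force

# 3. Capture database state for the bug report (attach as pastebins,
#    specifically the cleanups collection and machine 19 from machines).
juju dump-db > dump-db.txt

# 4. Restore the default logging configuration afterwards.
juju model-config logging-config="<root>=WARNING"
```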
Thanks