Juju failing to remove unit due to attached storage stuck dying

Bug #1950928 reported by Haw Loeung
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth
Milestone: 2.9.21

Bug Description

Hi,

Noticed a lot of noise from juju failing to remove machines, with logging such as this:

| 2021-11-15 00:22:15 WARNING juju.state cleanup.go:213 cleanup failed in model 5e38a904-8ee0-48db-8ff1-7e2feee0835a for machine("19"): machine 19 has attachments [volume-3]

It looks like at some point there was a volume attached, but it's now stuck being cleaned up:

| volumes:
|   "3":
|     provider-id: dd052012-96c3-4df3-b0e7-87e5b0507788
|     attachments:
|       machines:
|         "19":
|           device: vdb
|           read-only: false
|           life: alive
|     pool: cinder
|     size: 51200
|     persistent: true
|     life: dying
|     status:
|       current: attaching
|       message: |-
|         failed to list volume attachments
|         caused by: Resource at http://...:8774/v2/.../servers/.../os-volume_attachments not found
|         caused by: request (http://...:8774/v2/.../servers/.../os-volume_attachments) returned unexpected status: 404; error info: {"itemNotFound": {"message": "The resource could not be found.", "code": 404}}
|       since: 03 Mar 2020 06:19:06Z

See https://pastebin.canonical.com/p/fNVxjn7SDm/ and https://pastebin.canonical.com/p/nXVKHSJcgc/
(sorry, company private)

Not having a name associated with this storage means we can't try force removing it with 'juju remove-storage'.
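For reference, if this volume did have a named storage instance, the usual escape hatch would be something like the below (the storage name here is hypothetical, and I'm assuming remove-storage's --force flag applies; a sketch only):

juju storage --format yaml
juju remove-storage --force pgdata/3

But with no storage name to pass in, that's a dead end.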

This is on a recently upgraded 2.9.18 controller (was 2.8.7).

Haw Loeung (hloeung)
description: updated
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9.20
importance: Undecided → High
status: New → Triaged
Revision history for this message
Ian Booth (wallyworld) wrote (last edit):

What happens if you:

set debug logging on the storage provisioner worker, i.e. add to logging-config:

juju.worker.storageprovisioner=DEBUG
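On 2.9 that can be set with model-config, e.g. something like this (keeping <root> at WARNING is an assumption; use whatever root level you already have):

juju model-config logging-config="<root>=WARNING;juju.worker.storageprovisioner=DEBUG"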

juju remove-machine 19 --force

(it might take a few minutes to time out and run the force)

Can you do that and attach pastebins of juju dump-db output for:
- the cleanups collection
- machine 19 from the machines collection
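
If dump-db is awkward, the same data can be read straight from mongo on the controller; a sketch, assuming the usual model-uuid-prefixed _id scheme:

| juju:PRIMARY> db.cleanups.find().pretty()
| juju:PRIMARY> db.machines.find({_id: "5e38a904-8ee0-48db-8ff1-7e2feee0835a:19"}).pretty()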

Also attach the last 5 or 10 minutes of logs, i.e. starting from before the remove --force was run.

Then turn off the extra debugging.

Thanks

Revision history for this message
Ian Booth (wallyworld) wrote :

From the logs

2021-11-18 01:38:49 WARNING juju.state cleanup.go:213 cleanup failed in model 5e38a904-8ee0-48db-8ff1-7e2feee0835a for forceRemoveMachine("19"): removing attachment plan of volume 3 from machine 19: state changing too quickly; try again soon

--force should have worked, but there's an issue with the logic and it's failing to handle the bad/incomplete data it is finding.
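
The document the cleanup is tripping over can also be inspected directly in mongo to confirm the stale life value; a sketch, using the _id format shown in the next comment:

| juju:PRIMARY> db.volumeattachments.find({_id: "5e38a904-8ee0-48db-8ff1-7e2feee0835a:19:3"}).pretty()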

Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Revision history for this message
Haw Loeung (hloeung) wrote (last edit):

Per Ian, this DB update helped unstick things:

| juju:PRIMARY> db.volumeattachments.update(
| ... {_id: "5e38a904-8ee0-48db-8ff1-7e2feee0835a:19:3"},
| ... { $set:{life: 1}}
| ... )

where the _id is model-uuid:machine-num:volume-num, and life: 1 marks the attachment as dying (Juju stores life as 0=alive, 1=dying, 2=dead). All of that is provided by the controller log:

| 2021-11-15 00:22:15 WARNING juju.state cleanup.go:213 cleanup failed in model 5e38a904-8ee0-48db-8ff1-7e2feee0835a for machine("19"): machine 19 has attachments [volume-3]
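
To double-check before retrying, the same document can be queried back and should now show life: 1 (a sketch, same _id as above):

| juju:PRIMARY> db.volumeattachments.find({_id: "5e38a904-8ee0-48db-8ff1-7e2feee0835a:19:3"}, {life: 1})

After that, the remove-machine --force cleanup was able to proceed.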

Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9.20 → 2.9.21
Changed in juju:
status: Fix Committed → Fix Released