Juju using the OpenStack provider does not remove security groups on remove-machine after failed provisioning

Bug #1940637 reported by Gareth Woolridge
This bug affects 1 person
Affects          Status         Importance   Assigned to        Milestone
Canonical Juju   Fix Committed  High         Simon Richardson
  2.9            Fix Released   Undecided    Simon Richardson
  3.0            Fix Released   Undecided    Simon Richardson

Bug Description

When using the OpenStack provider with a Juju 2.8.9 controller and model, we hit an OpenStack quota issue that caused add-unit to fail, leaving instances stuck in an error state.

We removed the machines (e.g. 'juju remove-machine --force') but observed that the OpenStack security groups belonging to those instances were not removed. As a result, after raising the instance quota and retrying, we hit the same issue again because we were now at the security group quota limit.

We had to remove the security groups manually with 'openstack security group delete ...'.

Expected outcome: Juju should clean up after itself with respect to security groups.

John A Meinel (jameinel) wrote:

Does this also happen if you run plain 'juju remove-machine' without '--force'?
What is the underlying error state? Did the machine never come up, or did it come up and then end up hanging with an error after a while?
We certainly shouldn't be leaking security groups in general, and we should try to clean them up even with '--force'. However, if that cleanup fails, '--force' will likely still proceed with removing the rest of the machine information, because that is what you asked us to do.

That said, if any 'juju remove-machine' leaks a security group, that is definitely something we should fix. I also wonder whether '--force' makes us proceed faster, so that we hit something like "cannot remove security group because it is still in use".

Changed in juju:
importance: Undecided → High
status: New → Incomplete
Launchpad Janitor (janitor) wrote:

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Haw Loeung (hloeung) wrote:

We're still seeing this, even with 2.9.21.

We're using '--force' in the environment I'm looking at (the Prometheus check one). It was recently switched to '--force' because, after the post-upgrade Juju controller issues, removals were leaking resources (Juju applications, units, and OpenStack instances) that we had to clean up manually.

Changed in juju:
status: Expired → New
John A Meinel (jameinel) wrote:

So you are using '--force', which means "even if you get errors during cleanup, continue to remove the machine from the model". That makes it fairly likely that things like security groups are left behind if the initial request to clean them up failed.

Changed in juju:
status: New → Triaged
tags: added: cleanup openstack-provider
Changed in juju:
milestone: none → 2.9-next
Harry Pidcock (hpidcock)
Changed in juju:
milestone: 2.9-next → 3.1-beta1
Changed in juju:
assignee: nobody → Simon Richardson (simonrichardson)
status: Triaged → In Progress
Changed in juju:
milestone: 3.1-beta1 → 3.1-rc1
Changed in juju:
milestone: 3.1-rc1 → 3.1-rc2
Changed in juju:
status: In Progress → Fix Committed