load balancers created for Kube API via the loadbalancer endpoint are not deleted

Bug #1874864 reported by Jeff Hillman
Affects                      Status      Importance  Assigned to  Milestone
AWS Integrator Charm         Triaged     Medium      Unassigned   -
Canonical Juju               Incomplete  High        Unassigned   -
OpenStack Octavia Charm      Invalid     Undecided   Unassigned   -
Openstack Integrator Charm   Triaged     Medium      Unassigned   -

Bug Description

Using openstack-integrator in place of kubeapi-load-balancer.

The load balancer is created properly, and the appropriate members (k8s-masters) are joined to the load balancer.

When the kubernetes model is destroyed, the load balancer is still present in OpenStack.

Trying to manually delete the load balancer shows that it still has members.

Trying to delete the pool associated with the load balancer (in an effort to clean it up manually) gives the following error:

---

$ openstack loadbalancer pool delete e045da82-a95c-44f7-9efc-dd5c42aa7c55
Load Balancer f6f01eee-c6c4-4a60-984b-48d57fde2bdf is immutable and cannot be updated. (HTTP 409) (Request-ID: req-77bae993-2148-4a46-8c97-f37ded4bb0fd)

---

Jeff Hillman (jhillman)
summary: - load balancers created via the loadbalancer endpoint are not deleted
+ load balancers created for Kube API via the loadbalancer endpoint are
+ not deleted
Revision history for this message
Cory Johns (johnsca) wrote :

There is explicit cleanup logic in the charm ([1] and [2]), but it seems the stop hook may not be getting a chance to complete it. Options are to move the cleanup earlier, into the relation hooks, or to provide an action for explicit manual cleanup, like the AWS integrator charm has. If it's a hook race condition during model destroy, manual intervention may still be needed during teardown to ensure the cleanup runs before destroy-model is issued, but in that case it should be escalated to the Juju team.

Alternatively, if the cleanup is failing, we need to figure out why. As discussed on IRC, you're going to try running the specific cleanup command that the charm uses (openstack loadbalancer delete --cascade $lb_name) to see if that same "immutable" error occurs.

[1]: https://github.com/juju-solutions/charm-openstack-integrator/blob/master/reactive/openstack.py#L125-L128

[2]: https://github.com/juju-solutions/charm-openstack-integrator/blob/master/lib/charms/layer/openstack.py#L140-L154
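
For context, here is a minimal sketch of what a reactive stop-hook cleanup path like [1]/[2] might look like; the handler name, the stored list of created load balancers, and the CLI invocation are illustrative assumptions, not the charm's actual code:

---

# Illustrative sketch only -- not the charm's actual cleanup code.  Assumes
# the charm records the load balancers it created in its local unit data and
# that openstack CLI credentials are available in the hook environment.
import subprocess

from charms.reactive import hook
from charmhelpers.core import hookenv, unitdata


@hook('stop')
def cleanup_on_stop():
    # Best-effort teardown of anything this charm created; if the unit is
    # removed before this hook finishes, the resources are orphaned.
    for lb_name in unitdata.kv().get('created_lbs', []):
        hookenv.log('Deleting load balancer {}'.format(lb_name))
        subprocess.check_call(
            ['openstack', 'loadbalancer', 'delete', '--cascade', lb_name])

---

If the stop hook never gets a chance to run during destroy-model, charm-side logic of this shape cannot help on its own, which is why an explicit manual-cleanup action is the fallback option mentioned above.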

Revision history for this message
Jeff Hillman (jhillman) wrote :

Running 'openstack loadbalancer delete --cascade <$lb-id>' worked.

It removed the old loadbalancer.

Revision history for this message
Jeff Hillman (jhillman) wrote :

It should be noted that after the load balancer is deleted with --cascade, the security group for that load balancer still exists.

---

$ openstack security group list
...
| 3c677ea8-cad7-4c53-aa0c-91ae07f5e8db | openstack-integrator-e2bfb0599b95-kubernetes-master | openstack-integrator-e2bfb0599b95-kubernetes-master | 4c204b4cf3e141c68fbb60aaadb2264b | [] |
...

---

This still requires manual cleanup.
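
Until the charm handles this, the full manual cleanup is the cascade delete plus removing the leftover security group. A small sketch of that sequence follows; the IDs are the ones quoted in this report and stand in for whatever 'openstack loadbalancer list' and 'openstack security group list' show in your cloud:

---

# Manual cleanup sketch: cascade-delete the orphaned load balancer, then
# remove the leftover per-LB security group.  Both IDs are taken from the
# output quoted in this report; substitute your own.
import subprocess

LB_ID = 'f6f01eee-c6c4-4a60-984b-48d57fde2bdf'
SG_ID = '3c677ea8-cad7-4c53-aa0c-91ae07f5e8db'

subprocess.check_call(
    ['openstack', 'loadbalancer', 'delete', '--cascade', LB_ID])
subprocess.check_call(
    ['openstack', 'security', 'group', 'delete', SG_ID])

---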

Revision history for this message
Cory Johns (johnsca) wrote :

Hrm. It seems that the cleanup code originally didn't clean up the SG because it was global rather than per-LB [1]. However, that is no longer the case and a new SG is now used for each LB [2]. That does need to be fixed.

[1]: https://github.com/juju-solutions/charm-openstack-integrator/blob/master/lib/charms/layer/openstack.py#L141-L142

[2]: https://github.com/juju-solutions/charm-openstack-integrator/blob/master/lib/charms/layer/openstack.py#L419
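
To illustrate the gap, here is a rough sketch of what closing it could look like; the helper name and the way the per-LB security group is identified are assumptions for illustration, not the charm's actual code:

---

# Illustrative sketch only -- not the charm's actual code.  Assumes the
# cleanup path already knows the LB name and the name the charm gave the
# per-LB security group when it was created.
import subprocess


def cleanup_loadbalancer(lb_name, sg_name):
    # Existing behaviour: remove the LB and all of its children
    # (listeners, pools, members, health monitors).
    subprocess.check_call(
        ['openstack', 'loadbalancer', 'delete', '--cascade', lb_name])
    # Missing piece: the cascade delete does not touch the per-LB security
    # group (see the leftover group reported above), so remove it explicitly.
    subprocess.check_call(
        ['openstack', 'security', 'group', 'delete', sg_name])

---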

Revision history for this message
Cory Johns (johnsca) wrote :

The fact that the LB itself is not being cleaned up indicates that the stop hook [1] is either not running or being interrupted during model destruction. I'm attaching Juju to this bug because my expectation is that the destruction would wait until the stop hook has run.

Just to confirm, Jeff: I trust that you are using neither --force nor --no-wait when destroying the model?

[1]: https://github.com/juju-solutions/charm-openstack-integrator/blob/master/reactive/openstack.py#L125-L128

Revision history for this message
Jeff Hillman (jhillman) wrote :

No switches are used. Simply 'juju destroy-model kubernetes'

Revision history for this message
Cory Johns (johnsca) wrote :

Adding the AWS integrator as well because it has a similar cleanup stop hook [1] which is also known not to get run during model destruction (which is why we added the purge-iam-entities action to that charm).

[1]: https://github.com/juju-solutions/charm-aws-integrator/blob/master/reactive/aws.py#L154-L161
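
To illustrate that option for the openstack integrator, here is a hypothetical action script in the spirit of purge-iam-entities; the action name and the cleanup() helper are assumptions, not existing charm code, and such an action would also need a matching entry in actions.yaml:

---

#!/usr/bin/env python3
# Hypothetical actions/purge-load-balancers script, sketched in the spirit of
# the AWS integrator's purge-iam-entities action.  The cleanup() helper is
# assumed to perform the cascade delete plus security-group removal discussed
# above; it is not an existing function in the charm.
from charmhelpers.core import hookenv
from charms.layer import openstack as layer_openstack


def main():
    try:
        layer_openstack.cleanup()
    except Exception as err:
        hookenv.action_fail('cleanup failed: {}'.format(err))
    else:
        hookenv.action_set({'result': 'managed load balancers removed'})


if __name__ == '__main__':
    main()

---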

Revision history for this message
Cory Johns (johnsca) wrote :

I should note that when trying to debug this in the past, I had issues with `juju debug-log` terminating its output before the model was completely destroyed. At the time it was not entirely clear whether that was just a communication issue, but I think the implementation of debug-log has improved since then, so we might be able to get better info.

Revision history for this message
Jeff Hillman (jhillman) wrote :

openstack-integrator log grabbed by 'juju debug-log --include openstack-integrator'

It starts right before the destroy-model command was issued (with no switches).

Revision history for this message
Ian Booth (wallyworld) wrote :

We'll investigate any Juju issue post 2.8. Adding to the 2.8.1 milestone initially so the bug doesn't get lost.

Changed in juju:
milestone: none → 2.8.1
status: New → Triaged
importance: Undecided → High
Changed in charm-aws-integrator:
status: New → Triaged
Changed in charm-openstack-integrator:
status: New → Triaged
Changed in charm-aws-integrator:
importance: Undecided → Medium
Changed in charm-openstack-integrator:
importance: Undecided → Medium
Tim Penhey (thumper)
Changed in juju:
status: Triaged → Incomplete
milestone: 2.8.1 → none
Revision history for this message
Drew Freiberger (afreiberger) wrote :

With the load balancer not being deleted, this ended up triggering a bug in Octavia, which I've filed upstream in the hope of an Ussuri backport if there's a fix in master.

Tagging charm-octavia to monitor https://storyboard.openstack.org/#!/story/2009128, which details what I've found as it relates to this bug.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

This is not an Octavia charm issue, unless it is believed that the charm is configuring the octavia application incorrectly. If this *is* the case, then please re-open the bug on the octavia charm.

Changed in charm-octavia:
status: New → Invalid