aws integrator charm hitting issue "An error occurred (TagLimitExceeded) when calling the CreateTags operation: the TagSet:"

Bug #1823106 reported by Calvin Hartwell
Affects: AWS Integrator Charm
Status: Fix Released
Importance: High
Assigned to: Cory Johns

Bug Description

Hi all,

It seems that the aws-integrator charm is not functioning correctly. I am hitting this issue with the latest revision 8 on both CDK 1.13 and 1.14. This does not appear to be caused by account permissions; instead, k8s/Juju is trying to set up too many tags, which breaks the deployment.

Here is the output from the issue: https://pastebin.ubuntu.com/p/nRT8RpV7pm/
Standard bundle: https://pastebin.ubuntu.com/p/tR3ZTWMxmX/

Cheers

Changed in charm-aws-integrator:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Cory Johns (johnsca)
Kevin W Monroe (kwmonroe) wrote:

@calvin, I hit this last week. The workaround for me was to remove the subnet tags from the AWS console. Be careful not to remove any tags that may be associated with an active deployment that you care about; in my case, I had destroyed the controller, so it was safe for me to remove all tags.

More details in bug 1821785.
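As an alternative to clicking through the console, and assuming the AWS CLI is configured for the affected account and region, something like the following lists the subnet tags so you can see what would be removed before deleting anything (the key prefix here is just an example of the Kubernetes cluster tags):

aws ec2 describe-tags --filters "Name=resource-type,Values=subnet" --query "Tags[?starts_with(Key, 'kubernetes.io/cluster/')]"

This only lists the tags; nothing is removed until you delete them explicitly.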

Cory Johns (johnsca) wrote:

This is because the VPC is reused for multiple clusters, and the subnets persist with the VPC and retain the tags from every cluster. https://bugs.launchpad.net/charm-aws-integrator/+bug/1821785 is the same issue.

I'll work on improving how the integrator charm cleans up the subnet tags, but this is complicated by two issues:

1) The tags are requested by the related charm, so the integrator can't know which ones it should clean up. It can track the ones requested by current relations so it can try to clean them up during teardown, but that leads to issue 2.

2) The teardown process doesn't always allow the integrator charm to fully complete all of its cleanup process. This is why we have the purge-iam-entities action on the charm, to find and clean up unused entities from previous clusters. However, without the cached info about what tags the integrator created on behalf of related applications, we can't properly find the subnet tags to clean.

I think the best that we can manage is to ensure that the integrator at least tries to clean up the subnet tags during teardown, and provide another action to make it easier to inspect and selectively clean out tags on the subnets, since those are the only entities that are persistent. There will inherently be a manual aspect to this, though. The README for the integrator already mentions that this type of thing is a concern, and that the operator will need to keep an eye on resources that may have been allocated which the integrator charm does not know about or cannot reliably detect and clean up.

Cory Johns (johnsca) wrote:

In the meantime, you can manually clean up any tags on the default VPC's subnets that start with kubernetes.io/cluster/kubernetes-, as long as the generated cluster tag following that prefix is not for a cluster that is in active use. You can find out the cluster tag for a given CDK cluster by doing the following:

juju run --unit kubernetes-master/x -- leader-get cluster_tag

Where "x" is the unit number of the leader, if the master is in HA mode.

Calvin Hartwell (calvinh) wrote:

@Cory, thanks for the quick response. I will delete the tags, but it would be good if we could fix this long-term as you described.

Cory Johns (johnsca)
Changed in charm-aws-integrator:
status: Triaged → In Progress
Cory Johns (johnsca)
Changed in charm-aws-integrator:
status: In Progress → Fix Committed
Cory Johns (johnsca) wrote:

This is released to stable as cs:~containers/aws-integrator-9

Changed in charm-aws-integrator:
status: Fix Committed → Fix Released
Cory Johns (johnsca) wrote:

Note: The automatic cleanup generally isn't very reliable when simply doing a destroy-model (you'd need to manually remove all relations and wait for things to settle), so you almost certainly want to use the new action:

juju run-action aws-integrator/0 purge-subnet-tags include=kubernetes.io/cluster

That will only remove the K8s cluster tags, which are likely the ones filling up the quota. You can also inspect the tags with the other new action:

juju run-action aws-integrator/0 list-subnet-tags
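Both actions run asynchronously and print an action ID when queued; assuming a Juju 2.x client (which matches the run-action syntax above), the results can then be fetched with:

juju show-action-output <action-id>

Where <action-id> is the ID reported by run-action.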
