Nodes using Ubuntu AMIs lose connectivity after a while on ipv6 EKS clusters
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-images | Confirmed | Undecided | Tomáš Virtus |
Bug Description
I'm trying to add Ubuntu nodes to my IPv6 EKS cluster (an eksctl template is attached for reference but it's a fairly simple one).
Everything seems fine initially, but after 5-10 minutes the containers running on the new node can no longer be exec'd into, have their logs viewed, or be port-forwarded, although they otherwise appear to be running normally. Traffic to the pods seems to stop being routed, and these commands fail with the following error:
```
kubectl port-forward svc/tf-notebook 8080:80
error: error upgrading connection: error dialing backend: dial tcp [2406:da18:
```
Additional notes:
1. This only happens on IPv6 clusters. If I simply switch the attached eksctl template to IPv4, things work as expected.
2. The same thing happens with both eksctl managed node groups and Karpenter nodes. If it's an Ubuntu node, the node can be made to join the cluster, but the pods eventually lose connectivity. If the AmiFamily is switched to AL2, or the cluster is switched to IPv4, everything works as expected.
3. Another, somewhat related, issue is that when IPv6 is specified, the nodes do not join the cluster by default. This turned out to be caused by the following error:
`Service Ipv6 Cidr must be provided when ip-family is specified as IPV6`.
I have to specify the IPV6_CIDR manually for the node to join the cluster. This happens with both managed node groups and Karpenter nodes. I patched the bootstrap script to infer the IPV6_CIDR as follows:
```
SERVICE_
```
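The reporter's patch is cut off above. As one possible approach (not necessarily theirs), a node could look up the service IPv6 CIDR from the EKS API at boot instead of having it hard-coded. This minimal sketch assumes the AWS CLI is installed and the node's IAM role may call `eks:DescribeCluster`, and that the bootstrap script lives at `/etc/eks/bootstrap.sh` and accepts the same `--ip-family` and `--service-ipv6-cidr` options as the upstream amazon-eks-ami script. The helpers `get_service_ipv6_cidr` and `bootstrap_ipv6_node` and the `BOOTSTRAP_SCRIPT` override variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: NOT the reporter's (truncated) patch, just one way to
# avoid hard-coding the service IPv6 CIDR on an IPv6 EKS cluster.

# Print the service IPv6 CIDR of the named EKS cluster, or an empty line if the
# cluster has none (the AWS CLI prints "None" for a null field in text output).
get_service_ipv6_cidr() {
  local cluster="$1" cidr
  cidr=$(aws eks describe-cluster \
    --name "$cluster" \
    --query 'cluster.kubernetesNetworkConfig.serviceIpv6Cidr' \
    --output text) || return 1
  if [ "$cidr" = "None" ]; then
    cidr=""
  fi
  printf '%s\n' "$cidr"
}

# Run the bootstrap script with the IPv6 options filled in.
# Assumption: the script path defaults to /etc/eks/bootstrap.sh (overridable via
# the hypothetical BOOTSTRAP_SCRIPT variable) and it accepts the upstream
# amazon-eks-ami options --ip-family and --service-ipv6-cidr.
bootstrap_ipv6_node() {
  local cluster="$1" cidr
  cidr=$(get_service_ipv6_cidr "$cluster") || return 1
  if [ -z "$cidr" ]; then
    echo "cluster $cluster has no service IPv6 CIDR" >&2
    return 1
  fi
  "${BOOTSTRAP_SCRIPT:-/etc/eks/bootstrap.sh}" "$cluster" \
    --ip-family ipv6 --service-ipv6-cidr "$cidr"
}
```

In node user data this would be called as `bootstrap_ipv6_node my-cluster`, where `my-cluster` is a placeholder. Looking the CIDR up at boot adds an IAM and network dependency; passing it in from the node group template instead avoids that at the cost of keeping the value in sync by hand.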
Thanks, Nuwan, for reporting this bug. To proceed, we need more information. Please provide:
* the logs (from kubelet, from the pod, journalctl, /etc/cloud/build.info) from the Ubuntu node
* exact steps & commands to reproduce the issue
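A minimal sketch for gathering the node-side items requested above into one directory, run as root on the affected Ubuntu node. It assumes systemd's `journalctl` is available and that kubelet runs under a unit named `kubelet`; the unit name may differ on Ubuntu EKS images, so the hypothetical `KUBELET_UNIT` variable can override it. Because `kubectl logs` fails once the problem starts, pod logs are copied from `/var/log/pods` on the node itself. The function name `collect_node_logs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect node-side diagnostics into one directory.
# Assumption: kubelet's systemd unit is "kubelet" unless KUBELET_UNIT says otherwise.
collect_node_logs() {
  local outdir="${1:-./node-logs}"
  local unit="${KUBELET_UNIT:-kubelet}"
  mkdir -p "$outdir" || return 1

  # Ubuntu cloud image build metadata, if present; otherwise leave a marker file.
  cp /etc/cloud/build.info "$outdir/" 2>/dev/null \
    || echo "missing" > "$outdir/build.info.missing"

  # Container logs written on the node; readable even when the API server
  # cannot reach the kubelet to serve `kubectl logs`.
  cp -r /var/log/pods "$outdir/pods" 2>/dev/null \
    || echo "missing" > "$outdir/pods.missing"

  # Kubelet service log and the full journal for the current boot.
  journalctl -u "$unit" --no-pager > "$outdir/kubelet.log" 2>&1
  journalctl -b --no-pager > "$outdir/journal.log" 2>&1
}
```

For example, `collect_node_logs /tmp/node-logs` followed by `tar czf node-logs.tgz -C /tmp node-logs` produces a single archive that can be attached to the bug.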