Nodes using Ubuntu AMIs lose connectivity after a while on ipv6 EKS clusters

Bug #2046323 reported by Nuwan
This bug affects 1 person
Affects: cloud-images
Status: Confirmed
Importance: Undecided
Assigned to: Tomáš Virtus

Bug Description

I'm trying to add Ubuntu nodes to my IPv6 EKS cluster (an eksctl template is attached for reference but it's a fairly simple one).

Everything seems fine initially, but after 5-10 minutes I can no longer exec into containers running on the new node, view their logs, or port-forward to them, although they otherwise appear to be running fine. It seems that traffic stops routing to the pods, with the following error:
```
kubectl port-forward svc/tf-notebook 8080:80
error: error upgrading connection: error dialing backend: dial tcp [2406:da18:cf9:9a01:9fb2:f466:1e29:e199]:10250: connect: no route to host
```

Additional notes:
1. This only happens on IPv6 clusters. If I simply switch the attached eksctl template to IPv4, things work as expected.

2. The same thing happens with both eksctl managed node groups and Karpenter nodes. If it's an Ubuntu node, the node can be made to join the cluster, but the pods eventually lose connectivity. If the AmiFamily is switched to AL2, or the cluster is switched to IPv4, everything works as expected.

3. Another somewhat related issue is that when IPv6 is specified, the nodes do not join the cluster by default. It turned out that this is because of the following issue:
`Service Ipv6 Cidr must be provided when ip-family is specified as IPV6`.
I have to manually specify the IPV6_CIDR for the node to successfully join the cluster. This happens with both managed node groups and Karpenter nodes. I patched the bootstrap script to infer the IPV6_CIDR as follows:
```
SERVICE_IPV6_CIDR=$(aws eks describe-cluster --name "${CLUSTER_NAME}" --query cluster.kubernetesNetworkConfig.serviceIpv6Cidr --output text)
```
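
A slightly more defensive variant of that patch might fail loudly when the lookup returns nothing, rather than passing an empty value on. This is only a sketch, not the actual bootstrap.sh logic: the function name `infer_service_ipv6_cidr` and the `AWS_CMD` override are hypothetical, and it assumes the AWS CLI prints `None` for a missing field when `--output text` is used.

```shell
#!/usr/bin/env bash
# Sketch only: infer the service IPv6 CIDR of an EKS cluster.
# AWS_CMD is a hypothetical override so the lookup can be stubbed out.
AWS_CMD="${AWS_CMD:-aws}"

infer_service_ipv6_cidr() {
  local cluster_name="$1" cidr
  # Same query as in the patch above.
  cidr=$("$AWS_CMD" eks describe-cluster --name "$cluster_name" \
    --query cluster.kubernetesNetworkConfig.serviceIpv6Cidr --output text) || return 1
  # Assumption: with --output text, a missing field is printed as "None".
  if [ -z "$cidr" ] || [ "$cidr" = "None" ]; then
    echo "could not determine service IPv6 CIDR for $cluster_name" >&2
    return 1
  fi
  printf '%s\n' "$cidr"
}
```

In a bootstrap script this could be used as `SERVICE_IPV6_CIDR="${SERVICE_IPV6_CIDR:-$(infer_service_ipv6_cidr "$CLUSTER_NAME")}"`, so an explicitly supplied value still takes precedence.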

Tags: cpc-3587
Nuwan (nuwan-ag) wrote :
description: updated
tags: added: cpc-3587
description: updated
Thomas Bechtold (toabctl) wrote :

Thanks Nuwan for reporting this bug. To proceed, we need more information. Please provide:

* the logs (from kubelet, from the pod, journalctl, /etc/cloud/build.info) from the Ubuntu node
* exact steps & commands to reproduce the issue

Changed in cloud-images:
status: New → Incomplete
Nuwan (nuwan-ag) wrote :

Thanks for looking into this Thomas. I've attached the logs you requested as well as a minimal bash script that recreates the issue (requires eksctl + kubectl). In essence:

1. Launch an IPv6 EKS cluster with eksctl, with an Ubuntu2004 worker node
2. Create a pod on that worker node
3. Wait for a while until the pod's logs can no longer be viewed due to "no route to host"
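
Step 3 could be automated with a small polling loop. This is a sketch, not the attached script: `wait_for_log_failure`, the `KUBECTL_CMD` override, and the 30-second default interval are assumptions. It relies only on `kubectl logs` exiting non-zero once the node becomes unreachable, so a failure for any other reason would also end the loop.

```shell
# KUBECTL_CMD is a hypothetical override so the call can be stubbed out.
KUBECTL_CMD="${KUBECTL_CMD:-kubectl}"

wait_for_log_failure() {
  local pod="$1" interval="${2:-30}" attempts=0
  # Poll until `kubectl logs` fails, then report how many polls succeeded.
  while "$KUBECTL_CMD" logs "$pod" >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    sleep "$interval"
  done
  echo "logs became unavailable after $attempts successful polls"
}
```

Multiplying the reported count by the interval gives a rough idea of how long the node kept its connectivity.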

Nuwan (nuwan-ag) wrote :

Just realized the tar archive I attached is empty. Have reattached a fixed version.

Thomas Bechtold (toabctl) wrote :

@Nuwan, thx for the logs. Could you also provide the output of "systemctl show systemd-networkd.service" please?

Tomáš Virtus (virtustom) wrote :

I've done some debugging and I think the issue is in the systemd-networkd DHCPv6 implementation in systemd 245.4-4ubuntu3.22.

After the node loses its initial IPv6 address, systemd-networkd with debug logging enabled[1] periodically logs the following:

Jan 11 21:31:37 ip-192-168-16-204 systemd-networkd[60061]: DHCPv6 CLIENT: Sent SOLICIT
Jan 11 21:31:37 ip-192-168-16-204 systemd-networkd[60061]: DHCPv6 CLIENT: Next retransmission in 52s
Jan 11 21:31:37 ip-192-168-16-204 systemd-networkd[60061]: DHCPv6 CLIENT: ADVERTISE has wrong IAID for IA PD
Jan 11 21:31:37 ip-192-168-16-204 systemd-networkd[60061]: DHCPv6 CLIENT: Recv ADVERTISE
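
To check whether a node is in this state, one could count the relevant messages in the journal. A rough sketch: the function name `count_wrong_iaid` is made up here, and it simply matches the message text shown above, which may differ in other systemd versions.

```shell
# Count "wrong IAID" DHCPv6 messages in journal text read from stdin.
count_wrong_iaid() {
  # grep -c prints 0 but exits non-zero when nothing matches; mask that.
  grep -c 'DHCPv6 CLIENT: ADVERTISE has wrong IAID' || true
}

# Example (the messages only appear with systemd-networkd debug logging on):
#   journalctl -b -u systemd-networkd | count_wrong_iaid
```

A count that keeps growing between runs would match the periodic retransmissions seen in the log excerpt.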

The same issue with the same log messages is reported here: https://github.com/systemd/systemd/issues/20803 and the fix here: https://github.com/systemd/systemd/pull/20807 is present in systemd 250. Unfortunately, the fix is not easy to backport to 245. I tried backporting a few relevant parent commits first, but too much has changed in src/libsystemd-network.

I've tried modifying systemd 245.4 with the attached patch, which just ignores the wrong IAID. With this patch applied, the IPv6 address is successfully set. On focal:

apt source systemd
cd systemd-245.4
patch -p1 <../systemd-ignore-wrong-dhcp6-iaid.patch
DEB_BUILD_OPTIONS="noopt nocheck" debuild -b -uc -us -nc
cd ..
dpkg -i libsystemd0_245.4-4ubuntu3.22_amd64.deb systemd_245.4-4ubuntu3.22_amd64.deb

I have no idea why the IPv6 address is set initially. It's not set again even after rebooting the machine. That suggests the DHCPv6 client state is stored somewhere, but I couldn't find it anywhere in /var. To debug this, I would need to enable systemd-networkd debug logging before the first boot. There's a preBootstrapCommands property[2] on managedNodeGroup, but cloud-init runs after systemd-networkd, so that would be too late.

Currently I don't know how to fix it properly.

[1] https://superuser.com/questions/1187633/how-to-debug-systemd-networkd/1234599#1234599
[2] https://eksctl.io/usage/schema/#managedNodeGroups-preBootstrapCommands

Nuwan (nuwan-ag) wrote :

@virtustom Thanks for working on this and it looks like we may be a step closer to an answer. What I've been doing to add custom patches like the `SERVICE_IPV6_CIDR` mentioned above is to rebuild a Packer image with the latest Ubuntu EKS image as the base, with my changes layered on top. I suppose the same thing can be done to enable systemd debug logging?

Tomáš Virtus (virtustom) wrote :

We will fix this in 1.29 with the jammy migration.

Changed in cloud-images:
assignee: nobody → Tomáš Virtus (virtustom)
status: Incomplete → Confirmed
Tomáš Virtus (virtustom) wrote :

@nuwan-ag I've modified the image with a systemd-networkd override that enables debug logging. The log of `journalctl -b -u systemd-networkd` is attached. It seems to me that the DHCPv6 server sets an IAID that passes the systemd 245 checks on the first few replies, and then replies with a "wrong" IAID for the rest of the instance's life.
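
The override itself is not shown in this report. A minimal sketch of such a drop-in, assuming the usual `SYSTEMD_LOG_LEVEL` environment variable is what was used, might look like the following; the function name, the `root` prefix argument, and the file name `10-debug.conf` are all hypothetical.

```shell
# Write a systemd-networkd drop-in that enables debug logging.
# $1 is an optional root prefix (e.g. an image mount point); empty means "/".
enable_networkd_debug() {
  local root="${1:-}"
  local dir="$root/etc/systemd/system/systemd-networkd.service.d"
  mkdir -p "$dir"
  # SYSTEMD_LOG_LEVEL=debug raises the service's log level to debug.
  printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' > "$dir/10-debug.conf"
}
```

Baked into the image, the drop-in is in place before systemd-networkd first starts. On an already running system it would need `systemctl daemon-reload` followed by `systemctl restart systemd-networkd` to take effect.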

The diff between systemd 245 and 249.11 is too big. I don't think it's likely that the fix will be backported to 245, or that systemd will be updated to at least 249.11 in focal.

I've tested your example with jammy, and it works: the node joins the cluster and doesn't lose its IPv6 address, and I can query the log repeatedly. EKS will be available on jammy in version 1.29.

I didn't have to set SERVICE_IPV6_CIDR. Could you please point me to where you found the quote "Service Ipv6 Cidr must be provided when ip-family is specified as IPV6"? AL2 images don't set it the way you do either: https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/bootstrap.sh

Nuwan (nuwan-ag) wrote :

@virtustom Makes sense to wait for Jammy, since the current Ubuntu image is pretty ancient at this point.

Regarding `SERVICE_IPV6_CIDR`, I received that error from the bootstrap.sh file in the Ubuntu image. The closest I could find to it in the eks image is this line: https://github.com/awslabs/amazon-eks-ami/blob/632a6ddb2e5b9fedd1f6cd21bd3ce7d274153f61/files/bootstrap.sh#L460

It doesn't trigger an issue on the AL2 image because it's inferred on this line: https://github.com/awslabs/amazon-eks-ami/blob/632a6ddb2e5b9fedd1f6cd21bd3ce7d274153f61/files/bootstrap.sh#L383
I simplified that and ported it across.
