cloud-images

Nodes using Ubuntu AMIs lose connectivity after a while on ipv6 EKS clusters

Bug #2046323 reported by Nuwan on 2023-12-13

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	cloud-images	Confirmed	Undecided	Tomáš Virtus

Bug Description

I'm trying to add Ubuntu nodes to my IPv6 EKS cluster (an eksctl template is attached for reference but it's a fairly simple one).

Everything seems fine initially, but after 5-10 minutes, containers running on the new node can no longer be exec'd into, have their logs viewed, or be port-forwarded, although they otherwise appear to be running fine. Traffic seems to stop routing to the pods, with the following error:
```
kubectl port-forward svc/tf-notebook 8080:80
error: error upgrading connection: error dialing backend: dial tcp [2406:da18:cf9:9a01:9fb2:f466:1e29:e199]:10250: connect: no route to host
```

Additional notes:
1. This only happens on IPv6 clusters. If I simply switch the attached eksctl template to IPv4, things work as expected.

2. The same thing happens with both eksctl managed node groups and Karpenter nodes. If it's an Ubuntu node, the node can be made to join the cluster, but the pods eventually lose connectivity. If the AmiFamily is switched to AL2 or IPv4, everything works as expected.

3. Another somewhat related issue is that when IPv6 is specified, the nodes do not join the cluster by default. It turned out that this is because of the following issue:
`Service Ipv6 Cidr must be provided when ip-family is specified as IPV6`.
I have to specify the service IPv6 CIDR manually for the node to join the cluster successfully. This happens with both managed node groups and Karpenter nodes. I patched the bootstrap script to infer `SERVICE_IPV6_CIDR` as follows:
```
SERVICE_IPV6_CIDR=$(aws eks describe-cluster --name "${CLUSTER_NAME}" --query cluster.kubernetesNetworkConfig.serviceIpv6Cidr --output text)
```
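
A minimal sketch of how the inferred value could be validated before it is handed to the bootstrap script. The function name `resolve_service_ipv6_cidr` is hypothetical, and the check for `None` assumes the AWS CLI prints that string for a missing field when `--output text` is used:

```shell
# Sketch only: resolve_service_ipv6_cidr is a hypothetical helper name.
# Prints the cluster's service IPv6 CIDR, or fails if none is returned.
resolve_service_ipv6_cidr() {
    cluster_name="$1"
    cidr=$(aws eks describe-cluster \
        --name "$cluster_name" \
        --query 'cluster.kubernetesNetworkConfig.serviceIpv6Cidr' \
        --output text) || return 1
    # Assumption: with --output text a null field is printed as "None",
    # which is what a cluster without an IPv6 service CIDR would yield.
    if [ -z "$cidr" ] || [ "$cidr" = "None" ]; then
        echo "no service IPv6 CIDR found for cluster $cluster_name" >&2
        return 1
    fi
    printf '%s\n' "$cidr"
}
```

It could then be used as `SERVICE_IPV6_CIDR=$(resolve_service_ipv6_cidr "${CLUSTER_NAME}") || exit 1`, so that a failed lookup stops the bootstrap early instead of passing an unusable value along.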

Tags: cpc-3587

Nuwan (nuwan-ag) wrote on 2023-12-13:

eksctl template (702 bytes, text/plain)

Thomas Bechtold (toabctl) on 2023-12-13

description:	updated
tags:	added: cpc-3587

Thomas Bechtold (toabctl) on 2023-12-13

description:	updated

Thomas Bechtold (toabctl) wrote on 2023-12-14:

Thanks Nuwan for reporting this bug. To proceed, we need more information. Please provide:

* the logs (from kubelet, from the pod, journalctl, /etc/cloud/build.info) from the Ubuntu node
* exact steps & commands to reproduce the issue

Changed in cloud-images:
status:	New → Incomplete

Nuwan (nuwan-ag) wrote on 2023-12-14:

/etc/cloud/build.info (39 bytes, application/x-info)

Nuwan (nuwan-ag) wrote on 2023-12-14:

Script to recreate and requested logs (29 bytes, application/x-tar)

Thanks for looking into this Thomas. I've attached the logs you requested as well as a minimal bash script that recreates the issue (requires eksctl + kubectl). In essence:

1. Launch an eksctl cluster that uses ipv6 with an Ubuntu2004 worker node
2. Create a pod in that worker node
3. Wait for a while until the pod's logs can no longer be viewed due to "no route to host"
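
Step 3 can be sketched as a polling loop. This is an illustration under assumptions, not the attached script: the function name `wait_for_log_failure`, the 30-second interval, and the attempt limit are all hypothetical defaults.

```shell
# Sketch only: wait_for_log_failure is a hypothetical helper name.
# Usage: wait_for_log_failure POD [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 as soon as "kubectl logs" fails, 1 if it never fails.
wait_for_log_failure() {
    pod="$1"
    interval="${2:-30}"
    max_attempts="${3:-40}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Capture stdout and stderr so the error text can be reported.
        if ! output=$(kubectl logs "$pod" --tail=1 2>&1); then
            echo "attempt $attempt: kubectl logs failed: $output"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    echo "logs still readable after $max_attempts attempts"
    return 1
}
```

With the defaults, `wait_for_log_failure my-pod` checks every 30 seconds for up to 20 minutes, which covers the 5-10 minute window described in the report.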

Nuwan (nuwan-ag) wrote on 2023-12-14:

Logs and scripts (76.0 KiB, application/x-tar)

Just realized the tar archive I attached is empty. Have reattached a fixed version.

Thomas Bechtold (toabctl) wrote on 2024-01-03:

@Nuwan, thx for the logs. Could you also provide the output of "systemctl show systemd-networkd.service" please?

Tomáš Virtus (virtustom) wrote on 2024-01-11 (last edit on 2024-01-11):

systemd-ignore-wrong-dhcp6-iaid.patch (785 bytes, text/plain)

I've done some debugging and I think the issue is in the systemd-networkd DHCPv6 implementation in systemd 245.4-4ubuntu3.22.

After losing the initial IPv6 address, systemd-networkd with debug logging enabled[1] periodically logs the following:

Jan 11 21:31:37 ip-192-168-16-204 systemd-networkd[60061]: DHCPv6 CLIENT: Sent SOLICIT
Jan 11 21:31:37 ip-192-168-16-204 systemd-networkd[60061]: DHCPv6 CLIENT: Next retransmission in 52s
Jan 11 21:31:37 ip-192-168-16-204 systemd-networkd[60061]: DHCPv6 CLIENT: ADVERTISE has wrong IAID for IA PD
Jan 11 21:31:37 ip-192-168-16-204 systemd-networkd[60061]: DHCPv6 CLIENT: Recv ADVERTISE
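
To check whether a node is hitting the same condition, the journal can be scanned for the rejection message. A minimal sketch; the function name `count_wrong_iaid` is hypothetical, and the pattern is copied from the log lines above:

```shell
# Sketch only: count_wrong_iaid is a hypothetical helper name.
# Reads journal text on stdin and prints how many ADVERTISE messages
# were rejected because of the IAID check.
count_wrong_iaid() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable in scripts that run with "set -e".
    grep -c 'DHCPv6 CLIENT: ADVERTISE has wrong IAID' || true
}
```

On a node it could be used as `journalctl -b -u systemd-networkd | count_wrong_iaid`. The message only appears when systemd-networkd debug logging is enabled.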

The same issue, with the same log messages, is reported at https://github.com/systemd/systemd/issues/20803, and the fix in https://github.com/systemd/systemd/pull/20807 is present in systemd 250. Unfortunately, the fix is not easy to backport to 245. I tried backporting a few relevant parent commits first, but too much has changed in src/libsystemd-network.

I modified systemd 245.4 with the attached patch, which simply ignores the wrong IAID. With this patch applied, the IPv6 address is set successfully. In focal:

apt source systemd
cd systemd-245.4
patch -p1 <../systemd-ignore-wrong-dhcp6-iaid.patch
DEB_BUILD_OPTIONS="noopt nocheck" debuild -b -uc -us -nc
cd ..
dpkg -i libsystemd0_245.4-4ubuntu3.22_amd64.deb systemd_245.4-4ubuntu3.22_amd64.deb

I have no idea why the IPv6 address is set initially. It is not set again even after rebooting the machine. That suggests the DHCPv6 client state is stored somewhere, but I couldn't find it anywhere in /var, and I don't know how to debug it. I would need to enable systemd-networkd debug logging before first boot. There's a preBootstrapCommands property[2] on managedNodeGroup, but cloud-init runs after systemd-networkd.

Currently I don't know how to fix it properly.

[1] https://superuser.com/questions/1187633/how-to-debug-systemd-networkd/1234599#1234599
[2] https://eksctl.io/usage/schema/#managedNodeGroups-preBootstrapCommands

Nuwan (nuwan-ag) wrote on 2024-01-12:

@virtustom Thanks for working on this and it looks like we may be a step closer to an answer. What I've been doing to add custom patches like the `SERVICE_IPV6_CIDR` mentioned above is to rebuild a Packer image with the latest Ubuntu EKS image as the base, with my changes layered on top. I suppose the same thing can be done to enable systemd debug logging?

Tomáš Virtus (virtustom) wrote on 2024-01-12:

We will fix this in EKS 1.29 with the migration to jammy.

Changed in cloud-images:
assignee:	nobody → Tomáš Virtus (virtustom)
status:	Incomplete → Confirmed

Tomáš Virtus (virtustom) wrote on 2024-01-15:

systemd-networkd.log (230.5 KiB, text/plain)

@nuwan-ag I've modified the image with a systemd-networkd override that enables debug logging. The output of `journalctl -b -u systemd-networkd` is attached. It seems to me that the DHCPv6 server sets an IAID that passes the systemd 245 checks in the first few replies, and then replies with a "wrong" IAID for the rest of the instance's life.
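
For reference, such an override could look roughly like the sketch below. It is not the exact change made to the image: the function name `enable_networkd_debug`, the drop-in file name `10-debug.conf`, and the optional root prefix (for writing into an image being built) are assumptions. The `SYSTEMD_LOG_LEVEL=debug` setting is the one described in [1].

```shell
# Sketch only: enable_networkd_debug and 10-debug.conf are hypothetical names.
# Usage: enable_networkd_debug [ROOT]
# Writes a drop-in that turns on debug logging for systemd-networkd.
# ROOT defaults to "" (the running system); pass a mount point to
# modify an image before its first boot.
enable_networkd_debug() {
    root="${1:-}"
    dropin_dir="$root/etc/systemd/system/systemd-networkd.service.d"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/10-debug.conf" <<'EOF'
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
EOF
}
```

On a running node, the change takes effect after `systemctl daemon-reload` and `systemctl restart systemd-networkd`. Baked into an image, it is already in place when systemd-networkd starts on first boot.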

The diff between systemd 245 and 249.11 is too big. I don't think it's likely that the fix will be backported to 245, or that systemd will be updated to at least 249.11 in focal.

I've tested your example with jammy and it works: the node joins the cluster, doesn't lose its IPv6 address, and I can query the logs repeatedly. EKS will be available on jammy in version 1.29.

I didn't have to set SERVICE_IPV6_CIDR. Could you please point me to where you found the quote "Service Ipv6 Cidr must be provided when ip-family is specified as IPV6"? The AL2 images don't set it the way you do either: https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/bootstrap.sh