Issue with EKS Cluster joining on Latest Ubuntu EKS Optimized AMIs

Bug #2049611 reported by Gracjan Grabowski
Affects: cloud-images
Status: Fix Released
Importance: Undecided
Assigned to: Tomáš Virtus

Bug Description

Hello everyone,

I'm running into a problem with the newest EKS-optimized Ubuntu AMIs: worker nodes fail to join the EKS cluster.

I tried the workaround suggested in https://bugs.launchpad.net/cloud-images/+bug/2045791, but unfortunately it didn't resolve the problem.

AMIs causing the issue:
ami-0f3ea2eb3faa6e2b6 ubuntu-eks/k8s_1.26/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20231213.1
ami-004b4213ca29ada16 ubuntu-eks/k8s_1.26/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20231105
ami-0984af2dedae97f46 ubuntu-eks/k8s_1.27/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20231213.1
ami-0fde44ab1d7b005e8 ubuntu-eks/k8s_1.27/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20231117

These AMIs are working well:
ami-074c0d8d07da7f245 ubuntu-eks/k8s_1.26/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20230616
ami-0ccf557a5464c4733 ubuntu-eks/k8s_1.27/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20230714

I'm seeing the following error on the EKS node group: "NodeCreationFailure: Couldn't proceed with the upgrade process as new nodes are not joining the node group."
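
The full failure reason should also be visible in the node group's health status; for reference (cluster and node group names are placeholders):

    aws eks describe-nodegroup \
      --cluster-name <cluster-name> \
      --nodegroup-name <nodegroup-name> \
      --query 'nodegroup.health.issues'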

I attempted to run bootstrap.sh manually, but it seems to be stuck:
/etc/eks/bootstrap.sh bd-int-eks-cluster --kubelet-extra-args '--max-pods=110' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP --use-max-pods false

Using containerd as the container runtime
Aliasing EKS k8s snap commands
Stopping k8s daemons until configured
Stopped.
Cluster "kubernetes" set.
2024-01-17 09:33:27,092:__main__:INFO:Setting kubelet-eks key >cluster-dns< to >10.100.0.10<
2024-01-17 09:33:27,107:__main__:INFO:received '202/Accepted' from snapd for PUT on /v2/snaps/kubelet-eks/conf (change-id: 19)
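
While the script is stuck like this, the snapd change it is waiting on can be inspected from a second shell; a diagnostic sketch (change id 19 taken from the log line above):

    snap changes        # list recent snapd changes and their status
    snap watch 19       # block until the change the script is polling on completes
    journalctl -u snapd --since "15 minutes ago"    # any snapd-side errors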

cloud-init-output.log:
Using containerd as the container runtime
Aliasing EKS k8s snap commands
Added:
  - kubelet-eks.kubelet as kubelet
Added:
  - kubectl-eks.kubectl as kubectl
Stopping k8s daemons until configured
Stopped.
Cluster "kubernetes" set.
2024-01-17 09:04:27,223:__main__:INFO:Setting kubelet-eks key >cluster-dns< to >10.100.0.10<
2024-01-17 09:04:27,239:__main__:INFO:received '202/Accepted' from snapd for PUT on /v2/snaps/kubelet-eks/conf (change-id: 6)
2024-01-17 09:08:27,296:__main__:ERROR:timeout while waiting for in-progress changes
2024-01-17 09:08:27,297:__main__:INFO:result for change: {'id': '6', 'kind': 'configure-snap', 'summary': 'Change configuration of "kubelet-eks" snap', 'status': 'Done', 'tasks': [{'id': '128', 'kind': 'run-hook', 'summary': 'Run configure hook of "kubelet-eks" snap', 'status': 'Done', 'progress': {'label': '', 'done': 1, 'total': 1}, 'spawn-time': '2024-01-17T09:04:27.224751922Z', 'ready-time': '2024-01-17T09:04:29.988173989Z'}], 'ready': True, 'spawn-time': '2024-01-17T09:04:27.224784346Z', 'ready-time': '2024-01-17T09:04:29.988177399Z'}
Container runtime is containerd total: 290.8 (484.0 KiB/s)
2024-01-17 09:08:31,174:__main__:INFO:Setting kubelet-eks key >container-runtime< to >remote<
2024-01-17 09:08:31,196:__main__:INFO:received '202/Accepted' from snapd for PUT on /v2/snaps/kubelet-eks/conf (change-id: 7)
2024-01-17 09:12:31,296:__main__:ERROR:timeout while waiting for in-progress changes
2024-01-17 09:12:31,298:__main__:INFO:result for change: {'id': '7', 'kind': 'configure-snap', 'summary': 'Change configuration of "kubelet-eks" snap', 'status': 'Done', 'tasks': [{'id': '129', 'kind': 'run-hook', 'summary': 'Run configure hook of "kubelet-eks" snap', 'status': 'Done', 'progress': {'label': '', 'done': 1, 'total': 1}, 'spawn-time': '2024-01-17T09:08:31.176039532Z', 'ready-time': '2024-01-17T09:08:33.89372478Z'}], 'ready': True, 'spawn-time': '2024-01-17T09:08:31.176065403Z', 'ready-time': '2024-01-17T09:08:33.89372832Z'}
2024-01-17 09:12:31,615:__main__:INFO:Setting kubelet-eks key >container-runtime-endpoint< to >unix:///run/containerd/containerd.sock<
2024-01-17 09:12:31,652:__main__:INFO:received '202/Accepted' from snapd for PUT on /v2/snaps/kubelet-eks/conf (change-id: 8)
2024-01-17 09:16:31,680:__main__:ERROR:timeout while waiting for in-progress changes
2024-01-17 09:16:31,688:__main__:INFO:result for change: {'id': '8', 'kind': 'configure-snap', 'summary': 'Change configuration of "kubelet-eks" snap', 'status': 'Done', 'tasks': [{'id': '130', 'kind': 'run-hook', 'summary': 'Run configure hook of "kubelet-eks" snap', 'status': 'Done', 'progress': {'label': '', 'done': 1, 'total': 1}, 'spawn-time': '2024-01-17T09:12:31.622651724Z', 'ready-time': '2024-01-17T09:12:35.823591143Z'}], 'ready': True, 'spawn-time': '2024-01-17T09:12:31.622687105Z', 'ready-time': '2024-01-17T09:12:35.823593913Z'}
cloud-provider is external
2024-01-17 09:16:35,553:__main__:INFO:Setting kubelet-eks key >hostname-override< to >ip-11-243-100-38.eu-west-1.compute.internal<
2024-01-17 09:16:35,594:__main__:INFO:received '202/Accepted' from snapd for PUT on /v2/snaps/kubelet-eks/conf (change-id: 9)
2024-01-17 09:20:35,616:__main__:ERROR:timeout while waiting for in-progress changes
2024-01-17 09:20:35,618:__main__:INFO:result for change: {'id': '9', 'kind': 'configure-snap', 'summary': 'Change configuration of "kubelet-eks" snap', 'status': 'Done', 'tasks': [{'id': '131', 'kind': 'run-hook', 'summary': 'Run configure hook of "kubelet-eks" snap', 'status': 'Done', 'progress': {'label': '', 'done': 1, 'total': 1}, 'spawn-time': '2024-01-17T09:16:35.57120716Z', 'ready-time': '2024-01-17T09:16:41.834718414Z'}], 'ready': True, 'spawn-time': '2024-01-17T09:16:35.571229331Z', 'ready-time': '2024-01-17T09:16:41.834721155Z'}
2024-01-17 09:20:35,820:__main__:INFO:Setting kubelet-eks key >image-credential-provider-config< to >/etc/eks/ecr-credential-provider/config.json<
2024-01-17 09:20:35,850:__main__:INFO:received '202/Accepted' from snapd for PUT on /v2/snaps/kubelet-eks/conf (change-id: 10)
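
Note that every change above ends with 'status': 'Done', yet the script still times out after exactly four minutes on each key, so the configure hooks themselves seem to finish. The change state can be confirmed directly against snapd's REST API (change id 6 from the log above):

    curl -s --unix-socket /run/snapd.socket http://localhost/v2/changes/6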

Any assistance or suggestions would be greatly appreciated.
Thank you!

---
External link: https://warthogs.atlassian.net/browse/CPC-3731

Tags: cpc-3731
Tomáš Virtus (virtustom) wrote :

Hey Gracjan, thanks for the report. I will take a look. Could you also share more details about your setup? Did you use eksctl? If so, could you share the commands and configs you used?

Changed in cloud-images:
assignee: nobody → Tomáš Virtus (virtustom)
Gracjan Grabowski (gracjan-grabowski) wrote :

We use CloudFormation to set up the cluster. The AMI and Kubernetes version are specified as CloudFormation parameters, and we just update the CloudFormation stack.
The control plane updates automatically during stack deployment. To change the worker nodes' AMI, we manually update the Launch Template version in the AWS EKS console.
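
(For reference, the equivalent CLI call would be something like the following, with placeholder node group and launch template values:)

    aws eks update-nodegroup-version \
      --cluster-name <cluster-name> \
      --nodegroup-name <nodegroup-name> \
      --launch-template id=<launch-template-id>,version=<new-version>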

We also use a UserData script for the worker nodes:
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0
--//
Content-Type: text/x-shellscript; charset="us-ascii"
    #!/bin/bash
    # retrieve current region
    TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 3600")
    region=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/\(.*\)[a-z]/\1/')
    secret_arn=$(aws --region $region ssm get-parameter --name /proxy/paas/secret/arn/${ProxyId} --query 'Parameter.Value' | xargs)
    credentials=$(aws --region $region secretsmanager get-secret-value --secret-id $secret_arn --query 'SecretString' --output text)
    username=$(echo $credentials | grep -o '\"username\":\"[a-zA-Z0-9+-]\{0,\}\"' | awk -F":" '{ print $2 }' | xargs)
    password=$(echo $credentials | grep -o '\"password\":\"[a-zA-Z0-9+-]\{0,\}\"' | awk -F":" '{ print $2 }' | xargs)
    # build HTTP proxy url
    proxy_http="http://$username:$password@**blurred**:8080"
    # build HTTPS proxy url
    proxy_https="https://$username:$password@**blurred**:8443"
    no_proxy=localhost,127.0.0.1,169.254.169.254,.internal,s3.amazonaws.com,.$region.amazonaws.com,ec2.$region.amazonaws.com
    NO_PROXY=localhost,127.0.0.1,169.254.169.254,.internal,s3.amazonaws.com,.$region.amazonaws.com,ec2.$region.amazonaws.com
    /bin/echo "export http_proxy=$proxy_http" > /etc/profile.d/proxy.sh
    /bin/echo "export https_proxy=$proxy_https" >> /etc/profile.d/proxy.sh
    /bin/echo "export HTTP_PROXY=$proxy_http" >> /etc/profile.d/proxy.sh
    /bin/echo "export HTTPS_PROXY=$proxy_https" >> /etc/profile.d/proxy.sh
    /bin/echo "export no_proxy=$no_proxy" >> /etc/profile.d/proxy.sh
    /bin/echo "export NO_PROXY=$no_proxy" >> /etc/profile.d/proxy.sh
    source /etc/profile
    # add apt setup script
    /bin/echo "# Making Apt Outbound Proxy aware" >> /etc/apt/apt.conf.d/proxy.conf
    /bin/echo "Acquire::http::Proxy \"socks5h://$username:$password@**blurred**:8000\";" >> /etc/apt/apt.conf.d/proxy.conf
    /bin/echo "Acquire::https::Proxy \"socks5h://$username:$password@**blurred**:8000\";" >> /etc/apt/apt.conf.d/proxy.conf
    # join the cluster
    B64_CLUSTER_CA=${EksCluster.CertificateAuthorityData}
    API_SERVER_URL=${EksCluster.Endpoint}
    K8S_CLUSTER_DNS_IP=10.100.0.10
    /etc/eks/bootstrap.sh ${EksCluster} --kubelet-extra-args '--max-pods=110' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP --use-max-pods false
--//
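
One thing we have not ruled out: snapd does not read /etc/profile.d, so if the configure timeouts are proxy-related, the proxy may also need to be set for snapd itself. An untested sketch, reusing the variables from the script above:

    snap set system proxy.http="$proxy_http"
    snap set system proxy.https="$proxy_https"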

Changed in cloud-images:
status: New → In Progress
Changed in cloud-images:
status: In Progress → Fix Committed
Gracjan Grabowski (gracjan-grabowski) wrote :

Were you able to fix this?

Thomas Bechtold (toabctl) wrote :

@Gracjan, can you try the images with serial 20240123, please? Those should be available for all EKS versions.
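
The matching image IDs can be looked up by name, e.g. for 1.27 (099720109477 is Canonical's AWS account ID):

    aws ec2 describe-images --owners 099720109477 \
      --filters "Name=name,Values=ubuntu-eks/k8s_1.27/images/hvm-ssd/*20240123*" \
      --query 'Images[].[ImageId,Name]' --output table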

Changed in cloud-images:
status: Fix Committed → Fix Released
Gracjan Grabowski (gracjan-grabowski) wrote :

Thank you, the latest version works fine.
ami-0c5cb5ca9c381bf22 - ubuntu-eks/k8s_1.27/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20240123
