We have an EKS cluster in AWS on Kubernetes 1.25.
I tried to connect managed node groups with the following base AMIs:
ubuntu-eks/k8s_1.25/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20231201 -- fails to label the node
ubuntu-eks/k8s_1.26/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20231201 -- fails to label the node
ubuntu-eks/k8s_1.26/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20231204.1 -- fails to join the cluster with an error in user-data.log at line 506 of the bootstrap.sh script
I guess I was just unlucky to roll a new Ubuntu AMI to our cluster this week.
I believe these bugs are related: https://bugs.launchpad.net/cloud-images/+bug/2040477 and https://bugs.launchpad.net/cloud-images/+bug/2045311. As they call out, /etc/eks/bootstrap.sh has changed in these newer AMIs, and it has issues in every version I have tried.
At first the problem was just that labeling was not working: I could see that kubelet was being started without the node labels.
In a working 1.24 image it looks like
$ ps -ef | grep kube
root 3833 1 1 13:33 ? 00:04:08 /snap/kubelet-eks/198/kubelet --node-labels=ec2.amazonaws.com/as-label-env=dev2,ec2.amazonaws.com/as-label-type=paravision-processor_gpu --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/etc/kubernetes/pki/ca.crt --cloud-provider=aws --cluster-dns=172.20.0.10 --cluster-domain=cluster.local --config=/etc/kubernetes/kubelet/kubelet-config.json --container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock --feature-gates=RotateKubeletServerCertificate=true --kubeconfig=/var/lib/kubelet/kubeconfig --node-ip=10.0.20.16 --pod-infra-container-image=602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/pause:3.5 --register-node --resolv-conf=/run/systemd/resolve/resolv.conf
Where in one of my first two AMIs above it shows
$ ps -ef | grep kube
root 4059 1 1 Dec05 ? 00:24:01 /snap/kubelet-eks/202/kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook --authorization-mode=Webhook --cgroup-driver=cgroupfs --client-ca-file=/etc/kubernetes/pki/ca.crt --cloud-provider=aws --cluster-dns=172.20.0.10 --cluster-domain=cluster.local --config=/etc/kubernetes/kubelet/kubelet-config.json --container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock --kubeconfig=/var/lib/kubelet/kubeconfig --node-ip=10.0.21.117 --pod-infra-container-image=602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/pause:3.5 --register-node --resolv-conf=/run/systemd/resolve/resolv.conf
-------------------
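For anyone else comparing nodes: diffing the two kubelet command lines flag-by-flag makes the missing --node-labels obvious. A minimal sketch, using trimmed-down stand-ins for the working and broken invocations (the real command lines are much longer):

```shell
# Stand-ins for the working (rev 198) and broken (rev 202) kubelet command lines,
# trimmed to a few flags; sort each flag list so comm(1) can compare them.
tr ' ' '\n' <<< '/snap/kubelet-eks/198/kubelet --node-labels=env=dev2 --address=0.0.0.0 --register-node' | sort > /tmp/kubelet-good
tr ' ' '\n' <<< '/snap/kubelet-eks/202/kubelet --address=0.0.0.0 --register-node' | sort > /tmp/kubelet-bad

# Lines unique to the working invocation:
comm -23 /tmp/kubelet-good /tmp/kubelet-bad
# --node-labels=env=dev2
# /snap/kubelet-eks/198/kubelet
```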
Seeing those bug reports I tried to grab the latest AMI just now, and that one doesn't even connect to our cluster.
Here is the user-data.log error
....
2023-12-06 16:15:26,674:__main__:INFO:No more changes in progress ...
2023-12-06 16:15:26,676:__main__:INFO:result for change: {'id': '28', 'kind': 'configure-snap', 'summary': 'Change configuration of "kubelet-eks" snap', 'status': 'Done', 'tasks': [{'id': '151', 'kind': 'run-hook', 'summary': 'Run configure hook of "kubelet-eks" snap', 'status': 'Done', 'progress': {'label': '', 'done': 1, 'total': 1}, 'spawn-time': '2023-12-06T16:15:25.546818263Z', 'ready-time': '2023-12-06T16:15:26.659445389Z'}], 'ready': True, 'spawn-time': '2023-12-06T16:15:25.546834552Z', 'ready-time': '2023-12-06T16:15:26.659446614Z'}
usage: snapdhelper.py configure [-h] snapname key value
snapdhelper.py configure: error: the following arguments are required: value
Exited with error on line 506
-----------------
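Out of curiosity I reproduced the shape of that error locally. A minimal sketch, assuming snapdhelper.py parses its positionals with argparse (which the usage line in the log suggests; I have not read the actual helper): with the value argument missing, argparse exits 2 with exactly the message in user-data.log.

```shell
# Mimic `snapdhelper.py configure <snapname> <key> <value>` but drop the value;
# argparse then prints "the following arguments are required: value" and exits 2.
python3 -c '
import argparse
p = argparse.ArgumentParser(prog="snapdhelper.py")
sub = p.add_subparsers()
c = sub.add_parser("configure")
for name in ("snapname", "key", "value"):
    c.add_argument(name)
p.parse_args(["configure", "kubelet-eks", "node-labels"])
' || echo "exited $?"
# usage: snapdhelper.py configure [-h] snapname key value
# snapdhelper.py configure: error: the following arguments are required: value
# exited 2
```

So whatever calls the helper is apparently losing the value for the node-labels key somewhere along the way.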
Our user-data script is the same for all of these; it looks like this:
#!/bin/bash
#
# This script is meant to be run in the User Data of each EKS worker instance that hosts applications. It registers the
# instance with the proper EKS cluster based on data provided by Terraform. Note that this script assumes it is running
# from an AMI that is derived from the EKS optimized AMIs that AWS provides.
set -e
# Send the log output from this script to user-data.log, syslog, and the console
# From: https://alestic.com/2010/12/ec2-user-data-output/
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
# Here we call the bootstrap script to register the EKS worker node to the control plane.
# Maps tags to labels for tags with the specific label prefix defined in var.worker_label_prefix
# https://github.com/gruntwork-io/terraform-aws-eks/tree/master/modules/eks-scripts
function register_eks_worker {
  NODE_LABELS="ec2.amazonaws.com/as-label-env=dev2,ec2.amazonaws.com/as-label-type=paravision-processor_gpu"
  /etc/eks/bootstrap.sh \
    --apiserver-endpoint "https://C870147FDA923006BED90BC4DE7A2B34.gr7.us-east-2.eks.amazonaws.com" \
    --b64-cluster-ca "XXXXX" \
    --kubelet-extra-args "--node-labels=\"$NODE_LABELS\"" \
    "saas-dev2-eks"
}

function run {
  register_eks_worker
}

run
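One thing I am suspicious of is the escaped quotes we pass in --kubelet-extra-args. A minimal sketch of what bootstrap.sh actually receives from our user-data (show_args is just a hypothetical stand-in that prints each argument it gets): the \" end up as literal characters inside the value, so if the new bootstrap.sh re-splits that string when building its snap configure call, the stray quotes could plausibly leave the value empty. That is an assumption on my part; I have not read the new script.

```shell
# Print each argument in brackets, exactly as a callee like bootstrap.sh sees it.
show_args() { printf '[%s]\n' "$@"; }

NODE_LABELS="ec2.amazonaws.com/as-label-env=dev2"
show_args --kubelet-extra-args "--node-labels=\"$NODE_LABELS\""
# [--kubelet-extra-args]
# [--node-labels="ec2.amazonaws.com/as-label-env=dev2"]
```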
Happy to attach more logs etc if you just let me know what you want. Hoping someone can help me!