Latest Ubuntu Cloud Image AMI for EKS is packaged with AWS CLI version 1.x which causes /etc/eks/bootstrap.sh to silently misconfigure the cluster DNS when the EKS cluster has a custom Service IP CIDR address

Bug #1982107 reported by Tenzin Lhakhang
This bug affects 1 person
Affects: cloud-images
Status: Fix Released
Importance: Undecided
Assigned to: Thomas Bechtold
Milestone: (none listed)

Bug Description

What happened:
When an EKS cluster with a custom Kubernetes Service IP CIDR is created with Ubuntu cloud-image worker nodes, the /etc/eks/bootstrap.sh script silently misconfigures the --cluster-dns argument passed to kubelet in the /var/snap/kubelet-eks/70/args file.

The Ubuntu cloud-image AMI (us-east-1, EKS 1.21) is ami-04c4f2c4799614025. This AMI is from the official AWS EKS Ubuntu cloud images catalog: https://cloud-images.ubuntu.com/docs/aws/eks/.

# Distro details
root@ip-10-109-4-64:~# lsb_release -a 2> /dev/null
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal

# Kubernetes version
root@ip-10-109-4-64:~# kubectl version --short
Client Version: v1.21.9
Server Version: v1.21.12-eks-a64ea69

---

What you expected to happen:
The /etc/eks/bootstrap.sh script should set the --cluster-dns argument for kubelet from the cluster's custom Service IP CIDR instead of defaulting to 172.20.0.10.

---

How to reproduce it (as minimally and precisely as possible):
Here is a link to the /etc/eks/bootstrap.sh script: https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh#L373.

In my test case, I've extracted the command of interest executed by the bootstrap.sh script:

#
# Below is the extracted command from the /etc/eks/bootstrap.sh script, see line 373
#
AWS_DEFAULT_REGION=us-east-1
CLUSTER_NAME=iat-dev-us-east-1-eks
aws eks describe-cluster \
            --region=${AWS_DEFAULT_REGION} \
            --name=${CLUSTER_NAME} \
            --output=text \
            --query 'cluster.{endpoint: endpoint, serviceIpv4Cidr: kubernetesNetworkConfig.serviceIpv4Cidr, serviceIpv6Cidr: kubernetesNetworkConfig.serviceIpv6Cidr, clusterIpFamily: kubernetesNetworkConfig.ipFamily}'

This fails with AWS CLI version 1.x.x. The v1 CLI on this AMI doesn't return the kubernetesNetworkConfig field, so the queried values are printed as None, and this causes bootstrap.sh to fall back to the default EKS Service CIDR block.
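The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in bootstrap.sh: the function name derive_cluster_dns is hypothetical, and the sketch assumes the cluster DNS address is the .10 address at the start of the service CIDR, with 172.20.0.10 as the default mentioned in this report.

```shell
# Hypothetical helper, NOT taken from bootstrap.sh.
# Assumption: the cluster DNS IP is the service CIDR's network address
# with its last octet replaced by 10.
derive_cluster_dns() {
  service_cidr="${1:-}"
  default_dns="172.20.0.10"   # default observed in this report

  # An empty value or the literal text "None" means the CLI returned no CIDR.
  if [ -z "$service_cidr" ] || [ "$service_cidr" = "None" ]; then
    echo "$default_dns"
    return 0
  fi

  network="${service_cidr%/*}"   # drop the "/prefix" part
  echo "${network%.*}.10"        # replace the last octet with 10
}

derive_cluster_dns "None"             # value seen with AWS CLI 1.18.69
derive_cluster_dns "10.109.16.0/20"   # value seen with AWS CLI 2.7.4
```

With the None value the sketch silently returns the default, which is the behavior this report complains about.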

#
# The Ubuntu cloud-image ami-04c4f2c4799614025 is installed with version 1.18 of the AWS CLI
# AMI image was retrieved for us-east-1 from: https://cloud-images.ubuntu.com/docs/aws/eks/
#
root@ip-10-109-4-64:~# aws --version
aws-cli/1.18.69 Python/3.8.10 Linux/5.13.0-1031-aws botocore/1.16.19

#
# The aws_describe_cluster script produces None for the Service CIDR address.
#
root@ip-10-109-4-64:~# bash aws_describe_cluster.sh
None https://6CCB47A35A560104CFDE3CAF89B1A0D6.gr7.us-east-1.eks.amazonaws.com None None

---

Anything else we need to know?:

This works with AWS CLI version 2.x.x.
Note that the 3rd field contains the custom EKS service IP CIDR address.
#
# This host is installed with version 2.7.4 of the AWS CLI
#
root@test-awscli-1:~# aws --version
aws-cli/2.7.4 Python/3.9.11 Linux/5.10.0-11-amd64 exe/x86_64.debian.11 prompt/off

#
# The same script produces the correct Service CIDR address.
#
root@test-awscli-1:~# bash aws_describe_cluster.sh
ipv4 https://6CCB47A35A560104CFDE3CAF89B1A0D6.gr7.us-east-1.eks.amazonaws.com 10.109.16.0/20 None
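The difference between the two outputs can be checked mechanically. The following is a small sketch under stated assumptions: service_cidr_from_describe is a hypothetical helper, and the field order (IP family, endpoint, IPv4 service CIDR, IPv6 service CIDR) is taken from the output lines in this report for an IPv4 cluster.

```shell
# Hypothetical helper, not part of bootstrap.sh: print the IPv4 service CIDR
# from one line of describe-cluster text output, or fail if it is missing.
service_cidr_from_describe() {
  # Split the line on whitespace into the four fields shown in the report.
  read -r ip_family endpoint service_ipv4_cidr service_ipv6_cidr <<EOF
$1
EOF
  if [ -z "$service_ipv4_cidr" ] || [ "$service_ipv4_cidr" = "None" ]; then
    echo "serviceIpv4Cidr missing from describe-cluster output" >&2
    return 1
  fi
  echo "$service_ipv4_cidr"
}

# Output line from AWS CLI 2.7.4 in this report.
service_cidr_from_describe "ipv4 https://6CCB47A35A560104CFDE3CAF89B1A0D6.gr7.us-east-1.eks.amazonaws.com 10.109.16.0/20 None"

# Output line from AWS CLI 1.18.69 in this report.
service_cidr_from_describe "None https://6CCB47A35A560104CFDE3CAF89B1A0D6.gr7.us-east-1.eks.amazonaws.com None None" \
  || echo "CLI output lacks the service CIDR"
```

Failing loudly here, rather than continuing with a default, would make the misconfiguration visible at boot time.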

---

I've also posted an issue on the amazon-eks-ami GitHub repository.
It would be great if they could enhance the bootstrap.sh script to check for the required AWS CLI version. Issue link: https://github.com/awslabs/amazon-eks-ami/issues/963
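As an illustration of the kind of guard suggested here, the sketch below parses the first line printed by aws --version and rejects anything older than major version 2. The function names aws_cli_major and require_aws_cli_v2 are hypothetical and are not part of bootstrap.sh; the version strings are the ones shown in this report.

```shell
# Hypothetical guard, not part of bootstrap.sh.
# Input: a version line such as "aws-cli/2.7.4 Python/3.9.11 ...".
aws_cli_major() {
  ver="${1#aws-cli/}"   # drop the "aws-cli/" prefix
  echo "${ver%%.*}"     # keep only the text before the first dot
}

require_aws_cli_v2() {
  major="$(aws_cli_major "$1")"
  case "$major" in
    ''|*[!0-9]*)
      echo "cannot parse AWS CLI version from: $1" >&2
      return 2
      ;;
  esac
  if [ "$major" -lt 2 ]; then
    echo "AWS CLI v2 is required, found: $1" >&2
    return 1
  fi
}

# Version strings from the two hosts in this report.
require_aws_cli_v2 "aws-cli/2.7.4 Python/3.9.11 Linux/5.10.0-11-amd64 exe/x86_64.debian.11 prompt/off" \
  && echo "v2 host: ok"
require_aws_cli_v2 "aws-cli/1.18.69 Python/3.8.10 Linux/5.13.0-1031-aws botocore/1.16.19" \
  || echo "v1 host: rejected"
```

On a real node the argument could come from "$(aws --version 2>&1)", since some v1 builds print the version line to stderr.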

Tenzin Lhakhang (tlhakhan) wrote :

A GitHub maintainer posted the comment below, which addresses the bug described above. Once a new Ubuntu Cloud AMI is built from the latest Amazon EKS AMI release, this bug should be resolved.

> We now install 2.x CLI instead of relying on the version available in the package manager. Unfortunately I don't have an update on Ubuntu's AMI, we don't track those issues here.

GitHub issues comment link: https://github.com/awslabs/amazon-eks-ami/issues/963#issuecomment-1320372621

Tenzin Lhakhang (tlhakhan) wrote :

GitHub issues comment link that mentions fix to this bug: https://github.com/awslabs/amazon-eks-ami/issues/963#issuecomment-1320372621

Changed in cloud-images:
status: New → Fix Released
Thomas Bechtold (toabctl) wrote :

@Tenzin, I don't think this is fixed for the Ubuntu EKS Worker nodes. The Ubuntu EKS worker nodes use the aws-cli snap which is still version 1. We are working (together with AWS) on an aws-cli 2.x snap and when that's available, we'll include it in the Ubuntu EKS worker nodes. But until then this bug is not fixed.

Changed in cloud-images:
status: Fix Released → In Progress
Tenzin Lhakhang (tlhakhan) wrote :

Ah! Thank you for verifying. I had misunderstood the GitHub comment from the contributor.

Changed in cloud-images:
assignee: nobody → Thomas Bechtold (toabctl)
Thomas Bechtold (toabctl) wrote :

The aws-cli snap now has a v2/stable channel with a recent version. This version will be included in the next round of EKS images.

Thomas Bechtold (toabctl) wrote :

@Tenzin, all images for all supported EKS versions (1.23 - 1.27 currently) with a serial >= 20230922 now have aws-cli version 2 installed. So this should be fixed.
Please let us know if there are still problems.

Changed in cloud-images:
status: In Progress → Fix Released