Ubuntu AMI (ami-06114b38b9273f7c2) failed to join cluster in UAE region due to 403 on pause container

Bug #2002659 reported by Chase Nickels
This bug affects 1 person
Affects: cloud-images
Status: Fix Released
Importance: High
Assigned to: Thomas Bechtold
Milestone: (none)

Bug Description

A customer is working on a POC to test EKS in the me-central-1 region and reported that EC2 instances based on the Ubuntu EKS Optimized AMI failed to join the cluster when using managed node groups.

I've been able to reproduce and identify the issue with the following steps:

1. Created a new EKS cluster in me-central-1 with the following cluster configuration:

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: uae-poc-test
  region: me-central-1

managedNodeGroups:
  - name: custom-ng-2
    minSize: 1
    maxSize: 4
    amiFamily: Ubuntu2004

2. The CloudFormation stack rolls back because the Ubuntu node is unable to join the cluster. It used this AMI: ami-06114b38b9273f7c2.

3. Looking at the cloud-init logs, we can see the following 403 error on the pause container:

Cloud-init v. 22.4.2-0ubuntu0~20.04.2 running 'modules:config' at Wed, 11 Jan 2023 14:43:31 +0000. Up 40.30 seconds.
eksctl: running /etc/eks/bootstrap
Aliasing EKS k8s snap commands
Added:
  - kubelet-eks.kubelet as kubelet
Added:
  - kubectl-eks.kubectl as kubectl
Stopping k8s daemons until configured
Stopped.
Cluster "kubernetes" set.
Container runtime is containerd
Attempt 5 of 5
ctr: failed to resolve reference "602401143452.dkr.ecr.me-central-1.amazonaws.com/eks/pause:3.5": pulling from host 602401143452.dkr.ecr.me-central-1.amazonaws.com failed with status code [manifests 3.5]:403 Forbidden
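
When triaging a pull failure like the one above, it can help to split the image reference into its registry account, region, repository and tag. The following is a minimal sketch only: the regular expression and the function name `parse_ecr_reference` are illustrative assumptions, not part of the bootstrap script or of any AWS tooling.

```python
import re

# Hypothetical pattern for an ECR image reference of the form
# <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ECR_REF = re.compile(
    r"(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[^:\s\"]+):(?P<tag>[^\s\"]+)"
)

def parse_ecr_reference(line):
    """Return the parts of the first ECR reference found in a log line, or None."""
    match = ECR_REF.search(line)
    return match.groupdict() if match else None

# Log line abbreviated from the cloud-init output above.
log_line = (
    'ctr: failed to resolve reference '
    '"602401143452.dkr.ecr.me-central-1.amazonaws.com/eks/pause:3.5": '
    'pulling from host 602401143452.dkr.ecr.me-central-1.amazonaws.com failed'
)
parts = parse_ecr_reference(log_line)
print(parts["account"], parts["region"], parts["repo"], parts["tag"])
```

The extracted account can then be compared against the registry account documented for the region in question.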

Based on the Amazon container image registries documentation (https://docs.aws.amazon.com/eks/latest/userguide/add-ons-images.html), it looks like the bootstrap is using the wrong ECR registry account for this region. When I specify the AMI used by the managed node group explicitly and override --pause-container-account in the bootstrap command, as in the configuration below, the node registers as expected.

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: uae-poc-test
  region: me-central-1

managedNodeGroups:
  - name: custom-ng-3
    ami: ami-06114b38b9273f7c2
    minSize: 1
    maxSize: 4
    overrideBootstrapCommand: |
      #!/bin/bash
      /etc/eks/bootstrap.sh <cluster> --pause-container-account 759879836304
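
The effect of the override can be sketched as a per-region account lookup. This is a hedged illustration, not the actual bootstrap logic: the names `DEFAULT_ACCOUNT`, `REGION_ACCOUNT_OVERRIDES` and `pause_image` are hypothetical, the default account is simply the one seen in the failing pull, and the only override listed is the value used in the workaround above.

```python
# Account seen in the failing pull in the cloud-init log (assumed default here).
DEFAULT_ACCOUNT = "602401143452"

# Per-region overrides; only the value from the workaround above is listed.
REGION_ACCOUNT_OVERRIDES = {"me-central-1": "759879836304"}

def pause_image(region, tag="3.5", overrides=None):
    """Build the pause container image reference for a region (illustrative)."""
    if overrides is None:
        overrides = REGION_ACCOUNT_OVERRIDES
    account = overrides.get(region, DEFAULT_ACCOUNT)
    return f"{account}.dkr.ecr.{region}.amazonaws.com/eks/pause:{tag}"

# With no override the lookup falls back to the account that returned 403.
broken = pause_image("me-central-1", overrides={})
# With the override the reference points at the account used in the workaround.
fixed = pause_image("me-central-1")
print(broken)
print(fixed)
```

Without the override, the sketch yields the same reference that returned 403 in the log; with it, the reference uses the account passed via --pause-container-account.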

affects: compiz-plugins-main (Ubuntu) → cloud-images
Changed in cloud-images:
status: New → Confirmed
assignee: nobody → Thomas Bechtold (toabctl)
importance: Undecided → High
status: Confirmed → In Progress
tags: added: eks
tags: removed: eks
Thomas Bechtold (toabctl) wrote :

Images with the serial 20230118 fix that problem. Please let us know if you still have problems.

Changed in cloud-images:
status: In Progress → Fix Released