I discussed this with Samir and had a look through the logs. The aws-node pod is in CrashLoopBackOff because the aws-vpc-cni-init container seems stuck waiting for ipamd:
2023-04-26T13:10:24.619437213Z stdout F {"level":"info","ts":"2023-04-26T13:10:24.618Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
2023-04-26T13:10:26.639431058Z stdout F {"level":"info","ts":"2023-04-26T13:10:26.636Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
2023-04-26T13:10:28.649483016Z stdout F {"level":"info","ts":"2023-04-26T13:10:28.648Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
ipamd doesn't log any obvious errors, but it does seem to be restarting constantly; the last thing it logs is:
{"level":"info","ts":"2023-04-26T13:08:52.336Z","caller":"ipamd/ipamd.go:509","msg":"Reading ipam state from CRI"}
{"level":"debug","ts":"2023-04-26T13:08:52.336Z","caller":"datastore/data_store.go:389","msg":"Getting running pod sandboxes from \"unix:///var/run/dockershim.sock\""}
This looks similar to https://github.com/aws/amazon-vpc-cni-k8s/issues/2133, which was fixed in the Amazon EKS node's bootstrap.sh via https://github.com/awslabs/amazon-eks-ami/pull/921/files
I believe a similar fix needs to be applied to the bootstrap.sh used in Ubuntu EKS nodes.
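For reference, a minimal sketch of the kind of compatibility shim I understand the linked bootstrap.sh change to add — ipamd is still reading pod sandboxes from the legacy dockershim socket path (as seen in the log above), so that path is made a symlink to the containerd socket. The function name, the `root` parameter (included so the behaviour can be dry-run under a temp directory), and the exact socket paths are my assumptions, not copied from the PR:

```shell
#!/usr/bin/env bash
set -euo pipefail

# link_dockershim_socket <root>
# Hypothetical sketch: create the legacy dockershim socket path as a
# symlink to the containerd CRI socket, so ipamd's hardcoded
# unix:///var/run/dockershim.sock lookup keeps working on containerd
# nodes. Pass "" as <root> for the real filesystem, or a temp dir to
# test without touching the host.
link_dockershim_socket() {
    local root="${1:-}"
    mkdir -p "${root}/var/run"
    # -f replaces any stale link left over from a previous boot
    ln -sf "${root}/run/containerd/containerd.sock" \
           "${root}/var/run/dockershim.sock"
}
```

If we go this route for the Ubuntu EKS bootstrap.sh, it would presumably need to be gated on the node actually using containerd, the same way the upstream fix is.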