Activity log for bug #2012689

Date Who What changed Old value New value Message
2023-03-24 02:43:11 DingGGu bug added bug
2023-03-24 02:47:40 DingGGu description We are running multiple clusters. In a cluster that frequently scales in and out, nodes sometimes fail to join the cluster. According to /var/log/user-data.log, running `snap start kubelet-eks` in /etc/eks/bootstrap.sh fails. According to journalctl, kubelet appears to be running without any of its arguments specified. An interesting log line showing that the snap does not read its arguments: kubelet-eks.daemon[935]: cat: /var/snap/kubelet-eks/92/args: No such file or directory kubelet then fails with these errors: kubelet-eks.daemon[889]: I0307 19:24:58.886750 889 util_unix.go:104] "Using this format as endpoint is deprecated, please consider using full url format." deprecatedFormat="" fullURLFormat="unix://" kubelet-eks.daemon[889]: W0307 19:24:58.888995 889 clientconn.go:1331] [core] grpc: Selected logs from journalctl: systemd[1]: Started containerd container runtime. systemd[1]: Started Service for snap application amazon-ssm-agent.amazon-ssm-agent. systemd[1]: Reloading. systemd[1]: Started Service for snap application kubelet-eks.daemon. systemd[1]: Started snap.kubelet-eks.hook.configure.3540f36b-29a1-4974-8c41-31995a6c637e.scope.
kubelet-eks.daemon[935]: cat: /var/snap/kubelet-eks/92/args: No such file or directory amazon-ssm-agent.amazon-ssm-agent[833]: Error occurred fetching the seelog config file path: open /etc/amazon/ssm/seelog.xml: no such file or directory amazon-ssm-agent.amazon-ssm-agent[833]: Initializing new seelog logger amazon-ssm-agent.amazon-ssm-agent[833]: New Seelog Logger Creation Complete amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 WARN Error adding the directory '/etc/amazon/ssm' to watcher: no such file or directory amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO Proxy environment variables: amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO https_proxy: amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO http_proxy: amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO no_proxy: amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO Agent will take identity from EC2 amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] using named pipe channel for IPC amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] using named pipe channel for IPC amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] using named pipe channel for IPC amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] amazon-ssm-agent - v3.1.1732.0 amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] OS: linux, Arch: amd64 amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [CredentialRefresher] Identity does not require credential refresher amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:54 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] [WorkerProvider] Worker ssm-agent-worker is not running, starting worker process amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:54 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] [WorkerProvider] Worker 
ssm-agent-worker (pid:948) started amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:54 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] Monitor long running worker health every 60 seconds systemd[1]: snap.kubelet-eks.hook.configure.3540f36b-29a1-4974-8c41-31995a6c637e.scope: Succeeded. dbus-daemon[531]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service' requested by ':1.9' (uid=0 pid=539 comm="/usr/lib/snapd/snapd " label="unconfined") systemd[1]: Starting Time & Date Service... dbus-daemon[531]: [system] Successfully activated service 'org.freedesktop.timedate1' systemd[1]: Started Time & Date Service. systemd[1]: Started Kubernetes systemd probe. kubelet-eks.daemon[889]: I0307 19:24:58.858562 889 server.go:399] "Kubelet version" kubeletVersion="v1.24.9" kubelet-eks.daemon[889]: I0307 19:24:58.858619 889 server.go:401] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" kubelet-eks.daemon[889]: I0307 19:24:58.858841 889 server.go:562] "Standalone mode, no API client" systemd[1]: run-r5647ef4c140746af8f68048b9b657df0.scope: Succeeded. kubelet-eks.daemon[889]: I0307 19:24:58.886249 889 server.go:450] "No api server defined - no events will be sent to API server" kubelet-eks.daemon[889]: I0307 19:24:58.886266 889 server.go:648] "--cgroups-per-qos enabled, but --cgroup-root was not specified. 
defaulting to /" kubelet-eks.daemon[889]: I0307 19:24:58.886544 889 container_manager_linux.go:262] "Container manager verified user specified cgroup-root exists" cgroupRoot=[] kubelet-eks.daemon[889]: I0307 19:24:58.886618 889 container_manager_linux.go:267] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none} kubelet-eks.daemon[889]: I0307 19:24:58.886635 889 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container" kubelet-eks.daemon[889]: I0307 19:24:58.886644 889 container_manager_linux.go:302] "Creating device plugin manager" devicePluginEnabled=true kubelet-eks.daemon[889]: I0307 19:24:58.886706 889 state_mem.go:36] "Initialized new in-memory state 
store" kubelet-eks.daemon[889]: I0307 19:24:58.886750 889 util_unix.go:104] "Using this format as endpoint is deprecated, please consider using full url format." deprecatedFormat="" fullURLFormat="unix://" kubelet-eks.daemon[889]: W0307 19:24:58.888995 889 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to { <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix: missing address". Reconnecting... We are running multiple clusters. In a cluster that frequently scales in and out, nodes sometimes fail to join the cluster. According to /var/log/user-data.log, running `snap start kubelet-eks` in /etc/eks/bootstrap.sh fails. According to journalctl, kubelet appears to be running without any of its arguments specified. If I manually run /etc/eks/bootstrap.sh after the nodes are orphaned, the nodes join the cluster just fine. I think this is a timing issue related to the snap and its argument settings. We are using the 1.24 AMI, ami-04c00a6fc53487c5a. An interesting log line showing that the snap does not read its arguments: kubelet-eks.daemon[935]: cat: /var/snap/kubelet-eks/92/args: No such file or directory kubelet then fails with these errors: kubelet-eks.daemon[889]: I0307 19:24:58.886750 889 util_unix.go:104] "Using this format as endpoint is deprecated, please consider using full url format." deprecatedFormat="" fullURLFormat="unix://" kubelet-eks.daemon[889]: W0307 19:24:58.888995 889 clientconn.go:1331] [core] grpc: Selected logs from journalctl: systemd[1]: Started containerd container runtime. systemd[1]: Started Service for snap application amazon-ssm-agent.amazon-ssm-agent. systemd[1]: Reloading. systemd[1]: Started Service for snap application kubelet-eks.daemon. systemd[1]: Started snap.kubelet-eks.hook.configure.3540f36b-29a1-4974-8c41-31995a6c637e.scope.
kubelet-eks.daemon[935]: cat: /var/snap/kubelet-eks/92/args: No such file or directory amazon-ssm-agent.amazon-ssm-agent[833]: Error occurred fetching the seelog config file path: open /etc/amazon/ssm/seelog.xml: no such file or directory amazon-ssm-agent.amazon-ssm-agent[833]: Initializing new seelog logger amazon-ssm-agent.amazon-ssm-agent[833]: New Seelog Logger Creation Complete amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 WARN Error adding the directory '/etc/amazon/ssm' to watcher: no such file or directory amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO Proxy environment variables: amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO https_proxy: amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO http_proxy: amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO no_proxy: amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO Agent will take identity from EC2 amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] using named pipe channel for IPC amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] using named pipe channel for IPC amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] using named pipe channel for IPC amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] amazon-ssm-agent - v3.1.1732.0 amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [amazon-ssm-agent] OS: linux, Arch: amd64 amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:53 INFO [CredentialRefresher] Identity does not require credential refresher amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:54 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] [WorkerProvider] Worker ssm-agent-worker is not running, starting worker process amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:54 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] [WorkerProvider] Worker 
ssm-agent-worker (pid:948) started amazon-ssm-agent.amazon-ssm-agent[833]: 2023-03-07 19:24:54 INFO [amazon-ssm-agent] [LongRunningWorkerContainer] Monitor long running worker health every 60 seconds systemd[1]: snap.kubelet-eks.hook.configure.3540f36b-29a1-4974-8c41-31995a6c637e.scope: Succeeded. dbus-daemon[531]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service' requested by ':1.9' (uid=0 pid=539 comm="/usr/lib/snapd/snapd " label="unconfined") systemd[1]: Starting Time & Date Service... dbus-daemon[531]: [system] Successfully activated service 'org.freedesktop.timedate1' systemd[1]: Started Time & Date Service. systemd[1]: Started Kubernetes systemd probe. kubelet-eks.daemon[889]: I0307 19:24:58.858562 889 server.go:399] "Kubelet version" kubeletVersion="v1.24.9" kubelet-eks.daemon[889]: I0307 19:24:58.858619 889 server.go:401] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" kubelet-eks.daemon[889]: I0307 19:24:58.858841 889 server.go:562] "Standalone mode, no API client" systemd[1]: run-r5647ef4c140746af8f68048b9b657df0.scope: Succeeded. kubelet-eks.daemon[889]: I0307 19:24:58.886249 889 server.go:450] "No api server defined - no events will be sent to API server" kubelet-eks.daemon[889]: I0307 19:24:58.886266 889 server.go:648] "--cgroups-per-qos enabled, but --cgroup-root was not specified. 
defaulting to /" kubelet-eks.daemon[889]: I0307 19:24:58.886544 889 container_manager_linux.go:262] "Container manager verified user specified cgroup-root exists" cgroupRoot=[] kubelet-eks.daemon[889]: I0307 19:24:58.886618 889 container_manager_linux.go:267] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none} kubelet-eks.daemon[889]: I0307 19:24:58.886635 889 topology_manager.go:133] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container" kubelet-eks.daemon[889]: I0307 19:24:58.886644 889 container_manager_linux.go:302] "Creating device plugin manager" devicePluginEnabled=true kubelet-eks.daemon[889]: I0307 19:24:58.886706 889 state_mem.go:36] "Initialized new in-memory state 
store" kubelet-eks.daemon[889]: I0307 19:24:58.886750 889 util_unix.go:104] "Using this format as endpoint is deprecated, please consider using full url format." deprecatedFormat="" fullURLFormat="unix://" kubelet-eks.daemon[889]: W0307 19:24:58.888995 889 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to { <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix: missing address". Reconnecting...
2023-04-04 13:34:06 Robby Pocase cloud-images: importance Undecided High
2023-04-28 19:18:51 George Kraft bug added subscriber George Kraft
2023-05-17 11:57:50 Thomas Bechtold cloud-images: status New Fix Released
2023-05-17 11:57:53 Thomas Bechtold cloud-images: assignee Thomas Bechtold (toabctl)
2023-05-18 19:24:43 Robby Pocase cloud-images: status Fix Released In Progress
2023-05-22 13:57:42 Kevin W Monroe merge proposal linked https://code.launchpad.net/~k8s-jenkaas-admins/snap-kubelet/+git/snap-kubelet/+merge/443294
2023-05-30 11:51:25 Robby Pocase cloud-images: status In Progress Fix Released
2023-05-30 13:11:49 Nikita Somikov attachment added Screenshot-1.png https://bugs.launchpad.net/cloud-images/+bug/2012689/+attachment/5676599/+files/Screenshot-1.png
2023-05-30 13:12:02 Nikita Somikov attachment added Screenshot-2.png https://bugs.launchpad.net/cloud-images/+bug/2012689/+attachment/5676600/+files/Screenshot-2.png
2023-05-30 13:12:14 Nikita Somikov attachment added Screenshot-3.png https://bugs.launchpad.net/cloud-images/+bug/2012689/+attachment/5676601/+files/Screenshot-3.png
2023-06-01 07:57:42 Nikita Somikov attachment added snapd.txt https://bugs.launchpad.net/cloud-images/+bug/2012689/+attachment/5677121/+files/snapd.txt
2023-06-01 07:57:56 Nikita Somikov attachment added packer-template.json https://bugs.launchpad.net/cloud-images/+bug/2012689/+attachment/5677122/+files/packer-template.json
2023-06-01 07:58:25 Nikita Somikov attachment added remove-swap.sh https://bugs.launchpad.net/cloud-images/+bug/2012689/+attachment/5677123/+files/remove-swap.sh
2023-06-01 07:58:42 Nikita Somikov attachment added bootstrap.sh https://bugs.launchpad.net/cloud-images/+bug/2012689/+attachment/5677124/+files/bootstrap.sh
2023-06-01 07:58:57 Nikita Somikov attachment added linux-tuning.sh https://bugs.launchpad.net/cloud-images/+bug/2012689/+attachment/5677125/+files/linux-tuning.sh
2023-06-08 07:15:56 Thomas Bechtold cloud-images: assignee Thomas Bechtold (toabctl)
2023-06-08 07:16:09 Thomas Bechtold cloud-images: assignee Thomas Bechtold (toabctl)
2023-06-08 21:10:09 Varun Agarwal bug added subscriber Varun Agarwal
2023-06-12 20:36:01 Fabio Augusto Miranda Martins bug added subscriber Fabio Augusto Miranda Martins
2023-06-16 10:58:02 Piotr Zalewski bug added subscriber Piotr Zalewski
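
The symptom reported in the description (kubelet started before its snap args file existed, then running in standalone mode) can be checked for in saved journal output. The following is only a minimal sketch, not part of the fix linked in this bug: the function name `diagnose` and the returned keys are hypothetical, and the two patterns are taken from the journalctl excerpts quoted in the description. It assumes the journal has already been exported to text, for example with `journalctl --no-pager`.

```python
import re

# Patterns copied from the journalctl excerpts in the bug description.
# The snap revision (92 in the report) is matched generically.
MISSING_ARGS = re.compile(
    r"cat: (/var/snap/kubelet-eks/\d+/args): No such file or directory"
)
STANDALONE = re.compile(r'"Standalone mode, no API client"')


def diagnose(journal_text: str) -> dict:
    """Report whether the journal text shows the two symptoms from this bug.

    `diagnose` and its result keys are hypothetical names used for illustration.
    """
    missing = MISSING_ARGS.search(journal_text)
    return {
        # Path of the args file the daemon failed to read, or None if not seen.
        "missing_args_file": missing.group(1) if missing else None,
        # True if kubelet logged that it started without an API client.
        "standalone_mode": STANDALONE.search(journal_text) is not None,
    }


if __name__ == "__main__":
    sample = (
        "kubelet-eks.daemon[935]: cat: /var/snap/kubelet-eks/92/args: "
        "No such file or directory\n"
        "kubelet-eks.daemon[889]: I0307 19:24:58.858841 889 server.go:562] "
        '"Standalone mode, no API client"\n'
    )
    print(diagnose(sample))
```

If both fields are set on a node that failed to join, that is consistent with the timing issue suspected by the reporter, although it does not prove the cause.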