Activity log for bug #2053034

Date Who What changed Old value New value Message
2024-02-13 14:59:49 Amjad Chami bug added bug
2024-02-13 15:15:28 Amjad Chami summary A Calico Pod is stuck waiting during deployment A Calico Pod is stuck waiting during deployment, unable to reach the ip of a deleted node
2024-02-13 15:17:48 Amjad Chami description A Calico pod is stuck waiting during a k8s deployment. The logs show ----------------------------------------------------------- 2024-02-09 08:35:22 ERROR unit.calico/4.juju-log server.go:325 b'resource does not exist: Node(ip-172-31-37-35) with error: <nil>\n' 2024-02-09 08:35:22 ERROR unit.calico/4.juju-log server.go:325 b'null\n' 2024-02-09 08:35:22 ERROR unit.calico/4.juju-log server.go:325 Failed to configure node. Traceback (most recent call last): File "./src/charm.py", line 298, in _configure_node node = self._calicoctl_get("node", node_name) File "./src/charm.py", line 640, in _calicoctl_get output = self.calicoctl(*args) File "./src/charm.py", line 632, in calicoctl return subprocess.check_output(cmd, env=env, stderr=subprocess.PIPE, timeout=timeout) File "/usr/lib/python3.8/subprocess.py", line 415, in check_output return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, File "/usr/lib/python3.8/subprocess.py", line 516, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['/opt/calicoctl/calicoctl', 'get', '-o', 'yaml', '--export', 'node', 'ip-172-31-37-35']' returned non-zero exit status 1. 2024-02-09 08:35:22 ERROR unit.calico/4.juju-log server.go:325 Failed to configure Calico, will retry. Traceback (most recent call last): File "./src/charm.py", line 174, in _install_or_upgrade self._configure_calico() File "./src/charm.py", line 125, in _configure_calico self._configure_node() File "./src/charm.py", line 305, in _configure_node raise e File "./src/charm.py", line 298, in _configure_node node = self._calicoctl_get("node", node_name) File "./src/charm.py", line 640, in _calicoctl_get output = self.calicoctl(*args) File "./src/charm.py", line 632, in calicoctl return subprocess.check_output(cmd, env=env, stderr=subprocess.PIPE, timeout=timeout) File "/usr/lib/python3.8/subprocess.py", line 415, in check_output return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, File "/usr/lib/python3.8/subprocess.py", line 516, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['/opt/calicoctl/calicoctl', 'get', '-o', 'yaml', '--export', 'node', 'ip-172-31-37-35']' returned non-zero exit status 1. --------------------------------------------------------------------------------------------- and in the cloud manager logs we see I0209 04:26:41.428228 1 node_controller.go:484] Successfully initialized node ip-172-31-42-39.ec2.internal with cloud provider I0209 04:26:41.428263 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider I0209 04:26:41.428621 1 event.go:294] "Event occurred" object="ip-172-31-42-39.ec2.internal" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully" E0209 04:26:41.519423 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:41.529643 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:41.619732 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:41.639955 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:41.731775 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:41.772253 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:41.857045 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:41.937298 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:42.018322 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:42.178671 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:42.269708 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:42.590273 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:42.677117 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:43.317318 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:43.467989 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:44.748498 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:44.840428 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:45.070342 1 node_lifecycle_controller.go:164] deleting node since it is no longer present in cloud provider: ip-172-31-37-35 I0209 04:26:45.070488 1 event.go:294] "Event occurred" object="ip-172-31-37-35" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="DeletingNode" message="Deleting node ip-172-31-37-35 because it does not exist in the cloud provider" --------------------------------------------------------------------------------------------------- It seems like the instance was deleted by no new pod was provisioned and the old ip kept on being reused; Crashdumps and more information can be found here: https://oil-jenkins.canonical.com/artifacts/f987f8e9-1490-4812-aed3-07d3ba4623d1/index.html A Calico pod is stuck waiting during a k8s on aws deployment. The logs show ----------------------------------------------------------- 2024-02-09 08:35:22 ERROR unit.calico/4.juju-log server.go:325 b'resource does not exist: Node(ip-172-31-37-35) with error: <nil>\n' 2024-02-09 08:35:22 ERROR unit.calico/4.juju-log server.go:325 b'null\n' 2024-02-09 08:35:22 ERROR unit.calico/4.juju-log server.go:325 Failed to configure node. Traceback (most recent call last):   File "./src/charm.py", line 298, in _configure_node     node = self._calicoctl_get("node", node_name)   File "./src/charm.py", line 640, in _calicoctl_get     output = self.calicoctl(*args)   File "./src/charm.py", line 632, in calicoctl     return subprocess.check_output(cmd, env=env, stderr=subprocess.PIPE, timeout=timeout)   File "/usr/lib/python3.8/subprocess.py", line 415, in check_output     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,   File "/usr/lib/python3.8/subprocess.py", line 516, in run     raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['/opt/calicoctl/calicoctl', 'get', '-o', 'yaml', '--export', 'node', 'ip-172-31-37-35']' returned non-zero exit status 1. 2024-02-09 08:35:22 ERROR unit.calico/4.juju-log server.go:325 Failed to configure Calico, will retry. Traceback (most recent call last):   File "./src/charm.py", line 174, in _install_or_upgrade     self._configure_calico()   File "./src/charm.py", line 125, in _configure_calico     self._configure_node()   File "./src/charm.py", line 305, in _configure_node     raise e   File "./src/charm.py", line 298, in _configure_node     node = self._calicoctl_get("node", node_name)   File "./src/charm.py", line 640, in _calicoctl_get     output = self.calicoctl(*args)   File "./src/charm.py", line 632, in calicoctl     return subprocess.check_output(cmd, env=env, stderr=subprocess.PIPE, timeout=timeout)   File "/usr/lib/python3.8/subprocess.py", line 415, in check_output     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,   File "/usr/lib/python3.8/subprocess.py", line 516, in run     raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['/opt/calicoctl/calicoctl', 'get', '-o', 'yaml', '--export', 'node', 'ip-172-31-37-35']' returned non-zero exit status 1. --------------------------------------------------------------------------------------------- and in the cloud manager logs we see I0209 04:26:41.428228 1 node_controller.go:484] Successfully initialized node ip-172-31-42-39.ec2.internal with cloud provider I0209 04:26:41.428263 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider I0209 04:26:41.428621 1 event.go:294] "Event occurred" object="ip-172-31-42-39.ec2.internal" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully" E0209 04:26:41.519423 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:41.529643 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:41.619732 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:41.639955 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:41.731775 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:41.772253 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:41.857045 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:41.937298 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:42.018322 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:42.178671 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:42.269708 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:42.590273 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:42.677117 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:43.317318 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:43.467989 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:44.748498 1 node_controller.go:415] Initializing node ip-172-31-37-35 with cloud provider E0209 04:26:44.840428 1 node_controller.go:229] error syncing 'ip-172-31-37-35': failed to get provider ID for node ip-172-31-37-35 at cloudprovider: failed to get instance ID from cloud provider: instance not found, requeuing I0209 04:26:45.070342 1 node_lifecycle_controller.go:164] deleting node since it is no longer present in cloud provider: ip-172-31-37-35 I0209 04:26:45.070488 1 event.go:294] "Event occurred" object="ip-172-31-37-35" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="DeletingNode" message="Deleting node ip-172-31-37-35 because it does not exist in the cloud provider" --------------------------------------------------------------------------------------------------- It seems like the instance was deleted by no new pod was provisioned and the old ip kept on being reused; Crashdumps and more information can be found here: https://oil-jenkins.canonical.com/artifacts/f987f8e9-1490-4812-aed3-07d3ba4623d1/index.html
2024-03-07 22:13:26 Kevin W Monroe bug task added charm-aws-cloud-provider
2024-03-07 22:13:55 Kevin W Monroe charm-aws-cloud-provider: status New Incomplete
2024-03-07 22:13:58 Kevin W Monroe charm-aws-cloud-provider: importance Undecided Medium
2024-03-07 22:14:02 Kevin W Monroe charm-aws-cloud-provider: milestone 1.29+ck1
2024-03-07 22:14:09 Kevin W Monroe charm-calico: status New Invalid
2024-03-07 22:15:15 Kevin W Monroe summary A Calico Pod is stuck waiting during deployment, unable to reach the ip of a deleted node aws cloud controller manager not using fqdn for unitialized nodes
2024-03-25 04:34:45 Kevin W Monroe charm-aws-cloud-provider: milestone 1.29+ck1 1.30
2024-05-31 19:04:12 Adam Dyess bug task added charm-kubernetes-master
2024-05-31 19:04:18 Adam Dyess charm-kubernetes-master: milestone 1.30
2024-05-31 19:04:26 Adam Dyess charm-kubernetes-master: status New Fix Committed
2024-05-31 21:02:24 Adam Dyess charm-kubernetes-master: importance Undecided Medium
2024-06-06 19:23:12 Adam Dyess charm-aws-cloud-provider: status Incomplete Invalid
2024-06-06 19:23:20 Adam Dyess charm-aws-cloud-provider: milestone 1.30
2024-07-11 16:05:35 Adam Dyess charm-kubernetes-master: status Fix Committed Fix Released