AWS cloud controller manager not using FQDN for uninitialized nodes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
AWS Cloud Provider Charm | Invalid | Medium | Unassigned |
Calico Charm | Invalid | Undecided | Unassigned |
Kubernetes Control Plane Charm | Fix Released | Medium | Unassigned |
Bug Description
A Calico pod is stuck waiting during a Kubernetes-on-AWS deployment. The logs show:
-------
2024-02-09 08:35:22 ERROR unit.calico/
2024-02-09 08:35:22 ERROR unit.calico/
2024-02-09 08:35:22 ERROR unit.calico/
Traceback (most recent call last):
File "./src/charm.py", line 298, in _configure_node
node = self._calicoctl
File "./src/charm.py", line 640, in _calicoctl_get
output = self.calicoctl(
File "./src/charm.py", line 632, in calicoctl
return subprocess.
File "/usr/lib/
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/
raise CalledProcessEr
subprocess.
2024-02-09 08:35:22 ERROR unit.calico/
Traceback (most recent call last):
File "./src/charm.py", line 174, in _install_or_upgrade
self.
File "./src/charm.py", line 125, in _configure_calico
self.
File "./src/charm.py", line 305, in _configure_node
raise e
File "./src/charm.py", line 298, in _configure_node
node = self._calicoctl
File "./src/charm.py", line 640, in _calicoctl_get
output = self.calicoctl(
File "./src/charm.py", line 632, in calicoctl
return subprocess.
File "/usr/lib/
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/
raise CalledProcessEr
subprocess.
-------
and in the cloud manager logs we see
I0209 04:26:41.428228 1 node_controller
I0209 04:26:41.428263 1 node_controller
I0209 04:26:41.428621 1 event.go:294] "Event occurred" object=
E0209 04:26:41.519423 1 node_controller
I0209 04:26:41.529643 1 node_controller
E0209 04:26:41.619732 1 node_controller
I0209 04:26:41.639955 1 node_controller
E0209 04:26:41.731775 1 node_controller
I0209 04:26:41.772253 1 node_controller
E0209 04:26:41.857045 1 node_controller
I0209 04:26:41.937298 1 node_controller
E0209 04:26:42.018322 1 node_controller
I0209 04:26:42.178671 1 node_controller
E0209 04:26:42.269708 1 node_controller
I0209 04:26:42.590273 1 node_controller
E0209 04:26:42.677117 1 node_controller
I0209 04:26:43.317318 1 node_controller
E0209 04:26:43.467989 1 node_controller
I0209 04:26:44.748498 1 node_controller
E0209 04:26:44.840428 1 node_controller
I0209 04:26:45.070342 1 node_lifecycle_
I0209 04:26:45.070488 1 event.go:294] "Event occurred" object=
-------
It seems the instance was deleted, but no new pod was provisioned and the old IP kept being reused.
Crashdumps and more information can be found here: https:/
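The charm traceback above appears to end in a `CalledProcessError` raised from a `subprocess` call made with `check=True`, but the log excerpt does not show what calicoctl actually printed. The following is a minimal sketch, not the charm's code, of how such a call could surface the command's stderr before re-raising. The helper name `run_checked` and its default timeout are assumptions made for illustration.

```python
import subprocess
import sys


def run_checked(cmd, timeout=60):
    """Run cmd and return its stdout (bytes).

    Hypothetical helper, not taken from the Calico charm. On a non-zero
    exit it logs the captured stderr and re-raises the original error.
    """
    try:
        # check_output runs the command with check=True internally, so a
        # non-zero exit status raises CalledProcessError.
        return subprocess.check_output(
            cmd, stderr=subprocess.PIPE, timeout=timeout
        )
    except subprocess.CalledProcessError as e:
        # e.stderr holds whatever the command wrote to stderr before exiting.
        print(
            f"command {e.cmd} exited with {e.returncode}: {e.stderr!r}",
            file=sys.stderr,
        )
        raise
```

Called as `run_checked(["calicoctl", "get", "node"])` (the arguments here are only an example), a failure would then log the tool's own error message alongside the traceback, which may help tell a connectivity problem apart from a missing node.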
summary:
- A Calico Pod is stuck waiting during deployment
+ A Calico Pod is stuck waiting during deployment, unable to reach the ip of a deleted node
description: updated
Changed in charm-aws-cloud-provider:
milestone: 1.29+ck1 → 1.30
Changed in charm-kubernetes-master:
importance: Undecided → Medium
Changed in charm-aws-cloud-provider:
status: Incomplete → Invalid
milestone: 1.30 → none
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
OK, the same calicoctl error also happens in a passing test run: https://solutions.qa.canonical.com/testruns/cffca8e5-1fd3-4eb4-b182-3c69b58dccb5
So the calicoctl error may not be the cause.