One of calico instance stuck in "Configuring Calico"

Bug #2064145 reported by Jeffrey Chang
This bug affects 2 people
Affects: Calico Charm
Status: In Progress
Importance: High
Assigned to: Unassigned

Bug Description

In this SQA run - https://solutions.qa.canonical.com/testruns/18cd9a82-ba5a-4938-98a5-2be7e8d6e5a5
When deploying Charmed Kubernetes on bare metal with the Calico charm rev 105 on focal,
one of the calico instances is stuck in "Configuring Calico",
and kubernetes-control-plane is pending on the calico service.

Error logs
2024-04-26 16:00:11 INFO unit.calico/1.juju-log server.go:325 Configured Calico IP pool.
2024-04-26 16:00:11 ERROR unit.calico/1.juju-log server.go:325 b'resource does not exist: Node(duosion) with error: <nil>\n'
2024-04-26 16:00:11 ERROR unit.calico/1.juju-log server.go:325 b'null\n'
2024-04-26 16:00:11 ERROR unit.calico/1.juju-log server.go:325 Failed to configure node.
Traceback (most recent call last):
  File "./src/charm.py", line 298, in _configure_node
    node = self._calicoctl_get("node", node_name)
  File "./src/charm.py", line 640, in _calicoctl_get
    output = self.calicoctl(*args)
  File "./src/charm.py", line 632, in calicoctl
    return subprocess.check_output(cmd, env=env, stderr=subprocess.PIPE, timeout=timeout)
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/opt/calicoctl/calicoctl', 'get', '-o', 'yaml', '--export', 'node', 'duosion']' returned non-zero exit status 1.
2024-04-26 16:00:11 ERROR unit.calico/1.juju-log server.go:325 Failed to configure Calico, will retry.
Traceback (most recent call last):
  File "./src/charm.py", line 174, in _install_or_upgrade
    self._configure_calico()
  File "./src/charm.py", line 125, in _configure_calico
    self._configure_node()
  File "./src/charm.py", line 305, in _configure_node
    raise e
  File "./src/charm.py", line 298, in _configure_node
    node = self._calicoctl_get("node", node_name)
  File "./src/charm.py", line 640, in _calicoctl_get
    output = self.calicoctl(*args)
  File "./src/charm.py", line 632, in calicoctl
    return subprocess.check_output(cmd, env=env, stderr=subprocess.PIPE, timeout=timeout)
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/opt/calicoctl/calicoctl', 'get', '-o', 'yaml', '--export', 'node', 'duosion']' returned non-zero exit status 1.
2024-04-26 16:00:11 DEBUG unit.calico/1.juju-log server.go:325 Deferring <InstallEvent via CalicoCharm/on/install[1]>.
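The failing call chain in the traceback can be summarized with a simplified sketch (hypothetical helper; the real logic lives in `_calicoctl_get`/`calicoctl` in src/charm.py). calicoctl exits non-zero and prints `resource does not exist: Node(...)` on stderr when the Node object has not been created yet; one way such a wrapper could tolerate a missing resource instead of raising is:

```python
import subprocess

CALICOCTL = "/opt/calicoctl/calicoctl"

def calicoctl_get(resource, name, runner=subprocess.check_output):
    """Simplified sketch of the call chain shown in the traceback above.

    Hypothetical helper, not the charm's actual code. calicoctl exits
    non-zero and writes 'resource does not exist: ...' to stderr when
    the requested object (here, the Node) has not been created yet.
    """
    cmd = [CALICOCTL, "get", "-o", "yaml", "--export", resource, name]
    try:
        return runner(cmd, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as err:
        if b"resource does not exist" in (err.stderr or b""):
            # Treat a not-yet-created Node as absent rather than fatal,
            # instead of failing the hook and deferring the event.
            return None
        raise
```

With the behavior from the logs (non-zero exit, `b'null\n'` on stdout, the "resource does not exist" message on stderr), this returns None rather than propagating `CalledProcessError` up through `_configure_node`.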

Revision history for this message
Michael Fischer (michaelandrewfischer) wrote (last edit ):

Having this issue in revision 105 on jammy. All of the calico nodes are stuck in this infinite loop, failing to configure. If I ssh into any of the calico nodes and run the command /opt/calicoctl/calicoctl get -o yaml --export node novel-bird manually, it neither returns null nor exits with a non-zero status of 1.

command:
/opt/calicoctl/calicoctl get -o yaml --export node novel-bird

output:
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  annotations:
    projectcalico.org/kube-labels: '{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","juju-application":"kubernetes-control-plane","juju-charm":"kubernetes-control-plane","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"novel-bird","kubernetes.io/os":"linux","node-role.kubernetes.io/control-plane":""}'
  creationTimestamp: null
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    juju-application: kubernetes-control-plane
    juju-charm: kubernetes-control-plane
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: novel-bird
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: ""
  name: novel-bird
spec:
  addresses:
  - address: 192.168.2.38
    type: InternalIP
  orchRefs:
  - nodeName: novel-bird
    orchestrator: k8s
status: {}

error logs:

unit-calico-3: 09:10:37 ERROR unit.calico/3.juju-log b'resource does not exist: Node(novel-bird) with error: <nil>\n'
unit-calico-3: 09:10:37 ERROR unit.calico/3.juju-log b'null\n'
unit-calico-3: 09:10:37 ERROR unit.calico/3.juju-log Failed to configure node.
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-calico-3/charm/./src/charm.py", line 298, in _configure_node
    node = self._calicoctl_get("node", node_name)
  File "/var/lib/juju/agents/unit-calico-3/charm/./src/charm.py", line 640, in _calicoctl_get
    output = self.calicoctl(*args)
  File "/var/lib/juju/agents/unit-calico-3/charm/./src/charm.py", line 632, in calicoctl
    return subprocess.check_output(cmd, env=env, stderr=subprocess.PIPE, timeout=timeout)
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/opt/calicoctl/calicoctl', 'get', '-o', 'yaml', '--export', 'node', 'novel-bird']' returned non-zero exit status 1.
unit-calico-3: 09:10:37 ERROR unit.calico/3.juju-log Failed to configure Calico, will retry.
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-calico-3/charm/./src/charm.py", line 174, in _install_or_upgrade
    self._configure_calico()
  File "/var/lib/juju/agents/unit-calico-3/charm/./src/charm.py", line 125, in _configure_calico
    self._configure_node()
  File "/var/lib/juju/agents/unit-calico-3/charm/./src/charm.py", line 305, in _configure_node
    raise e
  File "/var/lib/juju/agents/unit-calico-3/charm/./src/charm.py", line 298, in _configure_node
    node = self._calicoctl_get("node", node_name)
  File "/var/lib/juju/agents/unit-calico-3/charm/./src/charm.py", line 640, in _calicoctl_get
    output = s...


Adam Dyess (addyess)
Changed in charm-calico:
milestone: none → 1.30+ck1
status: New → Triaged
Adam Dyess (addyess)
summary: - One of calico instace stuck in "Configuring Calico"
+ One of calico instance stuck in "Configuring Calico"
Adam Dyess (addyess)
Changed in charm-calico:
milestone: 1.30+ck1 → 1.30
Revision history for this message
Adam Dyess (addyess) wrote :

Instrumenting the charm with more debug logging:

https://github.com/charmed-kubernetes/charm-calico/pull/107

Changed in charm-calico:
milestone: 1.30 → 1.30+ck1
Revision history for this message
Adam Dyess (addyess) wrote :

We continue to see this in fresh deployments, and it may be associated with LP#2064305.

Changed in charm-calico:
status: Triaged → Fix Committed
importance: Undecided → Medium
importance: Medium → High
Revision history for this message
Adam Dyess (addyess) wrote :

If this is encountered, there seems to be some collision from having two certificate authority charms in the cluster.

For instance:

etcd may be related to easyrsa to provide certificates
calico and kubernetes-control-plane are related to vault for certificates

There's really no NEED for two certificate authorities. One should drive toward using a SINGLE certificate authority source.
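A quick way to spot that situation is to scan the model for certificate-authority charms. The helper below is a hypothetical sketch operating on parsed `juju status --format=json` output; the `CA_CHARMS` set (easyrsa and vault, per the deployment described above) and the `charm-name`/`charm` JSON keys are assumptions:

```python
# Assumption for this sketch: the CA charms in play are easyrsa and vault,
# as in the deployment described in the comment above.
CA_CHARMS = {"easyrsa", "vault"}

def certificate_authorities(status: dict) -> set:
    """Return application names whose charm is a known certificate authority.

    `status` is the parsed output of `juju status --format=json`; the
    "charm-name"/"charm" keys used here are assumptions about that format.
    """
    cas = set()
    for app_name, app in status.get("applications", {}).items():
        charm = app.get("charm-name") or app.get("charm", "")
        if any(ca in charm for ca in CA_CHARMS):
            cas.add(app_name)
    return cas

def single_ca(status: dict) -> bool:
    """True when at most one certificate authority application is deployed."""
    return len(certificate_authorities(status)) <= 1
```

If `single_ca(...)` returns False, the consolidation suggested above applies: re-relate everything to one CA source (e.g. move etcd's certificates relation from easyrsa to vault).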

Revision history for this message
Adam Dyess (addyess) wrote (last edit ):

I see in the crashdumps from solqa this awesome folder called `pod-logs`, which is super spiffy.

Using the provided run [0] from 4/26/2024, I noticed that one of the calico node pods didn't start after 4 hours. It seems to be blocking the charm from going active.

I wish there was a way to see the events on these pods too. I suppose we could add these logs to the charm debug log -- but it might be cool if this was part of the crashdump:

```
kubectl events -A -oyaml
```

[0] https://solutions.qa.canonical.com/testruns/18cd9a82-ba5a-4938-98a5-2be7e8d6e5a5

Revision history for this message
Adam Dyess (addyess) wrote :

The following link helps to track down examples of this bug in various solqa runs:
https://solutions.qa.canonical.com/bugs/2064145

Revision history for this message
Adam Dyess (addyess) wrote :

Adding ANOTHER patch to try to catch these failures.

https://github.com/charmed-kubernetes/charm-calico/pull/108

Revision history for this message
Michael Fischer (michaelandrewfischer) wrote :

Trying again with the 1.30 release on MAAS bare metal; it results in an endless configuration loop for all calico instances. See the debug output below:

unit-calico-3: 00:47:39 INFO unit.calico/3.juju-log cni:17: Configured Calico IP pool.
unit-calico-3: 00:47:39 ERROR unit.calico/3.juju-log cni:17: env={'JUJU_UNIT_NAME': 'calico/3', 'JUJU_VERSION': '3.5.1', 'JUJU_CHARM_HTTP_PROXY': '', 'APT_LISTCHANGES_FRONTEND': 'none', 'JUJU_CONTEXT_ID': 'calico/3-cni-relation-changed-1657795354183363859', 'JUJU_AGENT_SOCKET_NETWORK': 'unix', 'JUJU_API_ADDRESSES': '192.168.2.56:17070', 'JUJU_CHARM_HTTPS_PROXY': '', 'JUJU_AGENT_SOCKET_ADDRESS': '@/var/lib/juju/agents/unit-calico-3/agent.socket', 'JUJU_MODEL_NAME': 'k8s', 'JUJU_DISPATCH_PATH': 'hooks/cni-relation-changed', 'JUJU_AVAILABILITY_ZONE': 'default', 'JUJU_REMOTE_UNIT': 'kubernetes-control-plane/1', 'JUJU_CHARM_DIR': '/var/lib/juju/agents/unit-calico-3/charm', 'TERM': 'tmux-256color', 'JUJU_RELATION': 'cni', 'PATH': '/var/lib/juju/tools/unit-calico-3:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin', 'JUJU_RELATION_ID': 'cni:17', 'JUJU_METER_STATUS': 'AMBER', 'JUJU_HOOK_NAME': 'cni-relation-changed', 'LANG': 'C.UTF-8', 'CLOUD_API_VERSION': '', 'DEBIAN_FRONTEND': 'noninteractive', 'JUJU_SLA': 'unsupported', 'JUJU_MODEL_UUID': '746a6374-c778-49f8-88fe-f85e86ebceab', 'JUJU_MACHINE_ID': '13', 'JUJU_CHARM_FTP_PROXY': '', 'JUJU_METER_INFO': 'not set', 'PWD': '/var/lib/juju/agents/unit-calico-3/charm', 'JUJU_PRINCIPAL_UNIT': 'kubernetes-control-plane/1', 'JUJU_CHARM_NO_PROXY': '127.0.0.1,localhost,::1', 'PYTHONPATH': 'lib:venv', 'CHARM_DIR': '/var/lib/juju/agents/unit-calico-3/charm', 'JUJU_REMOTE_APP': 'kubernetes-control-plane', 'OPERATOR_DISPATCH': '1', 'ETCD_ENDPOINTS': 'https://192.168.2.2:2379,https://192.168.2.40:2379,https://192.168.2.71:2379', 'ETCD_KEY_FILE': '/opt/calicoctl/etcd-key', 'ETCD_CERT_FILE': '/opt/calicoctl/etcd-cert', 'ETCD_CA_CERT_FILE': '/opt/calicoctl/etcd-ca'}
unit-calico-3: 00:47:39 ERROR unit.calico/3.juju-log cni:17: out=null

unit-calico-3: 00:47:39 ERROR unit.calico/3.juju-log cni:17: err=time="2024-07-17T05:47:39Z" level=info msg="Log level set to debug"
time="2024-07-17T05:47:39Z" level=info msg="Executing config command"
time="2024-07-17T05:47:39Z" level=info msg="Config file: /etc/calico/calicoctl.cfg cannot be read - reading config from environment"
time="2024-07-17T05:47:39Z" level=debug msg="Datastore type isn't set, trying to detect it"
time="2024-07-17T05:47:39Z" level=debug msg="EtcdEndpoints specified, detected etcdv3."
time="2024-07-17T05:47:39Z" level=info msg="Loaded client config: apiconfig.CalicoAPIConfigSpec{DatastoreType:\"etcdv3\", EtcdConfig:apiconfig.EtcdConfig{EtcdEndpoints:\"https://192.168.2.2:2379,https://192.168.2.40:2379,https://192.168.2.71:2379\", EtcdDiscoverySrv:\"\", EtcdUsername:\"\", EtcdPassword:\"\", EtcdKeyFile:\"/opt/calicoctl/etcd-key\", EtcdCertFile:\"/opt/calicoctl/etcd-cert\", EtcdCACertFile:\"/opt/calicoctl/etcd-ca\", EtcdKey:\"\", EtcdCert:\"\", EtcdCACert:\"\", EtcdFIPSModeEnabled:false}, KubeConfig:apiconfig.KubeConfig{Kubeconfig:\"\", K8sAPIEndpoint:\"\", K8sKeyFile:\"\", K8sCertFile:\"\", K8sCAFile:\"\", K8sAPIToken:\"...

Adam Dyess (addyess)
Changed in charm-calico:
milestone: 1.30+ck1 → 1.30+ck2
status: Fix Committed → In Progress
Revision history for this message
Michael Fischer (michaelandrewfischer) wrote (last edit ):

This bug, in my case, is caused by kube-proxy failing to start on the kubernetes-control-plane and kubernetes-worker nodes. The ipset package is not installed, which causes the kube-proxy process to crash when the kubernetes-control-plane and kubernetes-worker apps are configured with {mode: ipvs, ipvs: {strictARP: true}}.

see https://bugs.launchpad.net/charm-calico/+bug/2045651
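Given that diagnosis, a minimal preflight check before switching kube-proxy to ipvs mode could simply verify the ipset binary is on PATH (a hypothetical helper, not charm code; the ipset dependency is taken from the comment above):

```python
import shutil

def has_binary(name: str = "ipset") -> bool:
    """Return True when `name` is found on PATH.

    kube-proxy in ipvs mode relies on the ipset userspace tool; per the
    diagnosis above, a missing ipset binary crashes kube-proxy and leaves
    calico stuck in its configuration loop.
    """
    return shutil.which(name) is not None
```

If `has_binary()` is False, install the package (e.g. `sudo apt-get install ipset`) before configuring {mode: ipvs, ipvs: {strictARP: true}}.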

*****************

kubectl logs calico-node-2glrn -n kube-system -c install-cni:

2024-07-19 14:59:08.896 [INFO][1] cni-installer/<nil> <nil>: Running as a Kubernetes pod
2024-07-19 14:59:08.896 [INFO][1] cni-installer/<nil> <nil>: Installing any TLS assets
2024-07-19 14:59:08.973 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/bandwidth"
2024-07-19 14:59:08.973 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/bandwidth
2024-07-19 14:59:09.158 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/calico"
2024-07-19 14:59:09.158 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/calico
2024-07-19 14:59:09.362 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/calico-ipam"
2024-07-19 14:59:09.362 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/calico-ipam
2024-07-19 14:59:09.367 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/flannel"
2024-07-19 14:59:09.367 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/flannel
2024-07-19 14:59:09.372 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/host-local"
2024-07-19 14:59:09.372 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/host-local
2024-07-19 14:59:09.378 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/loopback"
2024-07-19 14:59:09.378 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/loopback
2024-07-19 14:59:09.385 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/portmap"
2024-07-19 14:59:09.385 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/portmap
2024-07-19 14:59:09.391 [INFO][1] cni-installer/<nil> <nil>: File is already up to date, skipping file="/host/opt/cni/bin/tuning"
2024-07-19 14:59:09.391 [INFO][1] cni-installer/<nil> <nil>: Installed /host/opt/cni/bin/tuning
2024-07-19 14:59:09.391 [INFO][1] cni-installer/<nil> <nil>: Wrote Calico CNI binaries to /host/opt/cni/bin

2024-07-19 14:59:09.494 [INFO][1] cni-installer/<nil> <nil>: CNI plugin version: v3.27.3
2024-07-19 14:59:09.494 [INFO][1] cni-installer/<nil> <nil>: /host/secondary-bin-dir is not writeable, skipping
2024-07-19 14:59:09.494 [WARNING][1] cni-installer/<nil> <nil>: Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-07-19 14:59:39.522 [ERROR][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.152.183.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-cni-plugin/token": dial tcp 10.152.183.1:443: i/o timeout
2024-07-1...

