In all cases, the first time the calico-node service is started, it gets stopped before the container comes up:
Apr 5 09:14:12 solqa-lab1-server-12 systemd[1]: Starting calico node...
Apr 5 09:14:12 solqa-lab1-server-12 charm-env[532278]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:12 solqa-lab1-server-12 charm-env[532286]: time="2022-04-05T09:14:12Z" level=error msg="failed to delete container \"calico-node\"" error="container \"calico-node\" in namespace \"default\": not found"
Apr 5 09:14:12 solqa-lab1-server-12 charm-env[532286]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:12 solqa-lab1-server-12 systemd[1]: Started calico node.
Apr 5 09:14:12 solqa-lab1-server-12 systemd[1]: Reloading.
Apr 5 09:14:12 solqa-lab1-server-12 systemd[1]: Reloading.
Apr 5 09:14:13 solqa-lab1-server-12 systemd[1]: Stopping calico node...
Apr 5 09:14:13 solqa-lab1-server-12 charm-env[532611]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:13 solqa-lab1-server-12 charm-env[532618]: time="2022-04-05T09:14:13Z" level=error msg="failed to delete container \"calico-node\"" error="container \"calico-node\" in namespace \"default\": not found"
Apr 5 09:14:13 solqa-lab1-server-12 charm-env[532618]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: calico-node.service: Succeeded.
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: Stopped calico node.
After that, all attempts to start calico-node fail:
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: Starting calico node...
Apr 5 09:14:15 solqa-lab1-server-12 charm-env[532701]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:15 solqa-lab1-server-12 charm-env[532708]: time="2022-04-05T09:14:15Z" level=error msg="failed to delete container \"calico-node\"" error="container \"calico-node\" in namespace \"default\": not found"
Apr 5 09:14:15 solqa-lab1-server-12 charm-env[532708]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: Started calico node.
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: Reloading.
Apr 5 09:14:15 solqa-lab1-server-12 charm-env[532755]: ctr: snapshot "calico-node": already exists
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: calico-node.service: Main process exited, code=exited, status=1/FAILURE
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: calico-node.service: Failed with result 'exit-code'.
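The pattern in the two excerpts above (a start that is stopped almost immediately, then a ctr snapshot error) can be checked mechanically across other runs. The following is a minimal sketch, not part of any charm or containerd tooling. The function names summarize and stopped_right_after_start are hypothetical, and the regular expressions assume the syslog format shown above.

```python
import re

# Matches the ctr error text as it appears in the excerpt above.
SNAPSHOT_EXISTS = re.compile(r'ctr: snapshot "(?P<name>[^"]+)": already exists')
# Matches systemd state lines for the unit, e.g. "Started calico node."
UNIT_EVENT = re.compile(
    r"systemd\[1\]: (?P<event>Starting|Started|Stopping|Stopped) calico node"
)


def summarize(lines):
    """Return (ordered unit events, snapshot names reported as already existing)."""
    events = []
    stale = []
    for line in lines:
        match = UNIT_EVENT.search(line)
        if match:
            events.append(match.group("event"))
            continue
        match = SNAPSHOT_EXISTS.search(line)
        if match:
            stale.append(match.group("name"))
    return events, stale


def stopped_right_after_start(events):
    """True if some 'Started' event is immediately followed by 'Stopping'."""
    return any(a == "Started" and b == "Stopping" for a, b in zip(events, events[1:]))


if __name__ == "__main__":
    import sys

    events, stale = summarize(sys.stdin)
    print("events:", events)
    print("stopped right after start:", stopped_right_after_start(events))
    print("snapshots already existing:", stale)
```

Saved under any file name, it can be fed a syslog or journal export on standard input. It only reports what the log lines say and does not inspect containerd itself.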
The key error seems to be this:
ctr: snapshot "calico-node": already exists
Some state is lingering and preventing the container from starting, apparently a snapshot named calico-node left behind by the interrupted first start, and nothing cleans it up. This looks like a containerd or ctr bug.
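One way to test this hypothesis by hand is to list the snapshots and remove the conflicting one while the service is stopped. The sketch below only builds the ctr command lines and runs them on request. It assumes the leftover state is an ordinary containerd snapshot keyed calico-node in the default namespace, which the logs suggest but do not confirm. The helper names are hypothetical, and this is not a fix taken from the charm.

```python
import subprocess


def snapshot_cleanup_commands(key="calico-node", namespace="default"):
    """Build ctr invocations that list snapshots, then remove the one named `key`."""
    base = ["ctr", "--namespace", namespace, "snapshots"]
    return [base + ["ls"], base + ["rm", key]]


def run_cleanup(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in snapshot_cleanup_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=False: a failure in one step should not abort the next.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Assumption: calico-node.service has been stopped before running this.
    run_cleanup(dry_run=True)
```

If the snapshot is listed and the service starts cleanly after it is removed, that would support the lingering-state explanation. If not, the stale state lives somewhere else.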
Looking at 3 recent occurrences of this:
https://solutions.qa.canonical.com/testruns/testRun/e9b7200a-ae31-485e-adbd-1568b1119f5f
https://solutions.qa.canonical.com/testruns/testRun/91ea2c66-21fe-45da-b973-6e13a34c3b60
https://solutions.qa.canonical.com/testruns/testRun/5e99e033-12d3-4a7f-a52b-038ba1619de9