Comment 1 for bug 1932052

George Kraft (cynerva) wrote :

Looking at three recent occurrences of this:

https://solutions.qa.canonical.com/testruns/testRun/e9b7200a-ae31-485e-adbd-1568b1119f5f
https://solutions.qa.canonical.com/testruns/testRun/91ea2c66-21fe-45da-b973-6e13a34c3b60
https://solutions.qa.canonical.com/testruns/testRun/5e99e033-12d3-4a7f-a52b-038ba1619de9

In all cases, the first time the calico-node service is started, it gets stopped before the container comes up:

Apr 5 09:14:12 solqa-lab1-server-12 systemd[1]: Starting calico node...
Apr 5 09:14:12 solqa-lab1-server-12 charm-env[532278]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:12 solqa-lab1-server-12 charm-env[532286]: time="2022-04-05T09:14:12Z" level=error msg="failed to delete container \"calico-node\"" error="container \"calico-node\" in namespace \"default\": not found"
Apr 5 09:14:12 solqa-lab1-server-12 charm-env[532286]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:12 solqa-lab1-server-12 systemd[1]: Started calico node.
Apr 5 09:14:12 solqa-lab1-server-12 systemd[1]: Reloading.
Apr 5 09:14:12 solqa-lab1-server-12 systemd[1]: Reloading.
Apr 5 09:14:13 solqa-lab1-server-12 systemd[1]: Stopping calico node...
Apr 5 09:14:13 solqa-lab1-server-12 charm-env[532611]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:13 solqa-lab1-server-12 charm-env[532618]: time="2022-04-05T09:14:13Z" level=error msg="failed to delete container \"calico-node\"" error="container \"calico-node\" in namespace \"default\": not found"
Apr 5 09:14:13 solqa-lab1-server-12 charm-env[532618]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: calico-node.service: Succeeded.
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: Stopped calico node.

After that, all attempts to start calico-node fail:

Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: Starting calico node...
Apr 5 09:14:15 solqa-lab1-server-12 charm-env[532701]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:15 solqa-lab1-server-12 charm-env[532708]: time="2022-04-05T09:14:15Z" level=error msg="failed to delete container \"calico-node\"" error="container \"calico-node\" in namespace \"default\": not found"
Apr 5 09:14:15 solqa-lab1-server-12 charm-env[532708]: ctr: container "calico-node" in namespace "default": not found
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: Started calico node.
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: Reloading.
Apr 5 09:14:15 solqa-lab1-server-12 charm-env[532755]: ctr: snapshot "calico-node": already exists
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: calico-node.service: Main process exited, code=exited, status=1/FAILURE
Apr 5 09:14:15 solqa-lab1-server-12 systemd[1]: calico-node.service: Failed with result 'exit-code'.

The key error seems to be this:

ctr: snapshot "calico-node": already exists

Some lingering state, apparently a containerd snapshot named "calico-node" left over from the interrupted first start, is preventing the container from starting, and nothing cleans it up. This looks like a containerd or ctr bug.
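
To check other runs for the same signature, a small log scan along these lines might help. This is only a sketch: the function names are hypothetical, the regex is taken from the error text quoted above, and the cleanup command is an untested assumption about what would clear the stale state (it is built but never executed here).

```python
import re

# Matches the ctr error quoted above, e.g.
#   ctr: snapshot "calico-node": already exists
SNAPSHOT_EXISTS = re.compile(r'ctr: snapshot "(?P<key>[^"]+)": already exists')


def find_stale_snapshots(log_lines):
    """Return snapshot keys that ctr reported as already existing, in first-seen order."""
    seen = []
    for line in log_lines:
        match = SNAPSHOT_EXISTS.search(line)
        if match and match.group("key") not in seen:
            seen.append(match.group("key"))
    return seen


def cleanup_command(key, namespace="default"):
    """Build (but do not run) a ctr invocation that would remove the snapshot.

    Assumption: removing the leftover snapshot is enough to unblock the next
    start of calico-node. This has not been verified against the failing runs.
    """
    return ["ctr", "--namespace", namespace, "snapshots", "rm", key]


if __name__ == "__main__":
    import sys

    # Usage: python scan_snapshots.py < syslog
    for key in find_stale_snapshots(sys.stdin):
        print("stale snapshot:", key)
        print("possible cleanup:", " ".join(cleanup_command(key)))
```

The namespace defaults to "default" because that is the namespace named in the "not found" messages above; whether the snapshot lives in that same namespace is also an assumption.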