containerd sporadic timeouts
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
containerd (Ubuntu) | Invalid | Undecided | Unassigned |
Focal | Invalid | Undecided | Unassigned |
linux (Ubuntu) | Invalid | Undecided | Unassigned |
Focal | Fix Released | Critical | Thadeu Lima de Souza Cascardo |
Bug Description
This morning I applied routine security upgrades on a number of machines.
Containerd has upgraded from `1.5.9-
What happened next:
At some random time, on machines with the new containerd, something goes wrong with containerd tasks and/or cgroups.
This is how it looks in syslog:
containerd[710]: time="2022-
And some ctr commands:
# ctr --namespace k8s.io task ls|grep 2f5a8376b476809
2f5a8376b476809
Note that the status of the task is UNKNOWN (!!!)
# ctr --namespace k8s.io container ls|grep 2f5a8376b476809
2f5a8376b476809
Cgroups:
├─kubepods-
│ ├─cri-container
│ │ └─2677 /csi-node-
│ ├─cri-container
│ │ └─3264 /usr/local/
│ ├─cri-container
│ │ └─2960 /usr/local/
│ └─cri-container
│ └─2414 /pause
# ps auxf|grep 2414 -B 2
root 2279 0.1 0.0 114100 4956 ? Sl Nov15 0:42 /usr/bin/
65535 2414 0.0 0.0 964 4 ? Ss Nov15 0:00 \_ /pause
It does not happen immediately, but after some random time. Sometimes it takes several minutes, sometimes around an hour. Nonetheless, all machines with the new package end up in this weird state.
As soon as I revert the package, everything runs as expected again.
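To put a number on "some random time", a small poller can record how long a node runs before the first UNKNOWN task shows up. The following is only a minimal sketch: it assumes that `ctr --namespace k8s.io task ls` prints the status as the last whitespace-separated column, and the function names, the 30-second interval, and the poll limit are all illustrative choices, not part of containerd.

```python
import subprocess
import time
from typing import Callable, Optional


def run_task_ls() -> str:
    """Run the same listing command used in this report and return its stdout."""
    return subprocess.run(
        ["ctr", "--namespace", "k8s.io", "task", "ls"],
        capture_output=True, text=True, check=True,
    ).stdout


def seconds_until_unknown(
    list_tasks: Callable[[], str] = run_task_ls,
    interval: float = 30.0,      # assumed polling interval, in seconds
    max_polls: int = 240,        # assumed upper bound on the number of polls
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll until some task reports UNKNOWN; return elapsed seconds, or None."""
    start = clock()
    for _ in range(max_polls):
        output = list_tasks()
        # Assumption: the status is the last column of each output line.
        if any(line.split()[-1:] == ["UNKNOWN"] for line in output.splitlines()):
            return clock() - start
        sleep(interval)
    return None
```

Calling `seconds_until_unknown()` right after the package upgrade (as root, on the affected node) would return the elapsed time at the first poll that sees an UNKNOWN task, with a resolution no better than the polling interval.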
Changed in linux (Ubuntu): | |
status: | New → Invalid |
Changed in linux (Ubuntu Focal): | |
status: | New → In Progress |
assignee: | nobody → Thadeu Lima de Souza Cascardo (cascardo) |
importance: | Undecided → Critical |
summary: |
- 1.5.9-0ubuntu1~20.04.5 sporadic timeouts
+ containerd sporadic timeouts |
One extra note: it's not necessarily the `pause` process or the `cephcsi` pod, as in the example above. The pod/process selection also looks random, or at least I couldn't find any logic behind which one gets "UNKNOWN".
On this very node it's
# ctr --namespace k8s.io task ls|grep -i unknown
5f78e0cb957de97fd8465cc42c842bdd764d981ca7a903a2515bbc6bb06796a9 0 UNKNOWN
4e063ef0c8f768dbf34cf7a179bca5cc98a04fa7e00b29d20c17d3031d409f86 0 UNKNOWN
af070f16c1f0ff22eb16661e787e85db3810727909abd23d69a6a43578c1dced 0 UNKNOWN
2f5a8376b476809b1696b140ca87f91422113bb16b27a8174437cc63b48e259a 0 UNKNOWN
where, as one may see, all the tasks belong to the same pod.
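The `grep -i unknown` check above can also be done in a script, which is handy when collecting the affected task IDs from many nodes. This is a rough sketch, assuming the listing has three whitespace-separated columns (task ID, PID, status) as in the output above; `unknown_tasks` is a hypothetical helper name.

```python
from typing import List


def unknown_tasks(task_ls_output: str) -> List[str]:
    """Return the IDs of tasks whose status column reads UNKNOWN."""
    ids = []
    for line in task_ls_output.splitlines():
        fields = line.split()
        # Assumed layout: TASK, PID, STATUS. Header and malformed lines
        # never have UNKNOWN in the third column, so they are skipped.
        if len(fields) == 3 and fields[2].upper() == "UNKNOWN":
            ids.append(fields[0])
    return ids
```

Feeding it the captured output of `ctr --namespace k8s.io task ls` yields the list of task IDs in the UNKNOWN state, which can then be compared across nodes or across polls.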