context deadline exceeded: unknown in containerd with latest runc version
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
runc (Ubuntu) | Fix Released | High | Lucas Kanashiro | |
Bionic | Fix Released | High | Lucas Kanashiro | |
Focal | Fix Released | High | Lucas Kanashiro | |
Groovy | Fix Released | High | Lucas Kanashiro | |
Hirsute | Fix Released | High | Lucas Kanashiro | |
Bug Description
[Impact]
Several regressions were identified by upstream in version 1.0.0-rc93 and were fixed in version 1.0.0-rc94:
https:/
"This release fixes several regressions found in v1.0.0-rc93. We
recommend users update as soon as possible."
And in version 1.0.0-rc95 we also have the fix for CVE-2021-30465.
[Test Plan]
Per https:/
[Where problems could occur]
As usual, we deliver most benefit to our users by delivering an upstream experience. A risk of regressions is part of that.
[Original Message]
After upgrading runc to the latest version from focal-updates, Kubernetes (using containerd) fails to start new containers once a certain number of containers is reached (somewhere above roughly 100 to 150 containers).
With the previous version of runc, I was able to run more than 340 containers on a single server without any issue.
I got these logs from containerd (`journalctl -u containerd`):
```
May 05 00:48:17 node6 containerd[
May 05 00:48:21 node6 containerd[
May 05 00:48:23 node6 containerd[
May 05 00:48:25 node6 containerd[
May 05 00:48:27 node6 containerd[
May 05 00:48:29 node6 containerd[
May 05 00:48:31 node6 containerd[
```
This version of runc triggered the problem:
```
runc (1.0.0~
* Backport version 1.0.0~rc93-0ubuntu1 from Hirsute (LP: #1919322,
LP: #1916485).
-- Lucas Kanashiro <email address hidden> Tue, 16 Mar 2021 15:34:35 -0300
```
```
# runc -v
runc version spec: 1.0.2-dev
go: go1.13.8
libseccomp: 2.5.1
```
Reverting to the previous version of runc solved the problem, and I was able to run more than 340 pods / containers without any error.
```
apt-get install runc=1.
# runc -v
runc version spec: 1.0.1-dev
```
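If the older runc has to stay in place as a stopgap, the downgrade can be held so that a routine upgrade does not pull the affected version back in. This is a minimal sketch, assuming the older version is still offered by a configured archive. `pin_runc` is a hypothetical helper name, and the version string has to be supplied by the user (`apt-cache policy runc` lists the available candidates).

```shell
# Hypothetical helper: install a specific runc version and hold it.
# Usage: pin_runc <version>
pin_runc() {
    version="$1"
    if [ -z "$version" ]; then
        echo "usage: pin_runc <version>" >&2
        return 2
    fi
    # --allow-downgrades lets apt-get install an older version non-interactively.
    apt-get install -y --allow-downgrades "runc=$version" || return 1
    # Keep apt from upgrading runc again until the hold is released
    # with `apt-mark unhold runc`.
    apt-mark hold runc
}
```

The hold should be released once a fixed package is available, since a held package will not receive the CVE-2021-30465 fix noted in the bug description.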
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: runc 1.0.0~rc93-
ProcVersionSign
Uname: Linux 5.4.0-72-generic x86_64
ApportVersion: 2.20.11-
Architecture: amd64
CasperMD5CheckR
Date: Wed May 5 12:06:30 2021
SourcePackage: runc
UpgradeStatus: No upgrade log present (probably fresh install)
CVE References
description: updated
Changed in runc (Ubuntu):
assignee: nobody → Lucas Kanashiro (lucaskanashiro)
Changed in runc (Ubuntu Focal):
assignee: nobody → Lucas Kanashiro (lucaskanashiro)
Changed in runc (Ubuntu Groovy):
assignee: nobody → Lucas Kanashiro (lucaskanashiro)
Changed in runc (Ubuntu Bionic):
assignee: nobody → Lucas Kanashiro (lucaskanashiro)
Changed in runc (Ubuntu Hirsute):
assignee: nobody → Lucas Kanashiro (lucaskanashiro)
tags: added: seg
tags: added: sts
Changed in runc (Ubuntu Bionic):
importance: Undecided → High
Changed in runc (Ubuntu Focal):
importance: Undecided → High
Changed in runc (Ubuntu Groovy):
importance: Undecided → High
Changed in runc (Ubuntu Hirsute):
importance: Undecided → High
Hello and thanks for this bug report. I had a look at the runc bug reports and changes to see if I could easily spot a relevant change, but I couldn't. I don't have a good hypothesis of what could be wrong, but I'd start by checking whether the newer runc is causing higher memory or CPU usage, which could make components time out ("context deadline exceeded") or trigger OOM kills.
Could you please:
- Attach the kernel log (dmesg) to this bug, captured after
hitting those containerd errors?
- Check the system load and memory usage when using the two
different versions of runc. This could be done for example
via something like `vmstat -S M 5`.
If no useful clues are found, then I think we'll have to bisect.
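The two requests above could be gathered in one pass, once per runc version, with something like the sketch below. It is only an illustration: the function name `collect_runc_diag`, the output directory layout, and the default sample count are assumptions, not part of any runc or containerd tooling.

```shell
# Hypothetical helper: collect diagnostics for one runc version.
# Usage: collect_runc_diag <label> [samples]
collect_runc_diag() {
    label="$1"
    samples="${2:-60}"   # number of 5-second vmstat samples (assumed default)
    out="runc-diag-$label"
    mkdir -p "$out" || return 1

    # Each capture is best-effort: a missing tool or a permission error
    # is recorded in the output file instead of aborting the run.

    # Record which runc build is under test.
    runc -v > "$out/runc-version.txt" 2>&1 || true

    # System load and memory usage in MiB, sampled every 5 seconds.
    vmstat -S M 5 "$samples" > "$out/vmstat.txt" 2>&1 || true

    # Kernel log, taken after the containerd errors have appeared
    # (may need root, depending on kernel.dmesg_restrict).
    dmesg > "$out/dmesg.txt" 2>&1 || true

    # containerd messages from the current boot.
    journalctl -u containerd -b --no-pager > "$out/containerd.log" 2>&1 || true

    # Print the directory so the caller knows what to attach.
    echo "$out"
}
```

For example, `collect_runc_diag rc93` while the errors are occurring and `collect_runc_diag previous` after the downgrade would leave two directories whose `vmstat.txt` files can be compared side by side.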