containerd-shim deadlocks, then crashes

Bug #1895647 reported by Marius Gedminas on 2020-09-15

This bug report will be marked for expiration in 51 days if no further activity occurs.

This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
containerd (Ubuntu) | Incomplete | Undecided | Unassigned |

Bug Description

I'm using docker-compose to wrangle a bunch of containers, and quite often one of my containers hangs.

When I inspect the process with strace, I see it's blocked on write(2, "...").

/proc/$container_pid/fd/2 is a pipe, the other end of which is managed by a containerd-shim process.

strace -p $containerd_shim_pid shows it blocked in futex(0xad3848, FUTEX_WAIT_PRIVATE, 0, NULL).

After enough time passes (around 5 or 10 minutes?), I see containerd-shim crash with a SIGABRT. This time I still had strace attached:

strace: Process 861273 attached
futex(0xad3848, FUTEX_WAIT_PRIVATE, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGABRT {si_signo=SIGABRT, si_code=SI_USER, si_pid=867057, si_uid=0} ---
nanosleep({tv_sec=0, tv_nsec=1000000}, NULL) = 0
nanosleep({tv_sec=0, tv_nsec=1000000}, NULL) = 0
write(2, "SIGABRT: abort", 14) = 14
write(2, "\n", 1) = 1
write(2, "PC=", 3) = 3
write(2, "0x45c791", 8) = 8
write(2, " m=", 3) = 3
write(2, "0", 1) = 1
write(2, " sigcode=", 9) = 9
write(2, "0", 1) = 1
write(2, "\n", 1) = 1
write(2, "\n", 1) = 1
write(2, "goroutine ", 10) = 10
write(2, "0", 1) = 1
write(2, " [", 2) = 2
write(2, "idle", 4) = 4
write(2, "]:\n", 3) = 3
write(2, "runtime.futex", 13) = 13
...
write(2, "rflags ", 7) = 7
write(2, "0x286", 5) = 5
write(2, "\n", 1) = 1
write(2, "cs ", 7) = 7
write(2, "0x33", 4) = 4
write(2, "\n", 1) = 1
write(2, "fs ", 7) = 7
write(2, "0x0", 3) = 3
write(2, "\n", 1) = 1
write(2, "gs ", 7) = 7
write(2, "0x0", 3) = 3
write(2, "\n", 1) = 1
exit_group(2) = ?
+++ exited with 2 +++

(Full strace log attached, unless I forget)

It would be nice if I could read that Go traceback somewhere instead of looking at truncated strace writes, but I don't know where.
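Until the traceback can be found somewhere else, one workaround is to stitch the fragmented `write(2, ...)` calls from the strace log back into readable text. The following is a minimal Go sketch, not an existing tool. The function name `reassemble` is hypothetical. It assumes the log was captured without `-f` (so there are no PID prefixes) and with a large enough `-s` string size that strace did not truncate any strings. It also assumes every escape sequence strace printed is one Go's `strconv.Unquote` understands. Lines that do not parse are skipped rather than guessed at.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strconv"
	"strings"
)

// writeRe matches strace lines such as: write(2, "goroutine ", 10) = 10
// and captures the quoted string argument.
var writeRe = regexp.MustCompile(`^write\(2, ("(?:[^"\\]|\\.)*"), \d+\) = \d+$`)

// reassemble (hypothetical helper) concatenates the payloads of all
// write(2, ...) calls, ignoring signals, nanosleep, exit_group, etc.
func reassemble(lines []string) string {
	var b strings.Builder
	for _, line := range lines {
		m := writeRe.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue // not a write to fd 2
		}
		s, err := strconv.Unquote(m[1])
		if err != nil {
			continue // escape sequence Go cannot decode; skip it
		}
		b.WriteString(s)
	}
	return b.String()
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	fmt.Print(reassemble(lines))
}
```

Usage, assuming the program is saved as `reassemble.go`: capture with `strace -s 4096 -o shim.log -p $containerd_shim_pid`, then run `go run reassemble.go < shim.log` to print the traceback as the Go runtime emitted it.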

journalctl -u containerd shows only this:

Sep 15 12:10:32 blynas containerd[1133]: time="2020-09-15T12:10:32.805932666+03:00" level=info msg="shim containerd-shim started" address=/containerd-shim/a65286bda8fa7242d30c2a351a60d90c344ee6d8af60a7487b1efe75014914c3.sock debug=false pid=861273
Sep 15 12:10:33 blynas containerd[1133]: time="2020-09-15T12:10:33.489963805+03:00" level=info msg="shim containerd-shim started" address=/containerd-shim/863c8b7d5dc91b24ae633dbf6efc21b2d9f4973ee6dd4bdf2cb855556868f9de.sock debug=false pid=861500
Sep 15 12:10:34 blynas containerd[1133]: time="2020-09-15T12:10:34.197921760+03:00" level=info msg="shim containerd-shim started" address=/containerd-shim/08c9c71a81352069f02b14389fb0a636484071e76a443665cb3eefa781a86f3a.sock debug=false pid=861671
Sep 15 12:11:03 blynas containerd[1133]: time="2020-09-15T12:11:03.521908472+03:00" level=info msg="shim containerd-shim started" address=/containerd-shim/32b7b8e90d39f2cc59125f5072f7b6012a2e784dc79101694fa749bdfedee8bc.sock debug=false pid=862495
Sep 15 12:20:09 blynas containerd[1133]: time="2020-09-15T12:20:09.424130178+03:00" level=info msg="shim reaped" id=b76c63912cae0a668aa1f0b9baa2bd74a6c5bbb6a32c1bcae351028cfb101f78
Sep 15 12:20:09 blynas containerd[1133]: time="2020-09-15T12:20:09.424177796+03:00" level=warning msg="cleaning up after shim dead" id=b76c63912cae0a668aa1f0b9baa2bd74a6c5bbb6a32c1bcae351028cfb101f78 namespace=moby
Sep 15 12:20:12 blynas containerd[1133]: time="2020-09-15T12:20:12.765586972+03:00" level=info msg="shim reaped" id=4ea9c0c055e7df514cb4cdb7b6aae7db2ecfea299b712907a229e464827ef219

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: containerd 1.3.3-0ubuntu2
ProcVersionSignature: Ubuntu 5.4.0-47.51-generic 5.4.55
Uname: Linux 5.4.0-47-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27.8
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Tue Sep 15 12:20:56 2020
EcryptfsInUse: Yes
InstallationDate: Installed on 2019-06-12 (460 days ago)
InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Release amd64 (20190416)
SourcePackage: containerd
UpgradeStatus: Upgraded to focal on 2020-04-24 (143 days ago)

rafaeldtinoco wrote :

Hello Marius,

Thank you for taking the time to file a bug report.

Since there is not enough information in your report to begin triage or to
differentiate between a local configuration problem and a bug in Ubuntu, I
am marking this bug as "Incomplete".

I know you already had trouble getting the strace output but, if you don't mind, could you please provide us with a short reproducer? That way I or someone else can reproduce it locally and choose the preferred tools to debug the issue.

Please change status back to New once you're done so this can be triaged again by someone from the Ubuntu Server team.

Thanks a lot.

-rafaeldtinoco

Changed in containerd (Ubuntu):
status: New → Incomplete