docker.io - error adding seccomp filter rule for syscall clone3

Bug #1948361 reported by Ian May
Affects: docker.io (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Encountered the following error using the docker.io package in focal-proposed while running the autotest-client-tests/ubuntu_performance_deep_learning test.

"docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown."

This test essentially pulls down an NVIDIA TensorFlow docker container, runs the container, and triggers the preloaded tests while capturing the output as results.

The failure is seen with the following version of docker.io:
Version: 20.10.7-0ubuntu5~20.04.1
APT-Sources: http://archive.ubuntu.com/ubuntu focal-proposed/universe amd64 Packages

With the docker.io from focal-updates, the failure cannot be reproduced:
Version: 20.10.7-0ubuntu1~20.04.2
APT-Sources: http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages

To reproduce:

enable focal-proposed

git clone --depth=1 git://kernel.ubuntu.com/ubuntu/autotest-client-tests
git clone --depth=1 git://kernel.ubuntu.com/ubuntu/autotest

ln -sf ~/autotest-client-tests autotest/client/tests

AUTOTEST_PATH=/home/ubuntu/autotest sudo -E autotest/client/autotest-local --verbose autotest/client/tests/ubuntu_performance_deep_learning/control

Ian May (ian-may)
description: updated
Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

Hi Ian,

Thanks for the bug report. However, I was not able to reproduce the failure with the steps that you provided. I did the following in a Focal VM with proposed enabled and docker.io/20.10.7-0ubuntu5~20.04.1 installed:

ubuntu@focal:~$ dpkg -l docker.io
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-========================-============-=================================
ii docker.io 20.10.7-0ubuntu5~20.04.1 amd64 Linux container runtime
ubuntu@focal:~$ git clone --depth=1 git://kernel.ubuntu.com/ubuntu/autotest-client-tests
Cloning into 'autotest-client-tests'...
remote: Counting objects: 1183, done.
remote: Compressing objects: 100% (1064/1064), done.
remote: Total 1183 (delta 206), reused 536 (delta 95)
Receiving objects: 100% (1183/1183), 49.12 MiB | 8.67 MiB/s, done.
Resolving deltas: 100% (206/206), done.
ubuntu@focal:~$ git clone --depth=1 git://kernel.ubuntu.com/ubuntu/autotest
Cloning into 'autotest'...
remote: Counting objects: 1410, done.
remote: Compressing objects: 100% (1290/1290), done.
remote: Total 1410 (delta 102), reused 1078 (delta 70)
Receiving objects: 100% (1410/1410), 14.31 MiB | 6.14 MiB/s, done.
Resolving deltas: 100% (102/102), done.
ubuntu@focal:~$ ln -sf ~/autotest-client-tests autotest/client/tests
ubuntu@focal:~$ AUTOTEST_PATH=/home/ubuntu/autotest sudo -E autotest/client/autotest-local --verbose autotest/client/tests/ubuntu_performance_deep_learning/control
sudo: unable to execute autotest/client/autotest-local: No such file or directory

Here I had to pass a relative path to autotest-local, and afterwards install python2, to make it "work":

ubuntu@focal:~$ AUTOTEST_PATH=/home/ubuntu/autotest sudo -E ./autotest/client/autotest-local --verbose ./autotest/client/tests/ubuntu_performance_deep_learning/control
10:54:48 ERROR| JOB ERROR: /home/ubuntu/autotest/client/tests/ubuntu_performance_deep_learning/control: control file not found

Checking the autotest directory I was not able to find a control file for ubuntu_performance_deep_learning:

ubuntu@focal:~$ find ./autotest -name control
./autotest/debian/control
./autotest/client/profilers/readprofile/control
./autotest/client/profilers/lttng/control
./autotest/client/profilers/perf/control
./autotest/client/profilers/inotify/control
./autotest/client/profilers/iostat/control
./autotest/client/profilers/cpistat/control
./autotest/client/profilers/blktrace/control
./autotest/client/profilers/ftrace/control
./autotest/client/profilers/powertop/control
./autotest/client/profilers/kvm_modload/control
./autotest/client/profilers/sar/control
./autotest/client/profilers/lockmeter/control
./autotest/client/profilers/mpstat/control
./autotest/client/profilers/vmstat/control
./autotest/client/profilers/systemtap/control
./autotest/client/profilers/catprofile/control
./autotest/client/profilers/cmdprofile/control
./autotest/client/profilers/oprofile/control
./autotest/client/deps/grubby/control
./autotest/client/deps/pgpool/control
./au...


Changed in docker.io (Ubuntu):
status: New → Incomplete
Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

After a quick look I noticed the test did not run because of an issue with the created symlink; I had blindly copied and pasted the commands you provided. After fixing it, I was able to run the tests, but they failed in my VM because there is not enough disk space. I'll allocate a VM with more disk and try to re-run the test. I'll let you know the outcome.

Changed in docker.io (Ubuntu):
status: Incomplete → New
Revision history for this message
dann frazier (dannf) wrote :

Note that this test runs with a non-Ubuntu docker runtime (runc) from an Nvidia repo, which allows GPUs to be passed through to containers. As I understand it, this update required an interface change in our runc; perhaps the same change(s) are needed on their side?

Revision history for this message
dann frazier (dannf) wrote :

I've filed a bug here to get it on Nvidia's radar, in case the fix is required there:

https://github.com/NVIDIA/nvidia-container-runtime/issues/157

Revision history for this message
Athos Ribeiro (athos-ribeiro) wrote (last edit ):

While trying to reproduce this bug on a focal machine, I realized that the tests run through the nvidia-docker wrapper.

Running

$ nvidia-docker run --rm -it docker.io/ubuntu:latest /bin/bash

crashes with the reported error:

"docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown."

Running the same command without the wrapper succeeds:

$ docker run --rm -it docker.io/ubuntu:latest /bin/bash

Adding a "--runtime nvidia" parameter to the docker call (with the nvidia-container-runtime from the test resources pointed to above installed) also results in failure.

Note that /etc/docker/daemon.json was edited to define the nvidia runtime, as done by the test.

```
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```

Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

Based on the comment above, I believe this is a bug in the nvidia-docker stack (coming from a third-party PPA) and not in docker itself. I am marking this bug as Invalid.

Changed in docker.io (Ubuntu):
status: New → Invalid
Revision history for this message
Evan Lezar (evanlezar) wrote :

Hi Lucas, Ian,

Just as a matter of interest, what are the versions of the NVIDIA Container Stack packages installed on the system where the failure occurs?

Could you run:
```
apt list --installed '*nvidia*'
```

Revision history for this message
Evan Lezar (evanlezar) wrote :

The nvidia-container-runtime is a simple shim around the runc installed on the system. It modifies the OCI spec (inserting a prestart hook) before exec'ing runc directly.
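A rough sketch of the shim behavior described above, with simplified OCI spec types and an illustrative hook path (this is not the actual nvidia-container-runtime source):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified OCI runtime spec types: only the hooks section is modeled.
type Hook struct {
	Path string   `json:"path"`
	Args []string `json:"args,omitempty"`
}

type Hooks struct {
	Prestart []Hook `json:"prestart,omitempty"`
}

type Spec struct {
	Hooks *Hooks `json:"hooks,omitempty"`
}

// addPrestartHook inserts a prestart hook into a container's config.json,
// which is the essence of what the shim does before handing off to runc.
func addPrestartHook(raw []byte, hookPath string) ([]byte, error) {
	var spec Spec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return nil, err
	}
	if spec.Hooks == nil {
		spec.Hooks = &Hooks{}
	}
	spec.Hooks.Prestart = append(spec.Hooks.Prestart, Hook{Path: hookPath})
	// The real shim would write the modified config.json back to the
	// bundle and then exec the system runc with the original arguments.
	return json.Marshal(&spec)
}

func main() {
	out, err := addPrestartHook([]byte(`{}`), "/usr/bin/nvidia-container-toolkit")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// prints {"hooks":{"prestart":[{"path":"/usr/bin/nvidia-container-toolkit"}]}}
}
```

Because the shim ultimately execs the system runc, a seccomp failure like the one reported would come from that runc (or its vendored libseccomp), not from the hook itself.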

Revision history for this message
dann frazier (dannf) wrote :

I suggest we debug in https://github.com/NVIDIA/nvidia-container-runtime/issues/157 to avoid duplication. Of course, should that debugging highlight an issue with Ubuntu's container stack, we can then reopen this LP bug.

Revision history for this message
dann frazier (dannf) wrote (last edit ):

FYI, this issue goes away if I rebuild docker.io with the following Ubuntu patch dropped:

debian/patches/seccomp-add-support-for-clone3-syscall-in-default-policy.patch

So should we reopen this until we understand what is going on?
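For context: if that patch follows the upstream moby change of the same name, it makes the default seccomp profile answer clone3 with ENOSYS (rather than the blanket-deny EPERM), so that callers such as glibc fall back to the older clone syscall. A profile entry implementing that idea would look roughly like this (illustrative fragment, not the exact patch contents; errno 38 is ENOSYS on Linux):

```
{
    "names": ["clone3"],
    "action": "SCMP_ACT_ERRNO",
    "errnoRet": 38,
    "comment": "return ENOSYS so callers fall back to clone"
}
```

A runtime whose libseccomp does not yet know the clone3 syscall name could plausibly fail to install such a rule, which would match the reported "error adding seccomp filter rule for syscall clone3" message.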

Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

@dann, the patch you mentioned is really needed to fix the issue reported in LP #1943049. Without it we will not be able to launch or build any Impish-based (or later Ubuntu version) container on any of the supported Ubuntu releases, because of the glibc version in those images.

The mentioned bug is blocking the build of the Ubuntu OCI images for 21.10 (the release is already out, but we still have not published new OCI images), so I'd say that we need this patch ASAP.

Revision history for this message
dann frazier (dannf) wrote (last edit ):

@lucaskanashiro I don't doubt the importance of fixing bug 1943049[*]. My concern is that doing so with the current fix will knowingly break nvidia-container-runtime users. Of course, if this is due to a bug in nvidia-container-runtime, they should fix that. But at this point (AFAICT) there still seems to be a possibility that this is a docker regression.

[*] Though, FWIW, I'm not able to reproduce bug 1943049 on either a freshly installed focal or impish host (lxd VM) using an ubuntu:impish container verified to include libc6 2.34-0ubuntu3. But maybe I missed some important step. EDIT: never mind, I see libc6 is temporarily working around it.

Changed in docker.io (Ubuntu):
status: Invalid → Opinion
Revision history for this message
dann frazier (dannf) wrote :

Marking Invalid again, as the fix is confirmed to be in nvidia-container-runtime and is pending.

Changed in docker.io (Ubuntu):
status: Opinion → Invalid