autopkgtests kill ssh

Bug #1576419 reported by Michael Hudson-Doyle
This bug affects 1 person
Affects             Status  Importance  Assigned to  Milestone
docker.io (Ubuntu)  New     Undecided   Unassigned

Bug Description

As reported in bug #1576387, after all the battles to get the docker.io autopkgtests running, they now kill sshd during the run, causing the infrastructure to loop endlessly.

Martin Pitt (pitti) wrote :

The test runs fine up to this point:

PASS: docker_api_containers_test.go:902: DockerSuite.TestContainerApiKill 0.420s
PASS: docker_api_containers_test.go:441: DockerSuite.TestContainerApiPause 0.491s
PASS: docker_api_containers_test.go:800: DockerSuite.TestContainerApiPostCreateNull 0.291s
PASS: docker_api_containers_test.go:887: DockerSuite.TestContainerApiRename 0.421s
PASS: docker_api_containers_test.go:916: DockerSuite.TestContainerApiRestart 1.555s
Connection to 10.42.43.169 closed by remote host.
adt-run [17:08:10]: ERROR: testbed failure: testbed auxverb failed with exit code 255

Then it apparently kills the remote sshd end of the ssh session, which makes the controller's ssh process exit with 255 (the status ssh returns when it hits an error of its own, such as a dropped connection, rather than passing through the remote command's exit status).
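
For anyone reading similar logs, here is a minimal sketch of how the controller side could tell "ssh itself failed" apart from "the remote command failed". The function name classify_ssh_exit is made up for illustration and is not part of adt-run. Note that a remote command which itself exits with 255 would be indistinguishable from a dropped connection.

```shell
#!/bin/sh
# Hypothetical helper: interpret the exit status of an `ssh host command` call.
# ssh passes through the remote command's exit status, or returns 255 when
# ssh itself hit an error (for example, the connection was closed).
classify_ssh_exit() {
    case "$1" in
        0)   echo "remote command succeeded" ;;
        255) echo "ssh error (connection lost or could not be established)" ;;
        *)   echo "remote command failed with status $1" ;;
    esac
}
```

It would be called right after the ssh invocation, e.g. `classify_ssh_exit $?`.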

I checked five different worker logs and it always happens at the same place. So whichever test runs right after TestContainerApiRestart is likely the one that causes this.
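
A rough sketch of how that comparison across logs could be automated: print the last PASS line seen before the "closed by remote host" message. The function name last_pass_before_drop is hypothetical; it only assumes the log line formats quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read one test log on stdin and print the last
# "PASS: ..." line that appears before the ssh connection was dropped.
last_pass_before_drop() {
    awk '
        /^PASS: /               { last = $0 }        # remember the latest passing test
        /closed by remote host/ { print last; exit } # stop at the connection drop
    '
}
```

Running it over each worker log in turn (for example `last_pass_before_drop < some-worker.log`, with whatever file names the logs are saved under) should print the same test every time if the failure point is consistent.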

Tianon Gravi (tianon) wrote :

This sounds suspiciously like OOM killing sshd, not Docker's tests
directly -- I know the integration tests used to require anywhere from
1 to 4 GB of RAM, and I imagine that's probably still the case. :(
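
One way to check the OOM theory, sketched under the assumption that the testbed's kernel log can still be read after the failure (for example from a serial console capture): filter the log for OOM killer messages. The function name oom_events is made up for this example; the patterns match the kernel's usual "invoked oom-killer" and "Out of memory: Kill process" / "Out of memory: Killed process" lines.

```shell
#!/bin/sh
# Hypothetical helper: filter kernel log text on stdin for OOM killer activity.
# Prints matching lines; exits non-zero (grep's status) if there are none.
oom_events() {
    grep -E -i 'invoked oom-killer|out of memory: kill'
}
```

On a live system the input could come from `dmesg | oom_events` or `journalctl -k | oom_events`; a line naming sshd as the killed process would confirm the suspicion.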

Martin Pitt (pitti) wrote :

FTR, this still happens when running the tests in an m1.large instance (8 GB RAM). I was trying to reproduce locally with

  adt-run --apt-pocket=proposed docker.io --- qemu adt-yakkety-amd64-cloud.img

and have ssh open to the VM while it runs (port 10022 on localhost, user:pass ubuntu:ubuntu)

but the tests hang at

Step 5 : RUN gcc -g -Wall -static userns.c -o /usr/bin/userns-test && gcc -g -Wall -static ns.c -o /usr/bin/ns-test && gcc -g -Wall -static acct.c -o /usr/bin/acct-test
 ---> Running in e064dc8c3809
Container command not found or does not exist.
./hack/make.sh: line 258: local: can only be used in a function

which is much earlier than the point above where it killed ssh. ssh is still working at that point, although /proc is seriously broken (not sure what the test does to that -- "top", "mount" etc. don't work any more).
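
A quick probe for the "/proc is broken" state, as a sketch only: it checks whether /proc/self/exe exists, which is one symptom among several and not a full diagnosis. The function name check_proc and its optional directory argument are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: report whether <dir>/self/exe exists.
# With no argument it checks the real /proc; a healthy procfs always has
# /proc/self/exe pointing at the running program.
check_proc() {
    proc_dir="${1:-/proc}"
    if [ ! -e "$proc_dir/self/exe" ]; then
        echo "broken: $proc_dir/self/exe is missing"
        return 1
    fi
    echo "ok: $proc_dir/self/exe is present"
}
```

Running `check_proc` over the still-working ssh session at intervals during the test run might narrow down when /proc stops behaving.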

Tianon Gravi (tianon) wrote :

I ran it locally with a 4GB VM and managed to get through all the tests without sshd dying. :(

(obviously we've got some failures that we'll need to figure out, but probably only worth figuring out the failures if we can figure out why sshd is getting killed D: )

....

OOPS: 192 passed, 120 skipped, 894 FAILED, 6 MISSED
--- FAIL: Test (1244.15s)
FAIL
coverage: 47.5% of statements
exit status 1
FAIL _/tmp/adt-run.Y8vt5w/build.Xj2/real-tree/integration-cli 1244.578s
---> Making bundle: .integration-daemon-stop (in bundles/1.10.3/test-integration-cli)
+++++ cat bundles/1.10.3/test-integration-cli/docker.pid
++++ kill 21965
adt-run [16:35:41]: test integration: -----------------------]
adt-run [16:35:42]: test integration: - - - - - - - - - - results - - - - - - - - - -
integration FAIL non-zero exit status 1
adt-run [16:35:42]: @@@@@@@@@@@@@@@@@@@@ summary
integration FAIL non-zero exit status 1
qemu-system-x86_64: terminating on signal 15 from pid 6

Martin Pitt (pitti) wrote :

I ran it again locally this morning, and the hang from last night didn't occur. Now the tests actually start to run. At some point an awful lot of them fail with something like

FAIL: docker_cli_build_unix_test.go:85: DockerSuite.TestBuildAddChangeOwnership

docker_cli_build_unix_test.go:120:
    c.Fatalf("build failed to complete for TestBuildAddChangeOwnership: %v", err)
... Error: build failed to complete for TestBuildAddChangeOwnership: failed to build the image: Sending build context to Docker daemon 2.56 kB
Error response from daemon: Untar error on re-exec cmd: fork/exec /proc/self/exe: no such file or directory

which is exactly the same "test breaks /proc" effect that I noticed before. Tianon, do you get these as well? 894 failures certainly leave enough room for those :-)

But it also didn't kill sshd here.
