autopkgtests kill ssh

Bug #1576419 reported by Michael Hudson-Doyle on 2016-04-28

Bug Description

As reported in bug #1576387, after all the battles to get the autopkgtests running, they now kill sshd during the run, causing the infrastructure to loop endlessly.

Martin Pitt (pitti) wrote :

The test runs fine up to this point:

PASS: docker_api_containers_test.go:902: DockerSuite.TestContainerApiKill 0.420s
PASS: docker_api_containers_test.go:441: DockerSuite.TestContainerApiPause 0.491s
PASS: docker_api_containers_test.go:800: DockerSuite.TestContainerApiPostCreateNull 0.291s
PASS: docker_api_containers_test.go:887: DockerSuite.TestContainerApiRename 0.421s
PASS: docker_api_containers_test.go:916: DockerSuite.TestContainerApiRestart 1.555s
Connection to closed by remote host.^M
adt-run [17:08:10]: ERROR: testbed failure: testbed auxverb failed with exit code 255

Then it apparently kills the remote sshd end of the ssh session, which makes the controller's ssh process exit with 255 (the status ssh returns when the connection itself fails, as opposed to the remote command's own exit status).
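A minimal sketch of how a controller could tell these cases apart, assuming it has the integer exit status of an `ssh host command` call. The function name and the returned labels are hypothetical, not part of adt-run:

```python
def classify_ssh_exit(status: int) -> str:
    """Interpret the exit status of an `ssh host command` invocation.

    ssh exits with the remote command's status, or with 255 if ssh itself
    hit an error (for example, the connection was dropped).
    """
    if status == 255:
        # Ambiguous by design: a remote command that exits 255 looks the same.
        return "ssh-error"
    if status == 0:
        return "ok"
    return "remote-command-failed"
```

A status of 255 only says that ssh gave up; it does not say why the remote end went away.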

I checked five different worker logs and it always happens at the same place. So whichever test runs right after TestContainerApiRestart is likely the one that causes this.
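Comparing several worker logs by hand gets tedious, so here is a rough sketch that pulls out the last passing test before the disconnect. It assumes the `PASS:` line format shown in the log above and that ssh prints a line containing "closed by remote host"; the function name is made up for illustration:

```python
import re
from typing import Iterable, Optional

# Matches result lines as they appear in the log above, e.g.
# "PASS: docker_api_containers_test.go:916: DockerSuite.TestContainerApiRestart 1.555s"
PASS_LINE = re.compile(r"^PASS: \S+:\d+: (?P<test>\S+)\s+[\d.]+s\s*$")


def last_pass_before_disconnect(lines: Iterable[str]) -> Optional[str]:
    """Return the last test that passed before ssh reported the connection closed.

    Returns None if the log never shows a disconnect, or if nothing passed before it.
    """
    last = None
    for line in lines:
        match = PASS_LINE.match(line.rstrip("\r\n"))
        if match:
            last = match.group("test")
        elif "closed by remote host" in line:
            return last
    return None
```

Running this over each worker log and comparing the results would show whether the failure really is at the same point every time.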

This sounds suspiciously like OOM killing sshd, not Docker's tests
directly -- I know the integration tests used to require anywhere from
1 to 4 GB of RAM, and I imagine that's probably still the case. :(
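One way to check the OOM theory is to scan the testbed's kernel log for OOM-killer reports. The sketch below assumes the usual "Out of memory: Kill process PID (name)" wording (newer kernels say "Killed"); the function name is hypothetical, and the kernel log has to be captured from the testbed before it goes away:

```python
import re
from typing import Iterable, List, Tuple

# Kernel OOM-killer report lines; the verb is "Kill" on older kernels
# and "Killed" on newer ones, so both are accepted.
OOM_LINE = re.compile(
    r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def oom_victims(kernel_log: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM kill reported in the kernel log."""
    victims = []
    for line in kernel_log:
        match = OOM_LINE.search(line)
        if match:
            victims.append((int(match.group("pid")), match.group("name")))
    return victims
```

Feeding it the output of `dmesg` or `journalctl -k` and looking for `sshd` among the names would confirm or rule out this explanation.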

Martin Pitt (pitti) wrote :

FTR, this still happens when running the tests in an m1.large instance (8 GB RAM). I was trying to reproduce locally with

  adt-run --apt-pocket=proposed --- qemu adt-yakkety-amd64-cloud.img

and have ssh open to the VM while it runs (port 10022 on localhost, user:pass ubuntu:ubuntu)

but the tests hang at

Step 5 : RUN gcc -g -Wall -static userns.c -o /usr/bin/userns-test && gcc -g -Wall -static ns.c -o /usr/bin/ns-test && gcc -g -Wall -static acct.c -o /usr/bin/acct-test
 ---> Running in e064dc8c3809
Container command not found or does not exist.
./hack/ line 258: local: can only be used in a function

which is much earlier than the point above where it killed ssh. ssh is still working at that point, although /proc is seriously broken (not sure what the test does to that -- "top", "mount" etc. don't work any more).

Tianon Gravi (tianon) wrote :

I ran it locally with a 4GB VM and managed to get through all the tests without sshd dying. :(

(obviously we've got some failures that we'll need to figure out, but probably only worth figuring out the failures if we can figure out why sshd is getting killed D: )


OOPS: 192 passed, 120 skipped, 894 FAILED, 6 MISSED
--- FAIL: Test (1244.15s)
coverage: 47.5% of statements
exit status 1
FAIL _/tmp/adt-run.Y8vt5w/build.Xj2/real-tree/integration-cli 1244.578s
---> Making bundle: .integration-daemon-stop (in bundles/1.10.3/test-integration-cli)
+++++ cat bundles/1.10.3/test-integration-cli/
++++ kill 21965
adt-run [16:35:41]: test integration: -----------------------]
adt-run [16:35:42]: test integration: - - - - - - - - - - results - - - - - - - - - -
integration FAIL non-zero exit status 1
adt-run [16:35:42]: @@@@@@@@@@@@@@@@@@@@ summary
integration FAIL non-zero exit status 1
qemu-system-x86_64: terminating on signal 15 from pid 6

Martin Pitt (pitti) wrote :

I ran it again locally this morning, and the hang from last night didn't occur. Now the tests actually start to run. At some point an awful lot of them fail with something like

FAIL: docker_cli_build_unix_test.go:85: DockerSuite.TestBuildAddChangeOwnership

    c.Fatalf("build failed to complete for TestBuildAddChangeOwnership: %v", err)
... Error: build failed to complete for TestBuildAddChangeOwnership: failed to build the image: Sending build context to Docker daemon 2.56 kB
Error response from daemon: Untar error on re-exec cmd: fork/exec /proc/self/exe: no such file or directory

which is exactly the same "test breaks /proc" effect that I noticed before. Tianon, do you get these as well? 894 failures certainly leave enough room for those :-)

But it also didn't kill sshd here.
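To pin down when /proc breaks during a run, something like the following could be polled over the ssh session. It is only a sketch: the list of entries is an assumption about what a healthy procfs mount exposes (`self/exe` is the path from the error above), and the function name is made up:

```python
import os
from typing import List


def missing_proc_entries(proc_root: str = "/proc") -> List[str]:
    """Return the entries a healthy procfs mount should expose but that are absent."""
    expected = ["self/exe", "mounts", "meminfo"]
    return [
        entry
        for entry in expected
        # lexists: self/exe is a symlink, and only its presence matters here.
        if not os.path.lexists(os.path.join(proc_root, entry))
    ]
```

An empty list means the basic entries are there; anything else narrows down which test first leaves /proc in a bad state.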
