adt-run gets stopped when running under timeout

Bug #1415944 reported by Vincent Ladeuil
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
autopkgtest (Ubuntu)
Triaged
Low
Unassigned

Bug Description

 $ mkdir tmp
 $ adt-buildvm-ubuntu-cloud -v -r vivid
 $ bzr branch lp:~canonical-ci-engineering/ubuntu-test-cases/boottest
$ export ADT_VIRT="adt-virt-qemu ./adt-vivid-amd64-cloud.img"
$ boottest/scripts/boottest.sh vivid libpng

This ends with:

adt-run [15:40:11]: version 3.9.3-1~1034~ubuntu14.04.1
adt-run: DBG: $ vserver: adt-virt-qemu /home/vila/ci/adt-vivid-amd64-cloud.img
adt-run: DBG: got reply from testbed: ok
adt-run: DBG: testbed open, scratch=None
adt-run: DBG: sending command to testbed: open
adt-run: DBG: got reply from testbed: ok /tmp/adt-run.dKQGkT
adt-run: DBG: sending command to testbed: print-execute-command
adt-run: DBG: got reply from testbed: ok /tmp/adt-virt-qemu.wuid9js5/runcmd
adt-run: DBG: sending command to testbed: capabilities
adt-run: DBG: got reply from testbed: ok revert revert-full-system root-on-testbed isolation-machine reboot suggested-normal-user=ubuntu
adt-run: DBG: testbed capabilities: ['revert', 'revert-full-system', 'root-on-testbed', 'isolation-machine', 'reboot', 'suggested-normal-user=ubuntu']
adt-run [15:40:30]: @@@@@@@@@@@@@@@@@@@@ test bed setup
adt-run: DBG: testbed command ['sh', '-ec', 'awk \'/^deb(-src)? .*(ubuntu.com|debian.org|ftpmaster)/ { if ($3 !~ /-/) { $3 = $3"-proposed"; print }}\' /etc/apt/sources.list `ls /etc/apt/sources.list.d/*.list 2>/dev/null|| true` > /etc/apt/sources.list.d/proposed.list'], kind short, sout raw, serr pipe, env ['LANG=C.UTF-8']

The boottest.sh scripts includes a 'timeout 600 adt-run ..." and everything gets killed properly after hanging on the above.

Revision history for this message
Martin Pitt (pitti) wrote :

Indeed when you run this, the whole set of processes goes into "T" state, i. e. it received a SIGSTOP:

martin 23439 0.0 0.1 64656 18396 pts/7 T 17:42 0:00 /usr/bin/python3 -u /usr/bin/adt-run --debug --no-built-binaries --unbuilt-tree /tmp/boottest/scripts/../tests/getpkgsrc -o pkgsrc --apt-pocket=proposed --setup-commands=adt-commands --- qemu /srv/vm/adt-vivid-amd64-cloud.img
martin 23440 0.0 0.0 13264 1804 pts/7 T 17:42 0:00 tee /tmp/adt-fifo-log.c9x0c1vs
martin 23441 0.0 0.0 13264 1984 pts/7 T 17:42 0:00 tee /tmp/adt-fifo-log.c9x0c1vs -a /dev/stderr
martin 23442 0.0 0.0 13044 668 pts/7 T 17:42 0:00 cat /tmp/adt-fifo-log.c9x0c1vs
martin 23443 0.0 0.0 48012 14580 pts/7 T 17:42 0:00 /usr/bin/python3 /usr/bin/adt-virt-qemu /srv/vm/adt-vivid-amd64-cloud.img
martin 23446 8.1 2.2 2209192 359176 pts/7 Tl 17:42 0:13 qemu-system-x86_64 -enable-kvm -m 1024 -smp 1 -localtime -nographic -net user -net nic,model=virtio -monitor unix:/tmp/adt-virt-qemu.sf04xpyf/monitor,server,nowait -serial unix:/tmp/adt-virt-qemu.sf04xpyf/ttyS0,server,nowait -serial unix:/tmp/adt-virt-qemu.sf04xpyf/ttyS1,server,nowait -virtfs local,id=autopkgtest,path=/tmp/adt-virt-qemu.sf04xpyf/shared,security_model=none,mount_tag=autopkgtest -drive file=/tmp/adt-virt-qemu.sf04xpyf/overlay.img,cache=unsafe,if=virtio,index=0 -redir tcp:10022::22
martin 23474 0.0 0.0 120700 12104 pts/7 Tl 17:43 0:00 /usr/bin/python3 /tmp/adt-virt-qemu.sf04xpyf/runcmd env LANG=C.UTF-8 sh -ec awk '/^deb(-src)? .*(ubuntu.com|debian.org|ftpmaster)/ { if ($3 !~ /-/) { $3 = $3"-proposed"; print }}' /etc/apt/sources.list `ls /etc/apt/sources.list.d/*.list 2>/dev/null|| true` > /etc/apt/sources.list.d/proposed.list

At first I thought running adt-run under "timeout" would cause this, and suspected some interaction between the timeout program's timer with adt-run's timeouts; but timeout uses Linux hrtimers and shouldn't get in the way with adt-run's alarm().

When I run the command that boottest.sh runs manually, it works fine:

+ timeout 600 adt-run --debug --no-built-binaries --unbuilt-tree /tmp/boottest/scripts/../tests/getpkgsrc -o pkgsrc --apt-pocket=proposed --setup-commands=adt-commands --- qemu /srv/vm/adt-vivid-amd64-cloud.img

So we need to find out what causes adt-run to get STOPed.

Revision history for this message
Martin Pitt (pitti) wrote :

If I drop the "timeout" bit from $ADT_CMD, it works fine as well. When prepending the timeout call in boottest.sh with strace -vvtts1024 -o /tmp/tr, I get

18:07:16.733950 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f11825c3a10) = 26085
18:07:16.734086 rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
18:07:16.734154 timer_create(CLOCK_REALTIME, {0x2518030, SIGALRM, SIGEV_SIGNAL, {...}}, {0}) = 0
18:07:16.734209 timer_settime(0, 0, {it_interval={0, 0}, it_value={600, 0}}, NULL) = 0
18:07:16.734260 wait4(26085, 0x7fffad6743dc, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
18:07:35.411480 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_STOPPED, si_pid=26085, si_status=SIGTTIN, si_utime=11, si_stime=1} ---
18:07:35.411537 --- SIGTTIN {si_signo=SIGTTIN, si_code=SI_KERNEL, si_value={int=21, ptr=0x15}} ---
18:07:35.411602 wait4(26085,

At that time (18:07:35.) all the child processes get stopped.

This isn't specific to qemu, it also happens with the LXC runner.

summary: - adt-virt-qemu hang
+ adt-run gets stopped when running under timeout
Revision history for this message
Martin Pitt (pitti) wrote :

Note: Running "timeout --foreground 600 adt-run ..." works as well; that might not be precisely what you're after, though.

Revision history for this message
Martin Pitt (pitti) wrote :

Simple reproducer is this command:

$ sh -c 'timeout 600 adt-run -d libpng --- qemu /srv/vm/adt-vivid-amd64-cloud.img'

whereas

$ timeout 600 adt-run -d libpng --- qemu /srv/vm/adt-vivid-amd64-cloud.img

works fine. So interactive shells have a different TTY and/or signal setup than noninteractive ones.

Martin Pitt (pitti)
Changed in autopkgtest (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Vincent Ladeuil (vila) wrote :

Right, no need to further debug timeout/adt-run bad interactions indeed.

The initial need was to make sure adt-run will always timeout. If we have a way to do that while all missing timeouts are added to adt-run (if some are still missing...), we're good.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.