autopkgtest-cloud running out of space is not handled well

Bug #2012667 reported by Brian Murray
Affects: Auto Package Testing
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Some autopkgtests were running on autopkgtest-cloud-worker-lrg/7 when it ran out of disk space. From the test log we can see:

autopkgtest [16:44:34]: testing package golang-github-bep-overlayfs version 0.6.0-2
autopkgtest [16:44:34]: ERROR: unexpected error:
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/autopkgtest/runner/autopkgtest", line 843, in main
    process_actions()
  File "/home/ubuntu/autopkgtest/runner/autopkgtest", line 767, in process_actions
    tests_tree = build_source(kind, arg, built_binaries)
  File "/home/ubuntu/autopkgtest/runner/autopkgtest", line 572, in build_source
    f.write('%s %s\n' % (testpkg_name, testpkg_version))
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/autopkgtest/runner/autopkgtest", line 855, in <module>
    main()
  File "/home/ubuntu/autopkgtest/runner/autopkgtest", line 845, in main
    errorcode = print_exception(sys.exc_info(), '')
  File "/home/ubuntu/autopkgtest/runner/autopkgtest", line 248, in print_exception
    adtlog.psummary('quitting: unexpected error, see log')
  File "/home/ubuntu/autopkgtest/lib/adtlog.py", line 101, in psummary
    summary_stream.write(m.encode('UTF-8'))
OSError: [Errno 28] No space left on device

And this is the journal from around that time:

Mar 22 16:44:30 lrg-root4 /home/ubuntu/autopkgtest-cloud/worker/worker[706687]: INFO: autopkgtest exited with code 20
Mar 22 16:44:30 lrg-root4 sh[706687]: OSError: [Errno 28] No space left on device
Mar 22 16:44:30 lrg-root4 sh[706687]: During handling of the above exception, another exception occurred:
Mar 22 16:44:30 lrg-root4 sh[706687]: Traceback (most recent call last):
Mar 22 16:44:30 lrg-root4 sh[706687]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1123, in <module>
Mar 22 16:44:30 lrg-root4 sh[706687]: main()
Mar 22 16:44:30 lrg-root4 sh[706687]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 1116, in main
Mar 22 16:44:30 lrg-root4 sh[706687]: queue.wait()
Mar 22 16:44:30 lrg-root4 sh[706687]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/abstract_channel.py", line 97, in wait
Mar 22 16:44:30 lrg-root4 sh[706687]: return self.dispatch_method(method_sig, args, content)
Mar 22 16:44:30 lrg-root4 sh[706687]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/abstract_channel.py", line 117, in dispatch_method
Mar 22 16:44:30 lrg-root4 sh[706687]: return amqp_method(self, args, content)
Mar 22 16:44:30 lrg-root4 sh[706687]: File "/usr/lib/python3/dist-packages/amqplib/client_0_8/channel.py", line 2060, in _basic_deliver
Mar 22 16:44:30 lrg-root4 sh[706687]: func(msg)
Mar 22 16:44:30 lrg-root4 sh[706687]: File "/home/ubuntu/autopkgtest-cloud/worker/worker", line 882, in request
Mar 22 16:44:30 lrg-root4 sh[706687]: f.write('%i\n' % code)
Mar 22 16:44:30 lrg-root4 sh[706687]: OSError: [Errno 28] No space left on device
Mar 22 16:44:30 lrg-root4 systemd[1]: <email address hidden>: Main process exited, code=exited, status=1/FAILURE

As I see it, there are two problems here:
1) the log files are left behind in /tmp/autopkgtest-work.$TEMP/out/
2) the autopkgtest process itself can be left running, e.g.:

ubuntu 716051 0.0 0.0 45316 20736 ? S Mar22 0:00 /usr/bin/python3 -u /home/ubuntu/autopkgtest/runner/autopkgtest --output-dir /tmp/autopkgtest-work.d5zzl576/out --timeout-copy=6000 -a i386 --setup-commands /home/ubuntu/autopkgtest-cloud/worker-config-production/setup-canonical.sh --setup-commands /home/ubuntu/autopkgtest/setup-commands/setup-testbed --apt-pocket=proposed=src:golang-golang-x-tools --apt-upgrade golang-github-bep-overlayfs --timeout-short=300 --timeout-copy=20000 --timeout-build=20000 --env=ADT_TEST_TRIGGERS=golang-golang-x-tools/1:0.5.0+ds-2 -- ssh -s /home/ubuntu/autopkgtest/ssh-setup/nova -- --flavor autopkgtest --security-groups autopkgtest-lrg-<email address hidden> --name adt-lunar-i386-golang-github-bep-overlayfs-20230322-162349-lrg-root4 --image adt/ubuntu-lunar-amd64-server --keyname testbed-lrg-root4 --net-id=net_prod-proposed-migration -e TERM=linux -e 'http_proxy=http://squid.internal:3128' -e 'https_proxy=http://squid.internal:3128' -e 'no_proxy=127.0.0.1,127.0.1.1,localhost,localdomain,novalocal,internal,archive.ubuntu.com,ports.ubuntu.com,security.ubuntu.com,ddebs.ubuntu.com,changelogs.ubuntu.com,launchpad.net,10.24.0.0/24' --mirror=http://ftpmaster.internal/ubuntu/
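
For illustration only, here is a minimal Python sketch of how a worker could avoid both problems: leftover log files and a runner that keeps running. It is not the autopkgtest-cloud worker code. The names run_job and record_result are hypothetical, and treating a full disk as "no result" so that the job can be retried is an assumption.

```python
import errno
import shutil
import subprocess
import tempfile


def run_job(argv, record_result):
    """Run argv in a throwaway work directory (hypothetical helper).

    record_result(workdir, code) stands in for whatever the worker does
    with the exit code. Returns the exit code, or None if recording the
    result failed because the disk was full.
    """
    workdir = tempfile.mkdtemp(prefix="autopkgtest-work.")
    proc = None
    try:
        proc = subprocess.Popen(argv, cwd=workdir)
        code = proc.wait()
        record_result(workdir, code)  # may raise OSError(ENOSPC)
        return code
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise
        # Assumption: a full disk means "no result", so the job can be retried.
        return None
    finally:
        # Problem 2: never leave the child process running.
        if proc is not None and proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
        # Problem 1: never leave the work directory (and its logs) behind.
        shutil.rmtree(workdir, ignore_errors=True)
```

The cleanup sits in the finally clause so that it runs whether the job succeeds, the disk fills up, or some other exception propagates.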

Brian Murray (brian-murray) wrote :

In case any other autopkgtest sysadmin looks at this bug (like I did!), the best way to clean this up is to kill the tee process that the runner is waiting on. First find the runner's PID, e.g.:

  ps aux | grep "runner.*jool"

Confirm that $PID is waiting indefinitely, and that it is waiting on tee. Then:

  kill $(pstree -p $PID | grep tee | sed -E 's/.*\((.*)\)/\1/')

The runner process will then exit as an underlying process was killed and the test will be restarted as there were no results.
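
The same cleanup could be scripted. The following is a rough Python sketch, assuming a procps-style ps is available; the function names descendants_named and kill_stuck_tee are made up for this example and are not part of autopkgtest or autopkgtest-cloud.

```python
import os
import signal
import subprocess
from collections import defaultdict


def descendants_named(ps_output, root_pid, name):
    """Return sorted PIDs of processes called `name` below root_pid.

    ps_output is text with one "PID PPID COMMAND" row per line, as
    printed by `ps -e -o pid= -o ppid= -o comm=`.
    """
    children = defaultdict(list)
    names = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, ppid, comm = int(fields[0]), int(fields[1]), fields[2].strip()
        children[ppid].append(pid)
        names[pid] = comm
    found, stack = [], [root_pid]
    while stack:  # walk the process tree below root_pid
        pid = stack.pop()
        for child in children[pid]:
            if names[child] == name:
                found.append(child)
            stack.append(child)
    return sorted(found)


def kill_stuck_tee(runner_pid):
    """Send SIGTERM to every tee process below runner_pid; return their PIDs."""
    ps_output = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "comm="],
        check=True, capture_output=True, text=True,
    ).stdout
    pids = descendants_named(ps_output, runner_pid, "tee")
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Calling kill_stuck_tee with the runner's PID does the same job as the pstree pipeline above. As with the manual steps, confirm first that the runner really is stuck.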
