autopkgtest-cloud cron emails are low signal, don't tell you if there's a problem or provide info needed to figure out if there is

Bug #1773876 reported by Steve Langasek
This bug affects 1 person
Affects: Auto Package Testing
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

There have been a lot of emails from the autopkgtest infrastructure over the long (US) holiday weekend, reporting a significant number of dead runners. The emails don't contain enough information to tell why the runners were killed; e.g.:

==== worker <email address hidden> failed ====
[...]
Started autopkgtest worker lcy01-8.
[...]
2018-05-28 03:37:35,394 [26050] WARNING: Testbed failure, retrying in 5 minutes
2018-05-28 03:37:35,394 [26050] ERROR: Three tmpfails in a row, aborting worker. Log follows:
2018-05-28 03:37:35,395 [26050] ERROR: autopkgtest [02:46:11]: git checkout: 5243905 ssh-setup/nova: Add support for keystone v3 auth
autopkgtest [02:46:11]: host juju-prod-ues-proposed-migration-machine-11; command line: /home/ubuntu/autopkgtest/runner/autopkgtest --output-dir /tmp/autopkgtest-work.158kldh_/out --timeout-copy=6000 --setup-commands /home/ubuntu/autopkgtest-cloud/worker-config-production/setup-canonical.sh --setup-commands /home/ubuntu/autopkgtest/setup-commands/setup-testbed --apt-pocket=proposed=src:graphicsmagick --apt-upgrade diaspora-installer --env=ADT_TEST_TRIGGERS=graphicsmagick/1.3.29+hg15665-1 -- ssh -s /home/ubuntu/autopkgtest/ssh-setup/nova -- --flavor autopkgtest --security-groups <email address hidden> --name adt-cosmic-amd64-diaspora-installer-20180528-005242 --image adt/ubuntu-cosmic-amd64-server --keyname testbed-juju-prod-ues-proposed-migration-machine-11 --net-id=net_ues_proposed_migration -e ''"'"'http_proxy=http://squid.internal:3128'"'"'' -e ''"'"'https_proxy=http://squid.internal:3128'"'"'' -e ''"'"'no_proxy=127.0.0.1,127.0.1.1,localhost,localdomain,novalocal,internal,archive.ubuntu.com,security.ubuntu.com,ddebs.ubuntu.com,changelogs.ubuntu.com,ppa.launchpad.net'"'"'' --mirror=http://ftpmaster.internal/ubuntu
autopkgtest [02:46:49]: @@@@@@@@@@@@@@@@@@@@ test bed setup
[...]
autopkgtest [02:47:25]: test command1: preparing testbed
Reading package lists...
Building dependency tree...
Reading state information...
Correcting dependencies...Starting pkgProblemResolver with broken count: 0
Starting 2 pkgProblemResolver with broken count: 0
Done
 Done
Starting pkgProblemResolver with broken count: 0
Starting 2 pkgProblemResolver with broken count: 0
Done
The following additional packages will be installed:
  build-essential cpp cpp-7 dbconfig-common dbconfig-pgsql diaspora-common
  diaspora-installer exim4 exim4-base exim4-config exim4-daemon-light
  fontconfig fontconfig-config fonts-dejavu-core g++ g++-7 gcc gcc-7
  gcc-7-base ghostscript gir1.2-freedesktop gir1.2-gdkpixbuf-2.0
  gir1.2-harfbuzz-0.0 gir1.2-rsvg-2.0 hicolor-icon-theme icu-devtools
  imagemagick imagemagick-6-common imagemagick-6.q16 libasan4 libatomic1
  libavahi-client3 libavahi-common-data libavahi-common3 libbz2-dev libc-ares2
  libc-dev-bin libc6-dev libcairo-gobject2 libcairo-script-interpreter2
  libcairo2 libcairo2-dev libcc1-0 libcilkrts5 libcroco3 libcups2
  libcupsimage2 libcurl4-openssl-dev libdatrie1 libdjvulibre-dev
  libdjvulibre-text libdjvulibre21 libexif-dev libexif12 libexpat1-dev
  libffi-dev libfftw3-double3 libfontconfig1 libfontconfig1-dev
  libfreetype6-dev libgcc-7-dev libgd3 libgdk-pixbuf2.0-0
  libgdk-pixbuf2.0-common libgdk-pixbuf2.0-dev libglib2.0-bin libglib2.0-dev
  libglib2.0-dev-bin libgmp-dev libgmpxx4ldbl libgomp1 libgraphite2-3
  libgraphite2-dev libgs9 libgs9-common libharfbuzz-dev libharfbuzz-gobject0
  libharfbuzz-icu0 libharfbuzz0b libhttp-parser2.8 libice-dev libice6
  libicu-dev libicu-le-hb-dev libicu-le-hb0 libiculx60 libijs-0.35
  libilmbase-dev libilmbase12 libisl19 libitm1 libjbig-dev libjbig0
  libjbig2dec0 libjemalloc1 libjpeg-dev libjpeg-turbo8 libjpeg-turbo8-dev
  libjpeg8 libjpeg8-dev liblcms2-2 liblcms2-dev liblqr-1-0 liblqr-1-0-dev
  liblsan0 libltdl-dev libltdl7 liblzma-dev libmagickcore-6-arch-config
  libmagickcore-6-headers libmagickcore-6.q16-3 libmagickcore-6.q16-3-extra
  libmagickcore-6.q16-dev libmagickwand-6-headers libmagickwand-6.q16-3
  libmagickwand-6.q16-dev libmagickwand-dev libmpc3 libmpx2
  libnginx-mod-http-geoip libnginx-mod-http-image-filter
  libnginx-mod-http-xslt-filter libnginx-mod-mail libnginx-mod-stream
  libopenexr-dev libopenexr22 libpango-1.0-0 libpangocairo-1.0-0
  libpangoft2-1.0-0 libpaper1 libpcre16-3 libpcre3-dev libpcre32-3
  libpcrecpp0v5 libpixman-1-0 libpixman-1-dev libpng-dev libpq-dev libpq5
  libpthread-stubs0-dev libquadmath0 librsvg2-2 librsvg2-common librsvg2-dev
  libruby2.5 libsm-dev libsm6 libssl-dev libstdc++-7-dev libthai-data libthai0
  libtiff-dev libtiff5 libtiff5-dev libtiffxx5 libtsan0 libubsan0 libuv1
  libwebp6 libwmf-dev libwmf0.2-7 libx11-dev libxau-dev libxcb-render0
  libxcb-render0-dev libxcb-shm0 libxcb-shm0-dev libxcb1-dev libxdmcp-dev
  libxext-dev libxml2-dev libxpm4 libxrender-dev libxrender1 libxslt1-dev
  libxt-dev libxt6 linux-libc-dev nginx nginx-common nginx-core nodejs
  pkg-config poppler-data postgresql postgresql-10 postgresql-client
  postgresql-client-10 postgresql-client-common postgresql-common
  python3-distutils python3-lib2to3 rake redis-server redis-tools ruby
  ruby-dev ruby-did-you-mean ruby-diff-lcs ruby-minitest ruby-net-telnet
  ruby-power-assert ruby-rspec ruby-rspec-core ruby-rspec-expectations
  ruby-rspec-mocks ruby-rspec-support ruby-test-unit ruby-thread-order ruby2.5
Deleting existing group {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'name': '<email address hidden>', 'description': 'copy <email address hidden> of default (Default security group)', 'security_group_rules': [{'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': '15657ed8-c781-4080-9677-8ede743e30cd', 'port_range_min': None, 'created_at': '2018-05-27T09:20:17Z', 'remote_group_id': None, 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': 'icmp', 'remote_ip_prefix': '91.189.90.53/32', 'updated_at': '2018-05-27T09:20:17Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': '3c9ef2c0-8618-4e2c-b97c-b43ce7068a9e', 'port_range_min': None, 'created_at': '2018-05-27T09:20:18Z', 'remote_group_id': '843e3a73-f1ee-4ecc-bc8c-b2d297df96b9', 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': None, 'remote_ip_prefix': None, 'updated_at': '2018-05-27T09:20:18Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': '6f392f68-54ca-4c0b-a5ba-daa7a3999d6a', 'port_range_min': 22, 'created_at': '2018-05-27T09:20:17Z', 'remote_group_id': None, 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': 22, 'protocol': 'tcp', 'remote_ip_prefix': '162.213.33.179/32', 'updated_at': '2018-05-27T09:20:17Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': '7ac1770e-e6ce-4843-b114-e4f24281f784', 'port_range_min': 22, 'created_at': '2018-05-27T09:20:18Z', 'remote_group_id': None, 'security_group_id': 
'5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number'
: 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': 22, 'protocol': 'tcp', 'remote_ip_prefix': '91.189.90.53/32', 'updated_at': '2018-05-27T09:20:18Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv6', 'id': '7bf03f0d-432b-4c46-b2b1-c12af62793f7', 'port_range_min': None, 'created_at': '2018-05-27T09:20:17Z', 'remote_group_id': '843e3a73-f1ee-4ecc-bc8c-b2d297df96b9', 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': None, 'remote_ip_prefix': None, 'updated_at': '2018-05-27T09:20:17Z', 'direction': 'ingress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv6', 'id': 'ec28f1fe-75cf-49f8-9e2c-e83afc72da24', 'port_range_min': None, 'created_at': '2018-05-27T09:20:18Z', 'remote_group_id': None, 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': None, 'remote_ip_prefix': None, 'updated_at': '2018-05-27T09:20:18Z', 'direction': 'egress'}, {'tenant_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'description': '', 'ethertype': 'IPv4', 'id': 'f6b2100e-f068-4111-9f68-6e86e6250b29', 'port_range_min': None, 'created_at': '2018-05-27T09:20:18Z', 'remote_group_id': None, 'security_group_id': '5c87742c-aaa3-4806-932e-9317521587dc', 'revision_number': 1, 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'port_range_max': None, 'protocol': None, 'remote_ip_prefix': None, 'updated_at': '2018-05-27T09:20:18Z', 'direction': 'egress'}], 'id': '5c87742c-aaa3-4806-932e-9317521587dc', 'updated_at': '2018-05-27T09:20:18Z', 'created_at': '2018-05-27T09:20:17Z', 'project_id': 'afaef86b96dd4828a1ed5ee395ea1421', 'revision_number': 10}
<fin>

The only verbose OpenStack output here is about the deletion of a security group. Nothing in this log links back to a particular OpenStack instance, or gives any hint of why the instance disappeared. The comments in the source suggest this is due to hitting quotas when trying to run too many huge jobs in parallel. But that should not be a problem with proper quotas, it doesn't explain why there appears to be a recent increase, and none of the output corroborates that this is what is happening.

If we have frequently failing autopkgtest instances, we ought to know why. For that, we need better logging.

If the failures are not interesting, then we should not generate emails for all of them; we should detect that they're not interesting, and avoid sending mail.
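
One way to approach the second point, sketched in Python under loud assumptions: rather than mailing on every dead worker, keep a sliding window of recent failures and only mail once they cluster. The class name `FailureDigest`, the one-hour window, and the threshold of three are all hypothetical and not taken from autopkgtest-cloud; what actually counts as "interesting" would have to be decided by whoever operates the service.

```python
from collections import deque
from datetime import datetime, timedelta


class FailureDigest:
    """Hypothetical helper: collect worker failures, mail only when they cluster."""

    def __init__(self, window=timedelta(hours=1), threshold=3):
        self.window = window        # assumed look-back period
        self.threshold = threshold  # assumed number of failures worth a mail
        self.failures = deque()     # entries are (timestamp, worker, reason)

    def record(self, when, worker, reason):
        """Add one failure and drop entries older than the window."""
        self.failures.append((when, worker, reason))
        while self.failures and when - self.failures[0][0] > self.window:
            self.failures.popleft()

    def should_mail(self):
        """True once enough failures have accumulated inside the window."""
        return len(self.failures) >= self.threshold

    def summary(self):
        """One line per failure, so a single mail covers the whole cluster."""
        lines = [f"{len(self.failures)} worker failure(s) in the last {self.window}:"]
        for when, worker, reason in self.failures:
            lines.append(f"  {when.isoformat()} {worker}: {reason}")
        return "\n".join(lines)


if __name__ == "__main__":
    digest = FailureDigest()
    start = datetime(2018, 5, 28, 3, 37, 35)
    for i, worker in enumerate(["lcy01-8", "lcy01-9", "lcy01-8"]):
        digest.record(start + timedelta(minutes=5 * i), worker,
                      "Three tmpfails in a row")
    if digest.should_mail():
        print(digest.summary())
```

A digest like this trades immediacy for signal: an isolated tmpfail produces no mail at all, which is only acceptable if isolated failures really are uninteresting.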

Steve Langasek (vorlon)
Changed in auto-package-testing:
importance: Undecided → Medium
status: New → Triaged
Iain Lane (laney) wrote :

One thing I want to see is the test request as well as the worker's name, so you can see straight away if a particular test is causing problems.

Maybe if we had the helper commands described in https://trello.com/c/jaReiQ53/6-ops-debuggability-helper-commands-to-map-systemd-unit-cloud-instance-running-stuck-test we could put their output in the emails.
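
As a rough illustration of surfacing the worker name and test request, the Python sketch below pulls both out of a failure log shaped like the one in the bug description. The function names `summarize_failure` and `subject_line` are hypothetical, and the assumed instance-name layout (`adt-<release>-<arch>-<package>-<date>-<time>`) is inferred from the single example above, so it may not hold for every job.

```python
import re

# Patterns inferred from the sample log in this report; treat them as assumptions.
WORKER_RE = re.compile(r"Started autopkgtest worker (\S+)\.$", re.MULTILINE)
NAME_RE = re.compile(
    r"--name (adt-(?P<release>[a-z]+)-(?P<arch>[a-z0-9]+)"
    r"-(?P<package>\S+)-\d{8}-\d{6})\b"
)
TRIGGERS_RE = re.compile(r"ADT_TEST_TRIGGERS=(\S+)")


def summarize_failure(log_text):
    """Extract worker name and test request details from a failure log."""
    summary = {"worker": None, "instance": None, "release": None,
               "arch": None, "package": None, "triggers": None}
    match = WORKER_RE.search(log_text)
    if match:
        summary["worker"] = match.group(1)
    match = NAME_RE.search(log_text)
    if match:
        summary["instance"] = match.group(1)
        summary.update(match.groupdict())  # release, arch, package
    match = TRIGGERS_RE.search(log_text)
    if match:
        summary["triggers"] = match.group(1)
    return summary


def subject_line(summary):
    """Build a mail subject that names the worker and the test up front."""
    return ("worker {worker} failed: {package} on {release}/{arch} "
            "(triggers: {triggers})").format(**summary)


if __name__ == "__main__":
    sample = (
        "Started autopkgtest worker lcy01-8.\n"
        "autopkgtest [02:46:11]: command line: autopkgtest "
        "--env=ADT_TEST_TRIGGERS=graphicsmagick/1.3.29+hg15665-1 -- ssh -- "
        "--name adt-cosmic-amd64-diaspora-installer-20180528-005242 "
        "--image adt/ubuntu-cosmic-amd64-server\n"
    )
    print(subject_line(summarize_failure(sample)))
```

With the test and its triggers in the subject, a package that repeatedly kills workers would stand out in the mailbox without opening each mail.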
