ssh runner: hook_wait_reboot sometimes times out after package upgrades

Bug #1470209 reported by Martin Pitt
This bug affects 2 people
Affects               Status        Importance  Assigned to  Milestone
autopkgtest (Ubuntu)  Fix Released  Medium      Martin Pitt

Bug Description

I noticed this with several tmpfail runs in the cloud, e.g.:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-wily/wily/i386/g/glibc/20150630_152453@/log.gz
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-wily/wily/amd64/s/systemd/20150625_131953@/log.gz

adt-run [15:19:50]: rebooting testbed after setup commands that affected boot
invalid command wait-reboot
Exit request sent.
sudo: /tmp/adt-run-wrapper: command not found
Exit request sent.
<VirtSubproc>: failure: timed out waiting for testbed to reboot
adt-run [15:24:53]: ERROR: testbed failed: cannot send to testbed: ['BrokenPipeError: [Errno 32] Broken pipe\n']

Martin Pitt (pitti)
Changed in autopkgtest (Ubuntu):
importance: Undecided → High
status: New → In Progress
assignee: nobody → Martin Pitt (pitti)
Martin Pitt (pitti) wrote :

I can reproduce that with autopkgtest/tests/testpkg-simple/. After

  invalid command wait-reboot
  Exit request sent.

there is a long hang, from ssh's hook_wait_reboot():

    try:
        wait_port_down(sshconfig['hostname'], port, 300)
    except VirtSubproc.Timeout:
        VirtSubproc.bomb('timed out waiting for testbed to reboot')

The subsequent "/tmp/adt-run-wrapper: command not found" is just a follow-up error from cleanup; since the runner never saw the rebooted testbed, the testbed does not get re-prepared for autopkgtest.
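For readers unfamiliar with the helper: judging by the log timestamps above (15:19:50 to 15:24:53, about five minutes), the 300 passed to wait_port_down() is a timeout in seconds, and the call only returns once the testbed's ssh port stops accepting connections. The following is a minimal, hypothetical sketch of such a polling loop, not autopkgtest's actual code. The Timeout class stands in for VirtSubproc.Timeout, and port_is_open() and poll_interval are illustrative names.

```python
import socket
import time


class Timeout(Exception):
    """Stand-in for VirtSubproc.Timeout (hypothetical here)."""


def port_is_open(host, port, connect_timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection() raises OSError when refused, unreachable or timed out
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_port_down(host, port, timeout, poll_interval=0.5):
    """Poll host:port until connections fail; raise Timeout after `timeout` s.

    Hypothetical sketch only, not autopkgtest's implementation.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not port_is_open(host, port):
            return  # port stopped accepting connections: host is going down
        time.sleep(poll_interval)
    raise Timeout(f"{host}:{port} still open after {timeout} seconds")
```

A short poll_interval matters in a loop like this one: if the guest went down and came back up between two polls, this sketch would never observe the port as closed and would run into the timeout.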

summary:
- ssh runner: reboot sometimes fails with "sudo: /tmp/adt-run-wrapper: command not found"
+ ssh runner: hook_wait_reboot sometimes times out after package upgrades
Martin Pitt (pitti) wrote :

With some debugging added to wait_port_down() to see why it never exits, I ran a test in a loop all night (several hundred iterations) without any failure :-(

Martin Pitt (pitti)
Changed in autopkgtest (Ubuntu):
status: In Progress → Triaged
Martin Pitt (pitti) wrote :

With the workaround in place I never had this problem. I accidentally removed it this morning and got this instance again:

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-utopic/utopic/i386/f/freeradius/20150714_123217@/log.gz

Martin Pitt (pitti)
tags: added: autopkgtest-cloud
Martin Pitt (pitti) wrote :

I haven't seen this in a long time with current autopkgtest, so closing for now.

Changed in autopkgtest (Ubuntu):
importance: High → Medium
assignee: Martin Pitt (pitti) → nobody
assignee: nobody → Martin Pitt (pitti)
status: Triaged → Fix Released
Federico Gimenez (fgimenez) wrote :

Hi Martin :) We are occasionally getting the same "/tmp/adt-run-wrapper: command not found" error (roughly 1 run in 30) while executing tests that involve a reboot on snappy's scalingstack instances [1] (VPN required). Please let me know if I can provide any further info.

Thanks!

[1] http://162.213.35.179:8080/job/github-snappy-integration-tests-cloud/444/consoleFull

Federico Gimenez (fgimenez) wrote :

Hi again, it happened again [1]. Should I reopen this bug or file a new one?

Thanks!

[1] http://162.213.35.179:8080/job/github-snappy-integration-tests-cloud/938/consoleFull

Martin Pitt (pitti) wrote :

The "adt-run-wrapper" warning is just a by-product; the real issue is that the testbed timed out when rebooting. If you have a testbed in that state, can you check whether it actually rebooted? If ssh no longer works, please check "nova console-log" to see what is going on there.
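While ssh still works, one way to answer "did it actually reboot?", assuming the testbed is a Linux guest, is to compare the kernel's per-boot random UUID in /proc/sys/kernel/random/boot_id before and after the reboot: it changes on every boot. The sketch below is hypothetical and not part of autopkgtest; ssh_runner(), read_boot_id(), has_rebooted() and the host name are illustrative, and the command runner is injectable so it can be swapped for whatever access method you have.

```python
import subprocess

BOOT_ID_PATH = "/proc/sys/kernel/random/boot_id"


def ssh_runner(host):
    """Return a callable that runs argv on `host` via ssh (hypothetical helper)."""
    def run(argv):
        result = subprocess.run(["ssh", host, *argv], capture_output=True,
                                text=True, check=True, timeout=30)
        return result.stdout
    return run


def read_boot_id(run):
    """Read the kernel's per-boot UUID using the given command runner."""
    return run(["cat", BOOT_ID_PATH]).strip()


def has_rebooted(run, boot_id_before):
    """True if the testbed now reports a different boot_id than before."""
    return read_boot_id(run) != boot_id_before
```

Usage: record `before = read_boot_id(ssh_runner("testbed"))` before triggering the reboot, then call `has_rebooted(ssh_runner("testbed"), before)` once ssh answers again; "testbed" is a placeholder host alias.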
