ssh failed less than 1 second after system was reported to be rebooting

Bug #1212437 reported by Chris Gagnon
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
UTAH
New
Medium
Max Brustkern

Bug Description

Reproducible: rare less than 1 in 100

Description:
ssh failed less than 1 second after system was reported to be rebooting

2013-08-14 18:07:06,985 root INFO: System under test is rebooting
...
2013-08-14 18:07:07,597 ssh DEBUG: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/utah/provisioning/ssh.py", line 250, in downloadfiles
    key_filename=config.sshprivatekey)...

Utah version:
Installed: 0.14.8ubuntu1~precise1

steps:
1. provision machine with daily saucy image

expected:
utah never fails after reboot

actual result:
utah failed with error
2013-08-14 18:07:07,597 ssh DEBUG: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/utah/provisioning/ssh.py", line 250, in downloadfiles
    key_filename=config.sshprivatekey)
  File "/usr/lib/python2.7/dist-packages/paramiko/client.py", line 296, in connect
    sock.connect(addr)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Download full text (59.9 KiB)

full debug log

2013-08-14 17:56:40,770 root INFO: UTAH Features:
 PROVISIONED_AUTORUN: False
2013-08-14 17:56:40,770 root INFO: UTAH version: 0.14.8ubuntu1~precise1
2013-08-14 17:56:40,770 root DEBUG: Configuration variables:
{
    "arch": "amd64",
    "boot": null,
    "boottimeout": 600,
    "bridge": "eth0",
    "checktimeout": 180,
    "clean": true,
    "client_install_timeout": 60,
    "consoleloglevel": 30,
    "debug": false,
    "debuglog": "/var/lib/jenkins/ps-slaves/ps-intel-3000/workspace/openarena-benchmark-ps-intel-3000/gaueko-debug.log",
    "diskbus": "virtio",
    "disksizes": [
        8
    ],
    "dlcommand": "dl-ubuntu-test-iso",
    "dlpercentincrement": 10,
    "dlretries": 10,
    "emulator": null,
    "fileloglevel": 20,
    "group": "utah",
    "hostname": "gaueko",
    "image": null,
    "initrd": null,
    "installpackages": [
        "openssh-server",
        "gdebi"
    ],
    "installtimeout": 9000,
    "installtype": "mini",
    "isodir": "/var/cache/utah/iso",
    "jobtimeout": null,
    "kernel": null,
    "logfile": "/var/lib/jenkins/ps-slaves/ps-intel-3000/workspace/openarena-benchmark-ps-intel-3000/gaueko.log",
    "logpath": "/var/lib/jenkins/ps-slaves/ps-intel-3000/workspace/openarena-benchmark-ps-intel-3000",
    "logrotation_backupCount": 10,
    "logrotation_interval": 1,
    "logrotation_when": "W0",
    "machinedir": null,
    "machineid": null,
    "machineinfo": null,
    "machinetype": "virtual",
    "machineuuid": null,
    "macs": [],
    "name": null,
    "new": false,
    "nfscommand": [
        "sudo",
        "service",
        "nfs-kernel-server"
    ],
    "nfsconfigfile": "/etc/exports",
    "nfsiface": "eth0",
    "nfsoptions": "*(ro,async,no_root_squash,no_subtree_check)",
    "outputpreseed": false,
    "packagedir": "/usr/share/utah",
    "powercmd": null,
    "powertimeout": 15,
    "preboot": null,
    "prefix": "utah",
    "preseed": "/etc/utah/default-preseed.cfg",
    "pxedir": "/var/lib/tftpboot/pxelinux.cfg",
    "qemupath": "qemu:///system",
    "retry_timeout": 3,
    "rewrite": "all",
    "runtimeout": null,
    "series": "precise",
    "serieschoices": [
        "lucid",
        "precise",
        "quantal",
        "raring",
        "saucy"
    ],
    "sqlite3_db_connection_timeout": 30,
    "ssh_logfile": "/var/lib/jenkins/ps-slaves/ps-intel-3000/workspace/openarena-benchmark-ps-intel-3000/gaueko-ssh.log",
    "sshknownhosts": "/var/lib/jenkins/.ssh/known_hosts",
    "sshprivatekey": "/var/lib/jenkins/.ssh/utah",
    "sshpublickey": "/var/lib/jenkins/.ssh/utah.pub",
    "template": null,
    "template_dir": "/usr/share/utah/templates",
    "user": "jenkins",
    "variant": null,
    "vmpath": "/var/lib/jenkins/vm",
    "wwwdir": "/var/www/utah",
    "wwwiface": "virbr0",
    "xml": "/etc/utah/default-vm.xml"
}
2013-08-14 17:56:40,770 iso INFO: Preparing image: /data/jenkins/Isos/saucy/saucy-desktop-amd64.iso
2013-08-14 17:56:40,775 iso INFO: /data/jenkins/Isos/saucy/saucy-desktop-amd64.iso is locally available as /data/jenkins/Isos/saucy/saucy-desktop-amd64.iso
2013-08-14 17:56:40,775 iso INFO:...

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Download full text (41.0 KiB)

ssh log

2013-08-14 18:04:16,132 ssh INFO: Checking for ssh availability
2013-08-14 18:04:18,348 paramiko.transport DEBUG: starting thread (client mode): 0x3a0fa50L
2013-08-14 18:04:18,364 paramiko.transport INFO: Connected (version 2.0, client OpenSSH_6.2p2)
2013-08-14 18:04:18,365 paramiko.transport DEBUG: kex algos:['ecdh-sha2-nistp256', 'ecdh-sha2-nistp384', 'ecdh-sha2-nistp521', 'diffie-hellman-group-exchange-sha256', 'diffie-hellman-group-exchange-sha1', 'diffie-hellman-group14-sha1', 'diffie-hellman-group1-sha1'] server key:['ssh-rsa', 'ssh-dss', 'ecdsa-sha2-nistp256'] client encrypt:['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'arcfour256', 'arcfour128', '<email address hidden>', '<email address hidden>', 'aes128-cbc', '3des-cbc', 'blowfish-cbc', 'cast128-cbc', 'aes192-cbc', 'aes256-cbc', 'arcfour', '<email address hidden>'] server encrypt:['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'arcfour256', 'arcfour128', '<email address hidden>', '<email address hidden>', 'aes128-cbc', '3des-cbc', 'blowfish-cbc', 'cast128-cbc', 'aes192-cbc', 'aes256-cbc', 'arcfour', '<email address hidden>'] client mac:['<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', 'hmac-md5', 'hmac-sha1', '<email address hidden>', '<email address hidden>', 'hmac-sha2-256', 'hmac-sha2-512', 'hmac-ripemd160', '<email address hidden>', 'hmac-sha1-96', 'hmac-md5-96'] server mac:['<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', '<email address hidden>', 'hmac-md5', 'hmac-sha1', '<email address hidden>', '<email address hidden>', 'hmac-sha2-256', 'hmac-sha2-512', 'hmac-ripemd160', '<email address hidden>', 'hmac-sha1-96', 'hmac-md5-96'] client compress:['none', '<email address hidden>'] server compress:['none', '<email address hidden>'] client lang:[''] server lang:[''] kex follows?False
2013-08-14 18:04:18,365 paramiko.transport DEBUG: Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
2013-08-14 18:04:18,365 paramiko.transport DEBUG: using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
2013-08-14 18:04:18,411 paramiko.transport DEBUG: Switch to new keys ...
2013-08-14 18:04:18,451 paramiko.transport DEBUG: userauth is OK
2013-08-14 18:04:23,548 paramiko.transport INFO: Authentication (publickey) successful!
2013-08-14 18:04:23,648 paramiko.transport DEBUG: EOF in transport thread
2013-08-14 18:04:23,797 paramiko.transport DEBUG: starting thread (client mode): 0x4295210L
2013-08-14 18:04:23,800 paramiko.transport INFO: Connected (version 2.0, client OpenSSH_6.2p2)
2013-08-14 18:04:23,801 paramiko.transport DEBUG: kex algos:['ecdh-sha2-nistp256', 'ecdh-sha2-nistp384', 'ecdh-sha2-nistp521', 'diffie-hellman-group-exchange-sha256', 'diffie-he...

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I ran in to this issue again today, ssh failed less than 1 second after the SUT reported to be rebooting.

from the debug log:

2013-08-20 10:58:55,309 root INFO: System under test is rebooting
2013-08-20 10:58:55,309 root INFO: Checking if UTAH client is finished
2013-08-20 10:58:55,309 ps-radeon-hd7450-le DEBUG: Checking if machine is active
2013-08-20 10:58:55,309 ps-radeon-hd7450-le DEBUG: Checking if machine is provisioned
2013-08-20 10:58:55,309 ps-radeon-hd7450-le DEBUG: Checking if machine is active
2013-08-20 10:58:55,309 ps-radeon-hd7450-le DEBUG: Checking if machine is provisioned
2013-08-20 10:58:55,310 ssh DEBUG: Connecting SSH
2013-08-20 10:58:55,879 ssh DEBUG: Opening SSH session
2013-08-20 10:58:55,880 ssh INFO: Running command through SSH: /usr/share/utah/client/utah-done.py
2013-08-20 10:58:55,918 ssh DEBUG: Closing SSH connection
2013-08-20 10:58:55,918 ssh DEBUG: SSH command (/usr/share/utah/client/utah-done.py) failed with return code: -1
2013-08-20 10:58:55,919 ssh DEBUG: Standard output follows:
2013-08-20 10:58:55,919 ssh DEBUG: Standard error follows:
2013-08-20 10:58:55,919 root INFO: UTAH client is finished
2013-08-20 10:58:55,919 ps-radeon-hd7450-le DEBUG: Checking if machine is active
2013-08-20 10:58:55,919 ps-radeon-hd7450-le DEBUG: Checking if machine is provisioned
2013-08-20 10:58:55,921 ssh DEBUG: Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/utah/provisioning/ssh.py", line 250, in downloadfiles
    key_filename=config.sshprivatekey)
  File "/usr/lib/python2.7/dist-packages/paramiko/client.py", line 296, in connect
    sock.connect(addr)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

Revision history for this message
Andy Doan (doanac) wrote :

Thanks for the info Chris. We can't really reproduce, but I think we know what error we need to be looking for now.

Changed in utah:
importance: Undecided → Medium
assignee: nobody → Max Brustkern (nuclearbob)
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Download full text (59.1 KiB)

full debug log

2013-08-20 10:50:13,110 root INFO: UTAH Features:
 PROVISIONED_AUTORUN: False
2013-08-20 10:50:13,110 root INFO: UTAH version: 0.14.8ubuntu1~precise1
2013-08-20 10:50:13,110 root DEBUG: Configuration variables:
{
    "arch": "amd64",
    "boot": null,
    "boottimeout": 600,
    "bridge": "eth0",
    "checktimeout": 180,
    "clean": true,
    "client_install_timeout": 60,
    "consoleloglevel": 30,
    "debug": false,
    "debuglog": "/var/lib/jenkins/ps-slaves/ps-radeon-hd7450-le/workspace/nexuiz-benchmark-ps-radeon-hd7450-le/gaueko-debug.log",
    "diskbus": "virtio",
    "disksizes": [
        8
    ],
    "dlcommand": "dl-ubuntu-test-iso",
    "dlpercentincrement": 10,
    "dlretries": 10,
    "emulator": null,
    "fileloglevel": 20,
    "group": "utah",
    "hostname": "gaueko",
    "image": null,
    "initrd": null,
    "installpackages": [
        "openssh-server",
        "gdebi"
    ],
    "installtimeout": 9000,
    "installtype": "mini",
    "isodir": "/var/cache/utah/iso",
    "jobtimeout": null,
    "kernel": null,
    "logfile": "/var/lib/jenkins/ps-slaves/ps-radeon-hd7450-le/workspace/nexuiz-benchmark-ps-radeon-hd7450-le/gaueko.log",
    "logpath": "/var/lib/jenkins/ps-slaves/ps-radeon-hd7450-le/workspace/nexuiz-benchmark-ps-radeon-hd7450-le",
    "logrotation_backupCount": 10,
    "logrotation_interval": 1,
    "logrotation_when": "W0",
    "machinedir": null,
    "machineid": null,
    "machineinfo": null,
    "machinetype": "virtual",
    "machineuuid": null,
    "macs": [],
    "name": null,
    "new": false,
    "nfscommand": [
        "sudo",
        "service",
        "nfs-kernel-server"
    ],
    "nfsconfigfile": "/etc/exports",
    "nfsiface": "eth0",
    "nfsoptions": "*(ro,async,no_root_squash,no_subtree_check)",
    "outputpreseed": false,
    "packagedir": "/usr/share/utah",
    "powercmd": null,
    "powertimeout": 15,
    "preboot": null,
    "prefix": "utah",
    "preseed": "/etc/utah/default-preseed.cfg",
    "pxedir": "/var/lib/tftpboot/pxelinux.cfg",
    "qemupath": "qemu:///system",
    "retry_timeout": 3,
    "rewrite": "all",
    "runtimeout": null,
    "series": "precise",
    "serieschoices": [
        "lucid",
        "precise",
        "quantal",
        "raring",
        "saucy"
    ],
    "sqlite3_db_connection_timeout": 30,
    "ssh_logfile": "/var/lib/jenkins/ps-slaves/ps-radeon-hd7450-le/workspace/nexuiz-benchmark-ps-radeon-hd7450-le/gaueko-ssh.log",
    "sshknownhosts": "/var/lib/jenkins/.ssh/known_hosts",
    "sshprivatekey": "/var/lib/jenkins/.ssh/utah",
    "sshpublickey": "/var/lib/jenkins/.ssh/utah.pub",
    "template": null,
    "template_dir": "/usr/share/utah/templates",
    "user": "jenkins",
    "variant": null,
    "vmpath": "/var/lib/jenkins/vm",
    "wwwdir": "/var/www/utah",
    "wwwiface": "virbr0",
    "xml": "/etc/utah/default-vm.xml"
}
2013-08-20 10:50:13,111 iso INFO: Preparing image: /data/jenkins/Isos/saucy/saucy-desktop-amd64.iso
2013-08-20 10:50:13,115 iso INFO: /data/jenkins/Isos/saucy/saucy-desktop-amd64.iso is locally available as /data/jenkins/Isos/saucy/saucy-desktop-amd64.i...

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.