An example of this can be found at http://logs.openstack.org/74/57774/2/gate/gate-tempest-devstack-vm-full/e592961/console.html. This test failing seems to cause the tearDownClass failure and the process exit code failure.
Judging by the logs below, the VM is coming up, and the test is connecting to the SSH server (dropbear) running in the VM, but the authentication is failing. It appears that authentication is attempted several times before paramiko gives up causing the test to fail. I think this indicates there isn't a network or compute problem, instead is possible the client doesn't have the correct key or the authorized keys aren't configured properly on the server side. But these are just guesses, I haven't been able to get any concrete data that would support these theories.
2013-11-22 05:36:33.980 | 2013-11-22 05:32:17,029 Adding <SecurityGroupRule from_port=-1, group={}, id=14, ip_protocol=icmp, ip_range={u'cidr': u'0.0.0.0/0'}, parent_group_id=105, to_port=-1> to shared resources of TestMinimumBasicScenario
2013-11-22 05:36:33.980 | 2013-11-22 05:32:34,226 starting thread (client mode): 0x52e7e50L
2013-11-22 05:36:33.980 | 2013-11-22 05:32:34,232 Connected (version 2.0, client dropbear_2012.55)
2013-11-22 05:36:33.981 | 2013-11-22 05:32:34,237 kex algos:['diffie-hellman-group1-sha1', 'diffie-hellman-group14-sha1'] server key:['ssh-rsa', 'ssh-dss'] client encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] server encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] client mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] server mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] client compress:['none'] server compress:['none'] client lang:[''] server lang:[''] kex follows?False
2013-11-22 05:36:33.981 | 2013-11-22 05:32:34,238 Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
2013-11-22 05:36:33.981 | 2013-11-22 05:32:34,238 using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
2013-11-22 05:36:33.981 | 2013-11-22 05:32:34,433 Switch to new keys ...
2013-11-22 05:36:33.982 | 2013-11-22 05:32:34,434 Adding ssh-rsa host key for 172.24.4.227: 189c16acb93fe44ae975e1c653f1856c
2013-11-22 05:36:33.982 | 2013-11-22 05:32:34,434 Trying SSH key 9a9afe52a9485c15495a59b94ebca6b6
2013-11-22 05:36:33.982 | 2013-11-22 05:32:34,437 userauth is OK
2013-11-22 05:36:33.982 | 2013-11-22 05:32:35,104 Authentication (publickey) failed.
2013-11-22 05:36:33.982 | 2013-11-22 05:32:36,693 starting thread (client mode): 0x52f9190L
2013-11-22 05:36:33.982 | 2013-11-22 05:32:36,697 Connected (version 2.0, client dropbear_2012.55)
2013-11-22 05:36:33.983 | 2013-11-22 05:32:36,699 kex algos:['diffie-hellman-group1-sha1', 'diffie-hellman-group14-sha1'] server key:['ssh-rsa', 'ssh-dss'] client encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] server encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] client mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] server mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] client compress:['none'] server compress:['none'] client lang:[''] server lang:[''] kex follows?False
2013-11-22 05:36:33.983 | 2013-11-22 05:32:36,699 Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
2013-11-22 05:36:33.983 | 2013-11-22 05:32:36,699 using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
2013-11-22 05:36:33.983 | 2013-11-22 05:32:36,903 Switch to new keys ...
2013-11-22 05:36:33.983 | 2013-11-22 05:32:36,904 Trying SSH key 9a9afe52a9485c15495a59b94ebca6b6
2013-11-22 05:36:33.984 | 2013-11-22 05:32:36,906 userauth is OK
2013-11-22 05:36:33.984 | 2013-11-22 05:32:37,438 Authentication (publickey) failed.
2013-11-22 05:36:33.984 | 2013-11-22 05:32:39,035 starting thread (client mode): 0x24c62d0L
2013-11-22 05:36:33.984 | 2013-11-22 05:32:39,043 Connected (version 2.0, client dropbear_2012.55)
2013-11-22 05:36:33.991 | 2013-11-22 05:32:39,045 kex algos:['diffie-hellman-group1-sha1', 'diffie-hellman-group14-sha1'] server key:['ssh-rsa', 'ssh-dss'] client encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] server encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] client mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] server mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] client compress:['none'] server compress:['none'] client lang:[''] server lang:[''] kex follows?False
2013-11-22 05:36:33.992 | 2013-11-22 05:32:39,045 Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
2013-11-22 05:36:33.992 | 2013-11-22 05:32:39,045 using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
2013-11-22 05:36:33.992 | 2013-11-22 05:32:39,222 Switch to new keys ...
2013-11-22 05:36:33.992 | 2013-11-22 05:32:39,237 Trying SSH key 9a9afe52a9485c15495a59b94ebca6b6
2013-11-22 05:36:33.992 | 2013-11-22 05:32:39,239 userauth is OK
2013-11-22 05:36:33.993 | 2013-11-22 05:32:39,723 Authentication (publickey) failed.
2013-11-22 05:36:33.993 | 2013-11-22 05:32:41,343 starting thread (client mode): 0x52f9f10L
2013-11-22 05:36:33.993 | 2013-11-22 05:32:41,348 Connected (version 2.0, client dropbear_2012.55)
2013-11-22 05:36:33.993 | 2013-11-22 05:32:41,350 kex algos:['diffie-hellman-group1-sha1', 'diffie-hellman-group14-sha1'] server key:['ssh-rsa', 'ssh-dss'] client encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] server encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] client mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] server mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] client compress:['none'] server compress:['none'] client lang:[''] server lang:[''] kex follows?False
2013-11-22 05:36:33.993 | 2013-11-22 05:32:41,350 Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
2013-11-22 05:36:33.994 | 2013-11-22 05:32:41,350 using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
2013-11-22 05:36:33.994 | 2013-11-22 05:32:41,602 Switch to new keys ...
2013-11-22 05:36:33.994 | 2013-11-22 05:32:41,607 Trying SSH key 9a9afe52a9485c15495a59b94ebca6b6
2013-11-22 05:36:33.994 | 2013-11-22 05:32:41,610 userauth is OK
2013-11-22 05:36:33.994 | 2013-11-22 05:32:42,237 Authentication (publickey) failed.
2013-11-22 05:36:33.994 | 2013-11-22 05:32:43,884 starting thread (client mode): 0x24c6350L
2013-11-22 05:36:33.995 | 2013-11-22 05:32:43,917 Connected (version 2.0, client dropbear_2012.55)
2013-11-22 05:36:33.995 | 2013-11-22 05:32:43,919 kex algos:['diffie-hellman-group1-sha1', 'diffie-hellman-group14-sha1'] server key:['ssh-rsa', 'ssh-dss'] client encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] server encrypt:['aes128-ctr', '3des-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes256-cbc', 'twofish256-cbc', 'twofish-cbc', 'twofish128-cbc'] client mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] server mac:['hmac-sha1-96', 'hmac-sha1', 'hmac-md5'] client compress:['none'] server compress:['none'] client lang:[''] server lang:[''] kex follows?False
2013-11-22 05:36:33.995 | 2013-11-22 05:32:43,919 Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
2013-11-22 05:36:33.995 | 2013-11-22 05:32:43,919 using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
2013-11-22 05:36:33.995 | 2013-11-22 05:32:44,015 Switch to new keys ...
2013-11-22 05:36:33.996 | 2013-11-22 05:32:44,016 Trying SSH key 9a9afe52a9485c15495a59b94ebca6b6
2013-11-22 05:36:33.996 | 2013-11-22 05:32:44,017 userauth is OK
2013-11-22 05:36:33.996 | 2013-11-22 05:32:44,509 Authentication (publickey) failed.
2013-11-22 05:36:33.996 | 2013-11-22 05:32:46,157 starting thread (client mode): 0x5301c50L
2013-11-22 05:36:33.996 | 2013-11-22 05:32:46,157 Exception: Error reading SSH protocol banner
2013-11-22 05:36:33.997 | 2013-11-22 05:32:46,158 Traceback (most recent call last):
2013-11-22 05:36:33.997 | 2013-11-22 05:32:46,159 File "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py", line 1557, in run
2013-11-22 05:36:33.997 | 2013-11-22 05:32:46,159 self._check_banner()
2013-11-22 05:36:33.997 | 2013-11-22 05:32:46,159 File "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py", line 1683, in _check_banner
2013-11-22 05:36:33.998 | 2013-11-22 05:32:46,159 raise SSHException('Error reading SSH protocol banner' + str(x))
2013-11-22 05:36:33.998 | 2013-11-22 05:32:46,159 SSHException: Error reading SSH protocol banner
2013-11-22 05:36:33.998 | 2013-11-22 05:32:46,159
2013-11-22 05:36:33.998 | 2013-11-22 05:32:46,214 EOF in transport thread
2013-11-22 05:36:33.999 | 2013-11-22 05:32:46,295 EOF in transport thread
2013-11-22 05:36:33.999 | 2013-11-22 05:32:46,396 EOF in transport thread
2013-11-22 05:36:33.999 | 2013-11-22 05:32:46,496 EOF in transport thread
2013-11-22 05:36:34.000 | 2013-11-22 05:32:46,596 EOF in transport thread
2013-11-22 05:36:34.001 | 2013-11-22 05:32:46,635
logstash says this is a persistent flaky bug. We have a number of these, including the console log one, that all have one thing in common: they involve expected activity on the booted vm between when the vm is reported as 'active' and when it is "really booted". I checked in logstash and this never happens in neutron, except for the tests that have been trying to run neutron in parallel. So this is likely an issue of super-slow response from the vm both due to the load on the host and because the vm is nested. I saw that the banner checking code in paramiko allows 15s for the first line to appear but only 2 seconds (hard-wired) for each subsequent line. Also, the code in tempest that does the ssh connection catches socket errors (presumably for when the vm networking is not up yet) and AuthenticationE xception and then retries. But it does not catch SshException.
I will try catching this exception and see if that fixes the problem. The console log bugs might be similar in that they are demanding that 10 lines show up but perhaps sometimes that just takes a really long time. Please tell me if this makes no sense.