reboot_server_hard - can trigger loss of pub key data which fails test
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tempest | Fix Released | Undecided | Ken'ichi Ohmichi |
Bug Description
As far as I am aware, this test starts an instance, logs into it via an auto-generated public key, performs a hard reboot immediately after a successful login, and finally attempts a second login.
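For context, this is roughly the sequence involved. The sketch below is a simplified, hypothetical illustration using paramiko and the openstack CLI, not the actual Tempest code; the floating IP, key file, username and server name are placeholders.

```python
# Hypothetical sketch of the test sequence: SSH with the generated key,
# hard reboot immediately afterwards, then a second SSH attempt.
import subprocess
import time

import paramiko


def ssh_until_ready(host, username, key_file, timeout=196, interval=4):
    """Retry key-based SSH until it succeeds or the timeout expires."""
    end = time.time() + timeout
    while time.time() < end:
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            client.connect(host, username=username, key_filename=key_file)
            return client
        except (paramiko.SSHException, OSError):
            client.close()
            time.sleep(interval)
    raise TimeoutError("could not SSH to %s within %s seconds" % (host, timeout))


FLOATING_IP = "203.0.113.10"   # placeholder
KEY_FILE = "id_rsa_tempest"    # placeholder
SERVER_NAME = "my-server"      # placeholder

# 1) first login with the generated key succeeds
ssh_until_ready(FLOATING_IP, "cirros", KEY_FILE).close()

# 2) hard reboot straight away
subprocess.check_call(
    ["openstack", "server", "reboot", "--hard", "--wait", SERVER_NAME])

# 3) second login: this is the step that times out when the injected
#    authorized_keys was never flushed to disk before the hard reboot
ssh_until_ready(FLOATING_IP, "cirros", KEY_FILE).close()
```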
In our case the hard reboot succeeds and the instance is rebooted on the hypervisor, but more often than not the second login attempt fails (this triggers the Python timeout traceback included below).
This appears to be because the hard reboot happens so quickly that, in many cases, the public key data has not yet been persisted. Being able to persist data that quickly is a test of the operating system, filesystem and associated configuration, not strictly of the OpenStack API or the reboot functionality itself.
I am able to replicate this by hand in a variety of ways, and I can see a couple of possible ways to improve this.
We could add a 'sync' (or similar) command during the SSH session. I realise this will not cover all operating systems, so a second suggestion is to perform a soft reboot first: the soft reboot causes the OS to write its data to disk before restarting, after which the hard reboot under test can be performed.
Another option would be to use ping to make sure the system is back online, rather than logging in, although I appreciate that is not as thorough.
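To illustrate the first two suggestions, here is a minimal, hypothetical sketch. It assumes a paramiko SSH client (as in the sketch above) and the openstack CLI; an actual fix would use Tempest's own remote client and servers client instead.

```python
# Hypothetical sketch of the two proposed mitigations.
import subprocess


def flush_guest_disks(ssh_client):
    """Ask the guest to flush dirty pages so the injected authorized_keys
    survives an immediate hard reboot (works where 'sync' is available)."""
    stdin, stdout, stderr = ssh_client.exec_command("sync")
    stdout.channel.recv_exit_status()   # wait for the command to finish


def soft_then_hard_reboot(server_name):
    """Alternative: a soft reboot first lets the OS flush its own data,
    after which the hard reboot can be exercised safely."""
    subprocess.check_call(
        ["openstack", "server", "reboot", "--soft", "--wait", server_name])
    subprocess.check_call(
        ["openstack", "server", "reboot", "--hard", "--wait", server_name])
```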
Currently, as a workaround, I am having to set the interval to 30 seconds or so, which means a full 'refstack' run takes over an hour!
----------------
tempest.
-------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
Traceback (most recent call last):
File "/home/
File "/home/
File "/home/
boot_secs = self.exec_
File "/home/
return self.ssh_
File "/home/
ssh = self._get_
File "/home/
File "/home/
raise TimeoutException()
fixtures.
2016-05-20 08:55:39,571 3921 INFO [tempest.
2016-05-20 08:55:43,483 3921 INFO [tempest.
2016-05-20 08:55:43,606 3921 INFO [tempest.
2016-05-20 08:55:44,734 3921 INFO [tempest.
2016-05-20 08:55:44,744 3921 INFO [tempest.
2016-05-20 08:55:45,877 3921 INFO [tempest.
2016-05-20 08:55:45,888 3921 INFO [tempest.
2016-05-20 08:56:15,921 3921 INFO [tempest.
2016-05-20 08:56:15,938 3921 INFO [paramiko.
2016-05-20 08:56:15,942 3921 WARNING [tempest.
2016-05-20 08:56:18,455 3921 INFO [paramiko.
2016-05-20 08:56:18,464 3921 WARNING [tempest.
... and so on.
I appreciate any feedback; please let me know if you need any more information.
I was about to raise this bug. I'm observing the exact same issue with servers built from any non-Cirros image, or with a Cirros server on a large flavor.
However, this issue only appeared at some point in the last several months; it used to work without problems.
I haven't had much time to debug this, but I'm not sure it's a Tempest bug. Even though the test's use case is not very realistic, I'm not much in favour of modifying the test; we need to find the root cause of this race condition.