I have reviewed the latest failure provided by Tatyana. The problem is that the OSTF test waits only 30 seconds for the server to be deleted, while the operation took slightly longer (around 31 seconds). Here is when the OSTF test timed out (/var/log/ostf.log on the master node):
2016-08-15 08:30:09 ERROR (nose_storage_plugin) fuel_health.tests.smoke.test_create_volume.VolumesTest.test_create_boot_volume
The instance name was 'ost1_test-boot-volume-instance945151122' and its id was '2546a578-5958-4474-9d8f-d89d1bc64ef6'.
The instance was deleted just after 08:30:09. Here is the last entry from node-2/var/log/nova/nova-compute.log:
2016-08-15 08:30:10.594 30118 DEBUG oslo_concurrency.lockutils [req-4fd05c88-6027-4c7d-8d57-cd0243627d2e 6adacf51939c4e86ad5053da09aaa0d6 b0b0f4335c1246228d4299fdd4fe48d0 - - -] Lock "2546a578-5958-4474-9d8f-d89d1bc64ef6" released by "nova.compute.manager.do_terminate_instance" :: held 18.887s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282
meaning the instance was deleted around 08:30:10.
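As a quick check of the two timestamps quoted above, here is a minimal Python sketch that computes how far the deletion overshot the moment the test gave up. The variable names are mine, and the OSTF timestamp has only one-second resolution, so the real gap lies somewhere between about 0.6 and 1.6 seconds:

```python
from datetime import datetime

# Timestamps copied from the two log lines quoted above.
ostf_timed_out = datetime.strptime("2016-08-15 08:30:09", "%Y-%m-%d %H:%M:%S")
lock_released = datetime.strptime("2016-08-15 08:30:10.594", "%Y-%m-%d %H:%M:%S.%f")

# Seconds between the OSTF error and nova-compute finishing the termination.
overshoot = (lock_released - ostf_timed_out).total_seconds()
print("deletion finished %.3fs after the OSTF timeout" % overshoot)
```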
Inspecting further, one can see that instance deletion took so long because OpenStack processes had to reconnect to RabbitMQ, which takes some time. For instance, inspect request 'req-ff447d59-a7c4-45ad-b77a-6c25dadd743d' in ./node-3/var/log/neutron/server.log. It is a request to delete the instance's port, and it took 13 seconds to complete because neutron-server had to reconnect to RabbitMQ. There is little we can do to speed up reconnection - it takes time for a process to discover that its network peer is down - so I suggest increasing the timeout for server deletion instead.
Fuel QA team, please increase the server deletion timeout to 1 minute in the OSTF test. I am pretty sure that this should be enough.
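To illustrate the kind of change I mean, here is a rough sketch of a polling wait with a configurable timeout. This is not the actual OSTF helper: `wait_for_deletion`, `is_deleted`, and the `clock`/`sleep` parameters are hypothetical names used only for illustration, and it uses `time.monotonic`, so it assumes Python 3.

```python
import time


def wait_for_deletion(is_deleted, timeout=60.0, interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll is_deleted() until it returns True or `timeout` seconds pass.

    Hypothetical helper, not OSTF code. `is_deleted` is any callable that
    reports whether the server is gone. Returns True if the deletion was
    observed in time and False if the wait timed out.
    """
    deadline = clock() + timeout
    while True:
        if is_deleted():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a deletion that takes about 31 seconds, as in this failure, such a wait gives up with timeout=30 but succeeds with timeout=60, and it still returns as soon as the server is gone, so the longer timeout does not slow down healthy runs.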