Comment 14 for bug 1607808

Dmitry Mescheryakov (dmitrymex) wrote:

I have reviewed the latest failure provided by Tatyana. The problem is that the OSTF test waits 30 seconds for the server to be deleted, while the operation took a little longer (around 31 seconds). Here is the time when the OSTF test timed out (/var/log/ostf.log on the master node):

2016-08-15 08:30:09 ERROR (nose_storage_plugin) fuel_health.tests.smoke.test_create_volume.VolumesTest.test_create_boot_volume

The instance name was 'ost1_test-boot-volume-instance945151122' and its ID was '2546a578-5958-4474-9d8f-d89d1bc64ef6'.

One can find that the instance was deleted just a little after 08:30:09. Here is the latest entry from node-2/var/log/nova/nova-compute.log:

2016-08-15 08:30:10.594 30118 DEBUG oslo_concurrency.lockutils [req-4fd05c88-6027-4c7d-8d57-cd0243627d2e 6adacf51939c4e86ad5053da09aaa0d6 b0b0f4335c1246228d4299fdd4fe48d0 - - -] Lock "2546a578-5958-4474-9d8f-d89d1bc64ef6" released by "nova.compute.manager.do_terminate_instance" :: held 18.887s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:282

meaning the instance was actually deleted at around 08:30:10, roughly a second after the test had already timed out.
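
For context, such a wait is typically implemented as a polling loop with a hard deadline. Below is a minimal sketch of that pattern with the current 30-second budget, using a hypothetical helper name and a python-novaclient client; it is not the actual fuel_health code:

import time

from novaclient import exceptions as nova_exc


def wait_for_server_deletion(nova_client, server_id, timeout=30, interval=2):
    """Poll Nova until the server disappears or `timeout` seconds pass.

    Hypothetical helper illustrating the 30-second deadline the OSTF
    test currently uses; not the real fuel_health implementation.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            nova_client.servers.get(server_id)
        except nova_exc.NotFound:
            # The server is gone, so deletion finished within the deadline.
            return
        time.sleep(interval)
    raise AssertionError("Server %s was not deleted within %s seconds"
                         % (server_id, timeout))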

Inspecting further, one can find that the instance deletion took so long because OpenStack processes had to reconnect to RabbitMQ, which takes some time. For example, inspect request 'req-ff447d59-a7c4-45ad-b77a-6c25dadd743d' in ./node-3/var/log/neutron/server.log: it is the request to delete the instance's port, and it took 13 seconds to complete because neutron-server had to reconnect to RabbitMQ. There is little we can do to improve reconnection - it takes time to discover that a network peer is down, so I suggest increasing the timeout for server deletion instead.

Fuel QA team, please increase the server deletion timeout in the OSTF test to 1 minute. I am pretty sure that should be enough.
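
For illustration, the requested change would amount to something like the following, again using the hypothetical helper from the sketch above rather than the real OSTF code:

# Hypothetical call site in the smoke test: raise the deletion wait
# budget from the default 30 seconds to 60 seconds.
wait_for_server_deletion(nova_client, instance.id, timeout=60)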