I was trying to reproduce that issue today but wasn't able to reproduce and investigate that issue. When I run it on test patch, tempest ended up for me in about 4300 seconds
======
Totals
======
Ran: 1425 tests in 4297.8985 sec.
- Passed: 1303
- Skipped: 121
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 1
Sum of execute time for each test: 8342.5592 sec.
2021-09-12 12:12:53.866 313000 INFO tempest.lib.common.rest_client [req-c8d718d1-8c44-4126-9a4c-2e6e66081953 ] Request (AttachInterfacesTestJSON:test_reassign_port_between_servers): 200 GET https://[2001:db8:fd00:1000::5]:13774/v2.1/servers/fa0a5e01-18b7-4d8c-bb5e-5676bd255623 21.053s
But when I looked at the same request in the nova api log I saw:
2021-09-12 12:12:53.858 12 INFO nova.api.openstack.requestlog [req-c8d718d1-8c44-4126-9a4c-2e6e66081953 dc8b761741ad45feaaa79fb89a76ba48 1b908d45bc8a42ce82de2aa2ce61c30f - default default] fd00:fd00:fd00:2000::1c2 "GET /v2.1/servers/fa0a5e01-18b7-4d8c-bb5e-5676bd255623" status: 200 len: 1664 microversion: 2.1 time: 2.400599
So for me the issue looks like it is somewhere in the haproxy maybe? As it seems that query was processed by Nova fast but was "stucked" somewhere in the middle.
I was trying to reproduce that issue today but wasn't able to reproduce and investigate that issue. When I run it on test patch, tempest ended up for me in about 4300 seconds
======
Totals
======
Ran: 1425 tests in 4297.8985 sec.
- Passed: 1303
- Skipped: 121
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 1
Sum of execute time for each test: 8342.5592 sec.
I also checked builds history https:/ /review. rdoproject. org/zuul/ builds? job_name= periodic- tripleo- ci-centos- 8-ovb-3ctlr_ 1comp-featurese t035-wallaby and I see that time of that job varies from about 3h 14min (https:/ /review. rdoproject. org/zuul/ build/0f8c934c6 3864203a669348b 086a2882) to more than 4 hours (and sometimes timeout, like e.g. https:/ /review. rdoproject. org/zuul/ build/9d2881833 e4d4ff3903afa4a 3501ecc3/ logs)
Next I compared time of the test execution in the fast (https:/ /review. rdoproject. org/zuul/ build/0f8c934c6 3864203a669348b 086a2882/ logs) run and in the timed out one (https:/ /review. rdoproject. org/zuul/ build/9d2881833 e4d4ff3903afa4a 3501ecc3/ logs). Looking at top 20 slowest tests in both run we see:
fast run
('tempest. api.compute. servers. test_server_ actions. ServerActionsTe stJSON. test_rebuild_ server_ with_volume_ attached' , 101.336682) api.compute. admin.test_ create_ server. ServersWithSpec ificFlavorTestJ SON.test_ verify_ created_ server_ ephemeral_ disk', 104.953735) scenario. test_network_ basic_ops. TestNetworkBasi cOps.test_ update_ instance_ port_admin_ state', 113.117529) api.compute. volumes. test_attach_ volume. AttachVolumeTes tJSON.test_ attach_ detach_ volume' , 113.68504) scenario. test_network_ basic_ops. TestNetworkBasi cOps.test_ network_ basic_ops' , 115.171857) api.compute. volumes. test_attach_ volume. AttachVolumeShe lveTestJSON. test_attach_ volume_ shelved_ or_offload_ server' , 115.596492) scenario. test_network_ v6.TestGettingA ddress. test_slaac_ from_os' , 116.36545) scenario. test_network_ v6.TestGettingA ddress. test_dhcp6_ stateless_ from_os' , 118.156141) api.compute. admin.test_ servers. ServersAdminTes tJSON.test_ rebuild_ server_ in_error_ state', 129.538088) scenario. test_network_ v6.TestGettingA ddress. test_dualnet_ slaac_from_ os', 130.889002) scenario. test_network_ v6.TestGettingA ddress. test_multi_ prefix_ dhcpv6_ stateless' , 131.692391) scenario. test_network_ basic_ops. TestNetworkBasi cOps.test_ connectivity_ between_ vms_on_ different_ networks' , 133.453555) scenario. test_network_ v6.TestGettingA ddress. test_multi_ prefix_ slaac', 135.368074) api.compute. servers. test_attach_ interfaces. AttachInterface sTestJSON. test_create_ list_show_ delete_ interfaces_ by_network_ port', 143.339424) scenario. test_network_ v6.TestGettingA ddress. test_dualnet_ dhcp6_stateless _from_os' , 148.011558) scenario. test_network_ v6.TestGettingA ddress. test_dualnet_ multi_prefix_ slaac', 156.358172) api.compute. servers. test_attach_ interfaces. AttachInterface sTestJSON. test_reassign_ port_between_ servers' , 157.718676) api.compute. servers. test_device_ tagging. TaggedBootDevic esTest. test_tagged_ boot_devices' , 161.264332) scenario. test_network_ v6.TestGettingA ddress. test_dualnet_ multi_prefix_ dhcpv6_ stateless' , 164.44343) api.compute. servers. test_device_ tagging. TaggedBootDevic esTest_ v242.test_ tagged_ boot_devices' , 164.663563)
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
and timed out one:
('tempest. api.compute. servers. test_attach_ interfaces. AttachInterface sTestJSON. test_create_ list_show_ delete_ interfaces_ by_fixed_ ip', 167.733248) scenario. test_network_ basic_ops. TestNetworkBasi cOps.test_ update_ instance_ port_admin_ state', 176.180869) scenario. test_network_ v6.TestGettingA ddress. test_dhcp6_ stateless_ from_os' , 186.90833) scenario. test_network_ basic_ops. TestNetworkBasi cOps.test_ network_ basic_ops' , 191.16799) api.compute. admin.test_ create_ server. ServersWithSpec ificFlavorTestJ SON.test_ verify_ created_ server_ ephemeral_ disk', 192.309398) scenario. test_network_ basic_ops. TestNetworkBasi cOps.test_ connectivity_ between_ vms_on_ different_ networks' , 192.523786) scenario. test_network_ v6.TestGettingA ddress. test_slaac_ from_os' , 194.261026) scenario. test_network_ v6.TestGettingA ddress. test_multi_ prefix_ slaac', 205.108607) scenario. test_network_ v6.TestGettingA ddress. test_multi_ prefix_ dhcpv6_ stateless' , 209.214145) api.compute. volumes. test_attach_ volume. AttachVolumeTes tJSON.test_ attach_ detach_ volume' , 222.186212) scenario. test_network_ v6.TestGettingA ddress. test_dualnet_ slaac_from_ os', 222.788042) api.compute. volumes. test_attach_ volume. AttachVolumeShe lveTestJSON. test_detach_ volume_ shelved_ or_offload_ server' , 223.802673) api.compute. volumes. test_attach_ volume. AttachVolumeShe lveTestJSON. test_attach_ volume_ shelved_ or_offload_ server' , 229.714189) scenario. test_network_ v6.TestGettingA ddress. test_dualnet_ multi_prefix_ slaac', 230.452579) scenario. test_network_ v6.TestGettingA ddress. test_dualnet_ dhcp6_stateless _from_os' , 230.810108) api.compute. servers. test_attach_ interfaces. AttachInterface sTestJSON. test_create_ list_show_ delete_ interfaces_ by_network_ port', 245.107367) scenario. test_network_ v6.TestGettingA ddress. test_dualnet_ multi_prefix_ dhcpv6_ stateless' , 246.453286) api.compute. servers. test_device_ tagging. TaggedBootDevic esTest. test_tagged_ boot_devices' , 248.856564) api.compute. servers. test_device_ tagging. TaggedBootDevic esTest_ v242.test_ tagged_ boot_devices' , 266.036853) api.compute. servers. test_attach_ interfaces. AttachInterface sTestJSON. test_reassign_ port_between_ servers' , 284.01886)
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
('tempest.
So it seems that in those slow runs all tests are running much more slower. There is no one strict test (or group of tests) which is breaking things.
While I was comparing those "slow" and "fast" run of tests I checked one of them in the "slow" run to see what is slowing it down as much. sTestJSON: test_reassign_ port_between_ servers from https:/ /logserver. rdoproject. org/openstack- periodic- integration- stable1/ opendev. org/openstack/ tripleo- ci/master/ periodic- tripleo- ci-centos- 8-ovb-3ctlr_ 1comp-featurese t035-wallaby/ 9d28818/ logs/undercloud /home/zuul/ tempest/ tempest. log.txt. gz
I checked test AttachInterface
One of the longest API requests there was:
2021-09-12 12:12:53.866 313000 INFO tempest. lib.common. rest_client [req-c8d718d1- 8c44-4126- 9a4c-2e6e660819 53 ] Request (AttachInterfac esTestJSON: test_reassign_ port_between_ servers) : 200 GET https://[2001:db8: fd00:1000: :5]:13774/ v2.1/servers/ fa0a5e01- 18b7-4d8c- bb5e-5676bd2556 23 21.053s
But when I looked at the same request in the nova api log I saw: openstack. requestlog [req-c8d718d1- 8c44-4126- 9a4c-2e6e660819 53 dc8b761741ad45f eaaa79fb89a76ba 48 1b908d45bc8a42c e82de2aa2ce61c3 0f - default default] fd00:fd00: fd00:2000: :1c2 "GET /v2.1/servers/ fa0a5e01- 18b7-4d8c- bb5e-5676bd2556 23" status: 200 len: 1664 microversion: 2.1 time: 2.400599
2021-09-12 12:12:53.858 12 INFO nova.api.
So for me the issue looks like it is somewhere in the haproxy maybe? As it seems that query was processed by Nova fast but was "stucked" somewhere in the middle.