[CI] gate functional and fullstack timeouts without reports of the causative test case
Bug #1860774 reported by Rodolfo Alonso
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Invalid | Medium | Unassigned |
Bug Description
A common problem in the CI jobs (mainly in the gate) is that the test suite times out without reporting which test case(s) caused it. To avoid blocking a whole test suite, a per-test-case execution timeout should be set for both the functional (FT) and fullstack CI jobs. The test suite can still fail because of a non-passing test case, but at least we'll have this information.
Example: [1]. The test suite fails, but no test case is reported as failed. The logs show that worker {3} stops executing tests very early.
The OS_TEST_TIMEOUT variable does not seem to be working properly.
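A minimal sketch of what a per-test-case timeout driven by OS_TEST_TIMEOUT could look like. This is not neutron's or oslotest's code: oslotest-based test cases typically read OS_TEST_TIMEOUT themselves and install a `fixtures.Timeout` fixture, whereas this stand-alone version uses the standard library's `signal.alarm`, so it only works on Unix and in the main thread. The names `read_test_timeout`, `per_test_timeout` and `TestCaseTimeout` are hypothetical.

```python
import os
import signal
from contextlib import contextmanager

# Hypothetical helpers: a stdlib stand-in for an OS_TEST_TIMEOUT-driven
# per-test timeout, not the actual oslotest/neutron implementation.


class TestCaseTimeout(Exception):
    """Raised when a single test case exceeds its time budget."""


def read_test_timeout(environ=os.environ, default=0):
    """Return the per-test timeout in seconds taken from OS_TEST_TIMEOUT.

    Missing or non-numeric values fall back to `default`; 0 or a
    negative value means "no timeout".
    """
    raw = environ.get("OS_TEST_TIMEOUT", "")
    try:
        return int(raw)
    except ValueError:
        return default


@contextmanager
def per_test_timeout(seconds, test_id):
    """Abort the enclosed block with TestCaseTimeout after `seconds`.

    Uses SIGALRM, so it only works on Unix and in the main thread.
    """
    if seconds <= 0:
        # Timeout disabled: run the block without any alarm.
        yield
        return

    def _on_alarm(signum, frame):
        # Name the offending test so the failure is attributable to it.
        raise TestCaseTimeout(f"{test_id} exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

The design point is that the exception carries the test id, so the runner records a failure for that specific test case instead of the whole suite silently running into the job-level timeout.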
description: updated
Changed in neutron: importance: Undecided → Medium
Changed in neutron: status: Confirmed → Invalid
As far as I can see, OS_TEST_TIMEOUT is set in tox.ini to 180 s for the functional tests (see https://opendev.org/openstack/neutron/src/branch/master/tox.ini#L36) and to 600 s for fullstack (see https://opendev.org/openstack/neutron/src/branch/master/tox.ini#L74).
I don't know the Zuul settings well; perhaps some common/parent job YAML files override these values.
In the example you linked, as far as I can see, no test takes longer than 180 s. The longest-running ones are (durations in seconds):
neutron.tests.functional.agent.l3.extensions.test_gateway_ip_qos_extension.TestRouterGatewayIPQosAgentExtensionDVR.test_dvr_ha_router_failover_with_gw_and_floatingip 97.022904
neutron.tests.functional.agent.l3.extensions.test_port_forwarding_extension.TestL3AgentFipPortForwardingExtensionDVR.test_dvr_ha_router_unbound_from_agents 90.15016
neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_ha_router_unbound_from_agents 82.2691
neutron.tests.functional.agent.l3.test_dvr_router.TestDvrRouter.test_dvr_ha_router_failover_with_gw_and_floatingip 80.532779
neutron.tests.functional.agent.l3.extensions.test_gateway_ip_qos_extension.TestRouterGatewayIPQosAgentExtensionDVR.test_dvr_ha_router_failover_with_gw 80.438136
neutron.tests.functional.agent.l3.extensions.test_port_forwarding_extension.TestL3AgentFipPortForwardingExtensionDVR.test_dvr_ha_router_failover_with_gw 80.29684
neutron.tests.functional.db.migrations.test_2e0d7a8a1586_add_binding_index_to_routerl3agentbinding.TestHARouterPortMigrationMysql.test_walk_versions 77.645509
neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_ha_router_failover_with_gw_and_floatingip 75.95222
neutron.tests.functional.agent.l3.extensions.test_gateway_ip_qos_extension.TestRouterGatewayIPQosAgentExtensionDVR.test_dvr_non_ha_router_update 71.40815
neutron.tests.functional.db.migrations.test_3b935b28e7a0_migrate_to_pluggable_ipam.TestMigrationToPluggableIpamMysql.test_walk_versions 71.041897
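To double-check mechanically that no test exceeds the 180 s functional timeout, name/duration pairs like the ones listed can be scanned with a few lines of Python. This is a sketch, assuming each line has the form `<test id> <seconds>` as in the list (the raw job-output format may differ); `parse_durations` and `over_timeout` are hypothetical helper names.

```python
# Hypothetical helpers for lines shaped like "<test id> <seconds>".


def parse_durations(lines):
    """Yield (test_id, seconds) pairs, skipping lines that don't match."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank lines, headers, or anything unexpected
        test_id, raw = parts
        try:
            yield test_id, float(raw)
        except ValueError:
            continue  # second field is not a number


def over_timeout(lines, timeout_s):
    """Return the (test_id, seconds) pairs whose duration exceeds timeout_s."""
    return [(t, s) for t, s in parse_durations(lines) if s > timeout_s]
```

An empty result for a 180 s threshold agrees with the observation that no individual test ran into OS_TEST_TIMEOUT.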
The total test execution time is ~11252 s (simply adding together the durations listed after each test in the job output). I hope my counting is correct.
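A plain sum is the aggregate time across all workers. If the job runs tests in parallel (the description mentions worker {3}), the wall-clock time would be lower; a simple lower bound is the larger of the sum divided by the worker count and the longest single test. A sketch of that arithmetic, with hypothetical helper names; the worker count is an assumed input, since the job's actual concurrency is not stated here.

```python
# Hypothetical helpers; `workers` is an assumed parameter, not a value
# taken from the neutron job definition.


def wall_clock_lower_bound(durations, workers):
    """Lower bound on wall-clock seconds for running `durations` on
    `workers` parallel workers: no schedule can beat either the evenly
    shared total or the single longest test."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    if not durations:
        return 0.0
    return max(sum(durations) / workers, max(durations))


def may_fit_job_timeout(durations, workers, job_timeout_s):
    """True if the lower bound does not already exceed the job timeout.

    Necessary, not sufficient: setup time and an uneven distribution of
    tests across workers can still push the real run past the timeout.
    """
    return wall_clock_lower_bound(durations, workers) <= job_timeout_s
```

For example, ~11252 s of aggregate test time exceeds 7800 s if run serially, but with four evenly loaded workers the bound drops to about 2813 s; whether the job really fits also depends on setup time and how tests are spread over the workers.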
The functional job timeout (here: https://opendev.org/openstack/neutron/src/branch/master/zuul.d/base.yaml#L5) is 7800 s, so I suppose it should be increased, or am I missing something?