[FT] ``neutron-functional`` job has timed out 4 times in the last 2 days

Bug #2110004 reported by Rodolfo Alonso
This bug affects 1 person
Affects: neutron
Status: New
Importance: High
Assigned to: yatin

Bug Description

As seen in [1], the job ``neutron-functional`` has timed out 4 times in the last 2 days.

We should:
* Improve and reduce the execution time of some tests.
* Increase the job timeout.

[1] https://zuul.opendev.org/t/openstack/builds?job_name=neutron-functional&result=TIMED_OUT&skip=0

Miro Tomaska (mtomaska) wrote :

I guess the easiest thing would be to just raise the current job timeout of 7800s [1].

I will scan the test execution times to see if I can spot some low-hanging fruit to reduce the time.

[1] https://opendev.org/openstack/neutron/src/commit/123bd115f3b65ba09560685ad6cf68c6934a6535/zuul.d/base.yaml#L5

tags: added: functional-tests
Changed in neutron:
importance: Undecided → High
Miro Tomaska (mtomaska) wrote :

I did some rough analysis on jobs that succeed and ones that time out.
One common theme I see is that some tests have fairly constant execution times while others can vary a lot.
For instance:

neutron.tests.functional.db.test_migrations.TestModelsMigrations.test_check_mysql_engine takes around 43s but at times it can take 80s to run.
Or
neutron.tests.functional.db.test_migrations.TestWalkMigrations.test_walk_versions takes around 40s but at times it takes 74s.
All these spikes can add up to the overall dsvm-functional job execution time.

They are migration-related tests, so it kind of makes sense that their execution time will vary based on the data.

Maybe that is where we can spend some time to stabilize those tests?
Or
Move them to a separate job? (just an idea)

PS: It looks like when a job times out, the test results are not uploaded and are not available in OpenSearch. If we could include individual test execution times in OpenSearch, it would be nice to see them as a histogram.
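
The comparison described in this comment could be scripted roughly as follows. This is only a sketch: it assumes the per-test result lines have the shape seen in job-output.txt (`{worker} test.id [N.NNNs] ... ok`), and the sample lines are illustrative, built from the rounded figures quoted above rather than copied from a real run. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a timed result line, e.g.
#   {0} some.test.id [0.273364s] ... ok
TEST_LINE = re.compile(
    r"\{(?P<worker>\d+)\}\s+(?P<test>\S+)\s+"
    r"\[(?P<secs>\d+(?:\.\d+)?)s\]\s+\.\.\.\s+(?P<status>\w+)"
)


def collect_durations(lines):
    """Return {test_id: [duration, ...]} for every timed result line."""
    durations = defaultdict(list)
    for line in lines:
        match = TEST_LINE.search(line)
        if match:
            durations[match.group("test")].append(float(match.group("secs")))
    return durations


def widest_spread(durations, top=10):
    """Rank tests seen more than once by (max - min) duration."""
    spread = {
        test: (min(values), max(values))
        for test, values in durations.items()
        if len(values) > 1
    }
    ranked = sorted(
        spread.items(),
        key=lambda item: item[1][1] - item[1][0],
        reverse=True,
    )
    return ranked[:top]


# Illustrative lines only (durations rounded from the figures in this comment).
PREFIX = "neutron.tests.functional.db.test_migrations."
SAMPLE_LINES = [
    "{1} " + PREFIX + "TestModelsMigrations.test_check_mysql_engine [43.0s] ... ok",
    "{1} " + PREFIX + "TestModelsMigrations.test_check_mysql_engine [80.0s] ... ok",
    "{2} " + PREFIX + "TestWalkMigrations.test_walk_versions [40.0s] ... ok",
    "{2} " + PREFIX + "TestWalkMigrations.test_walk_versions [74.0s] ... ok",
]

for test, (low, high) in widest_spread(collect_durations(SAMPLE_LINES)):
    print(f"{high - low:6.1f}s spread  ({low:.1f}s .. {high:.1f}s)  {test}")
```

Feeding it the concatenated output of several successful and timed-out runs would list the tests whose duration varies the most, which is one way to pick candidates for stabilizing or moving to a separate job.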

yatin (yatinkarel) wrote :

I checked a few timeouts and, based on them, it looks like increasing the job timeout will not help, because the test run is simply stuck. Examples:

1) https://0b7e0022f9f9c40dd7e3-ebffe97f8b5c0e4c161da97d302aa202.ssl.cf1.rackcdn.com/openstack/1b8c360de1724daeb343b6fc714ef102/job-output.txt
single threaded stuck
2025-05-12 06:09:26.117392 | controller | {0} neutron.tests.functional.agent.common.test_ovs_lib.BaseOVSTestCase.test_update_minimum_bandwidth_queue_no_qos_no_queue [0.273364s] ... ok
2025-05-12 07:52:10.931735 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/neutron/playbooks/run_functional_job.yaml@master]

2) https://zuul.opendev.org/t/openstack/build/bff4a7b544354108a59e377e842df590
multi threaded stuck
2025-05-05 20:49:03.434675 | controller | {2} neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers ... SKIPPED: neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers was marked as unstable because of bug 1930367, failure was:
2025-05-05 22:31:51.332821 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/neutron/playbooks/run_functional_job.yaml@master]

3) https://zuul.opendev.org/t/openstack/build/2b3b857b7e2a4a388ea6139c7e61d350
multi threaded stuck
2025-05-05 00:37:04.715890 | controller | {0} neutron.tests.functional.test_server.TestPluginWorker.test_start [2.036194s] ... ok
2025-05-05 02:28:30.466971 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/neutron/playbooks/run_functional_job.yaml@master]

There are some cases where a timeout increase may help, i.e. where the job runs with a concurrency of 1 even in the multi-threaded case, but we may handle that differently. The stuck-test issue above looks more important.
https://zuul.opendev.org/t/openstack/build/5c198c2b4bf7423e85ec0e9ce33c31e3
2025-05-05 11:12:27.282868 | controller | {0} neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverSsl.test_ovn_nb_sync_off [72.166968s] ... ok
2025-05-05 11:12:59.504449 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/neutron/playbooks/run_functional_job.yaml@master]
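
The difference between the "stuck" and "just slow" cases can be checked mechanically by measuring how long the log stayed silent between the last test result and the timeout line. The sketch below uses the log lines from example 1 and from the last example above (URL in the RUN END line shortened to `...`); the function names and the 600-second threshold are assumptions chosen for illustration, not values taken from the job configuration.

```python
import re
from datetime import datetime

# Leading timestamp as it appears in job-output.txt.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) \|")
# A test result line carries a "{worker} test.id" pair after the host column.
TEST_RESULT = re.compile(r"\| \{\d+\} \S+")
TIMEOUT_MARKER = "RUN END RESULT_TIMED_OUT"


def silent_gap_seconds(lines):
    """Seconds between the last test result and the timeout line, or None."""
    last_test = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if TIMEOUT_MARKER in line:
            if last_test is None:
                return None
            return (stamp - last_test).total_seconds()
        if TEST_RESULT.search(line):
            last_test = stamp
    return None


def looks_stuck(lines, threshold=600.0):
    """Hypothetical rule: a long silence before the timeout means 'stuck'."""
    gap = silent_gap_seconds(lines)
    return gap is not None and gap > threshold


STUCK_RUN = [
    "2025-05-12 06:09:26.117392 | controller | {0} neutron.tests.functional."
    "agent.common.test_ovs_lib.BaseOVSTestCase."
    "test_update_minimum_bandwidth_queue_no_qos_no_queue [0.273364s] ... ok",
    "2025-05-12 07:52:10.931735 | RUN END RESULT_TIMED_OUT: [untrusted : ...]",
]
SLOW_RUN = [
    "2025-05-05 11:12:27.282868 | controller | {0} neutron.tests.functional."
    "plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync."
    "TestOvnNbSyncOverSsl.test_ovn_nb_sync_off [72.166968s] ... ok",
    "2025-05-05 11:12:59.504449 | RUN END RESULT_TIMED_OUT: [untrusted : ...]",
]

for name, run in (("example 1", STUCK_RUN), ("concurrency 1 case", SLOW_RUN)):
    print(name, silent_gap_seconds(run), looks_stuck(run))
```

On these lines the first run is silent for about 1 h 43 min before the timeout, while the second was still producing results 32 seconds before it, which matches the split between stuck runs and runs that simply needed more time.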

yatin (yatinkarel)
Changed in neutron:
assignee: nobody → yatin (yatinkarel)
Miro Tomaska (mtomaska) wrote :

@yatin
I saw that output as well. Do you mean that the job is stuck executing some particular test (i.e. an infinite loop kind of thing)? That was not clear to me from the logs.

I did, however, notice very long execution times for the tests that had already run, as I mentioned in my comment #1.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/950303

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/950303
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
