starlingx/config: unit test exceptions & timeouts

Bug #2064660 reported by Davlet Panech
This bug affects 1 person
Affects    Status        Importance  Assigned to                    Milestone
StarlingX  Fix Released  Undecided   Leonardo Fagundes Luz Serrano

Bug Description

Brief Description
-----------------

Unit tests in starlingx/config have two problems:

1) They show many exceptions similar to:

2024-05-01 22:31:32.001464 | debian-bullseye | AttributeError: 'Semaphore' object has no attribute '_at_fork_reinit'
2024-05-01 22:31:32.001518 | debian-bullseye | Exception ignored in: <function _after_at_fork_child_reinit_locks at 0x7f1d55910790>
2024-05-01 22:31:32.001689 | debian-bullseye | Traceback (most recent call last):
2024-05-01 22:31:32.001715 | debian-bullseye | File "/usr/lib/python3.9/logging/__init__.py", line 251, in _after_at_fork_child_reinit_locks
2024-05-01 22:31:32.001734 | debian-bullseye | handler._at_fork_reinit()
2024-05-01 22:31:32.002157 | debian-bullseye | File "/usr/lib/python3.9/logging/__init__.py", line 890, in _at_fork_reinit
2024-05-01 22:31:32.002204 | debian-bullseye | self.lock._at_fork_reinit()
2024-05-01 22:31:32.002296 | debian-bullseye | File "/usr/lib/python3.9/threading.py", line 126, in _at_fork_reinit
2024-05-01 22:31:32.002316 | debian-bullseye | self._block._at_fork_reinit()

at least when executed by Zuul in Gerrit.

2) They take too long to run, and sometimes cause Zuul jobs to fail. This may be a side effect of problem 1.

Both problems have been observed in Zuul. I don't know whether they also occur when the tests are run manually.

Example review: https://review.opendev.org/c/starlingx/config/+/915922
Failed pipeline: https://zuul.opendev.org/t/openstack/build/b7e405a4bbd34110bc28e83f79a6b219 (it eventually succeeded after I forced a re-run).
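
For illustration, the call chain in the traceback under problem 1 can be mimicked with a few stand-in classes. This is only a sketch: `Semaphore`, `ReentrantLock`, `Handler` and `reinit_after_fork` below are hypothetical names, not the real `logging`, `threading` or eventlet classes. It assumes the failure comes from a lock object that never implemented `_at_fork_reinit()`, which is what the AttributeError suggests.

```python
class Semaphore:
    """Hypothetical stand-in for a lock type that lacks _at_fork_reinit()."""

    def acquire(self):
        return True

    def release(self):
        pass


class ReentrantLock:
    """Simplified stand-in for a pure-Python RLock wrapping an inner lock."""

    def __init__(self, block):
        self._block = block

    def _at_fork_reinit(self):
        # Delegates to the inner lock, as in the threading.py frame above.
        self._block._at_fork_reinit()


class Handler:
    """Simplified stand-in for a logging handler that owns a lock."""

    def __init__(self, lock):
        self.lock = lock

    def _at_fork_reinit(self):
        self.lock._at_fork_reinit()


def reinit_after_fork(handlers):
    """Re-initialise every handler lock and collect any failures."""
    errors = []
    for handler in handlers:
        try:
            handler._at_fork_reinit()
        except AttributeError as exc:
            errors.append(str(exc))
    return errors


errors = reinit_after_fork([Handler(ReentrantLock(Semaphore()))])
print(errors)
```

In the real logs the error is reported as "Exception ignored in" rather than raised, which is consistent with the tests continuing to run while the message repeats in the output.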

Severity
--------
Major

Steps to Reproduce
------------------
Problem observed in Zuul, see above

Expected Behavior
------------------
All Zuul tasks should succeed within 30 minutes

Actual Behavior
----------------
Zuul fails intermittently

Reproducibility
---------------
Intermittent

System Configuration
--------------------
N/A

Branch/Pull Time/Commit
-----------------------
master/2024-05-01

Last Pass
---------
N/A

Timestamp/Logs
--------------
See TIMED_OUT jobs in https://review.opendev.org/c/starlingx/config/+/915922 . I will also attach those logs separately to this LP.

Test Activity
-------------
[Sanity, Feature Testing, Regression Testing, Developer Testing, Evaluation, Other - Please specify]

Workaround
----------
Describe workaround if available

Davlet Panech (dpanech) wrote :
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/923576

Changed in starlingx:
assignee: nobody → Leonardo Fagundes Luz Serrano (lfagunde)
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/923531
Committed: https://opendev.org/starlingx/config/commit/4f42df1fb6053033ecddbf0728bc1c3a5ff0512a
Submitter: "Zuul (22348)"
Branch: master

commit 4f42df1fb6053033ecddbf0728bc1c3a5ff0512a
Author: Leonardo Fagundes Luz Serrano <email address hidden>
Date: Thu Jul 4 15:03:40 2024 -0300

    Zuul: Increase sysinv-tox-py39 timeout

    The sysinv-tox-py39 zuul job sometimes fails due
    to reaching the default timeout of 30 min.

    This commit increases timeout for this job to 45 min.

    Ref:
    https://review.opendev.org/c/openstack/glance/+/923486

    Test Plan:
    pass - Patchset 1 increases job duration to above 30 min
           and Zuul doesn't timeout

    Partial-Bug: 2064660

    Change-Id: I29b41ef51c1b7f1c437aba13b318c72487664cc5
    Signed-off-by: Leonardo Fagundes Luz Serrano <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix proposed to root (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/root/+/923655

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/923576
Committed: https://opendev.org/starlingx/config/commit/3e76df1dec7a55fb6483eb9ceae4f13d00bdf072
Submitter: "Zuul (22348)"
Branch: master

commit 3e76df1dec7a55fb6483eb9ceae4f13d00bdf072
Author: Leonardo Fagundes Luz Serrano <email address hidden>
Date: Fri Jul 5 11:15:00 2024 -0300

    Reduce kubernetes test runtime

    One of the tests in the config repo is taking a bit
    over 3 min to complete [1]:

    tests.common.test_kubernetes.TestKubeOperator.test_kube_get_control_plane_versions_missing_component

    This commit reduces runtime by removing the 10s
    delay between retries, based on the assumption this
    particular test does not benefit from this delay.
    With this change the test completes in under 2s

    Ref:
    [1] https://023f272a3bf7658d777e-6f226cfa9bcb6b71ca161c2acdd8e2fb.ssl.cf1.rackcdn.com/923519/4/check/sysinv-tox-py39/ac4fc82/job-output.txt

    Test Plan:
    pass - tox -e py39 -c sysinv/sysinv/sysinv/tox.ini

    Partial-Bug: 2064660

    Change-Id: Icb4794adc57a14ee09c6a65d8829b5e67a869af4
    Signed-off-by: Leonardo Fagundes Luz Serrano <email address hidden>
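
As a rough illustration of why dropping the delay between retries shortens such a test, the sketch below patches `time.sleep` so that a retry loop runs without waiting. It is not taken from the commit above, which may remove the delay in a different way. `call_with_retries`, `always_missing` and `run_without_delay` are hypothetical names, and the attempt count is an assumption chosen for the example.

```python
import time
from unittest import mock


def call_with_retries(func, attempts=3, delay=10):
    """Call func, sleeping `delay` seconds between failed attempts."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return func()
        except RuntimeError as exc:
            last_exc = exc
            # No point in sleeping after the final attempt.
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_exc


def always_missing():
    """Simulates a lookup that never succeeds (hypothetical)."""
    raise RuntimeError("component missing")


def run_without_delay():
    """Exercise the retry path with time.sleep replaced by a mock."""
    with mock.patch("time.sleep") as fake_sleep:
        try:
            call_with_retries(always_missing)
        except RuntimeError:
            pass
        # Number of delays the retry loop asked for but did not wait on.
        return fake_sleep.call_count


skipped_sleeps = run_without_delay()
print(skipped_sleeps)
```

The retry logic is still exercised in full; only the waiting is skipped, which matches the commit's assumption that this test does not benefit from the delay.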

OpenStack Infra (hudson-openstack) wrote : Fix merged to root (master)

Reviewed: https://review.opendev.org/c/starlingx/root/+/923655
Committed: https://opendev.org/starlingx/root/commit/696d2dfd2164e813bea7b6e3de35d1eb27c43843
Submitter: "Zuul (22348)"
Branch: master

commit 696d2dfd2164e813bea7b6e3de35d1eb27c43843
Author: Leonardo Fagundes Luz Serrano <email address hidden>
Date: Mon Jul 8 11:49:20 2024 -0300

    Tox: Fix Semaphore object AttributeError

    Some tests, such as sysinv-tox-py39 on the config repo,
    are spamming this error message in the test logs [1]:

    AttributeError: 'Semaphore' object has no attribute '_at_fork_reinit'

    This issue is fixed on eventlet 0.30.0

    In addition, eventlet 0.32.0 has a dnspython v2 compatibility fix.
    Since dnspython 2.0.0 is the box version in bullseye, this update
    is included as well.

    Changes between versions 0.26.1 and 0.32.0 are few
    and should not affect test results.

    Ref:
    [1] Logs from a config repo review:
    https://00a1c49e3c0449afb12b-dfb7731ce6789292a31228a6fdf28206.ssl.cf1.rackcdn.com/923576/2/check/sysinv-tox-py39/507dc4f/job-output.txt
    [2] Eventlet changelog:
    https://github.com/eventlet/eventlet/blob/master/NEWS

    Test Plan:
    pass - On the config repo, run:
           tox -e py39 -c sysinv/sysinv/sysinv/tox.ini
           with UPPER_CONSTRAINTS_FILE env variable set
           to the updated upper-constraints.txt

    Closes-Bug: 2064660

    Change-Id: I2727b2f6bd955dcf9d917308f4795594f5a063e0
    Signed-off-by: Leonardo Fagundes Luz Serrano <email address hidden>
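
A small sketch of how one might check that a constraints file pins an eventlet version at or above 0.30.0, the release the commit above names as containing the fix. The helper names `pinned_version` and `has_fork_reinit_fix` are hypothetical, and the `package===version` line format is an assumption about the layout of upper-constraints.txt; `==` pins are accepted as well.

```python
import re


def pinned_version(constraints_text, package):
    """Return the pinned version of `package` as a tuple of ints, or None."""
    pattern = re.compile(
        r"^\s*" + re.escape(package) + r"\s*={2,3}\s*([0-9]+(?:\.[0-9]+)*)",
        re.IGNORECASE,
    )
    for line in constraints_text.splitlines():
        match = pattern.match(line)
        if match:
            return tuple(int(part) for part in match.group(1).split("."))
    return None


def has_fork_reinit_fix(constraints_text):
    """True if eventlet is pinned to 0.30.0 or newer."""
    version = pinned_version(constraints_text, "eventlet")
    return version is not None and version >= (0, 30, 0)


old_pins = "dnspython===2.0.0\neventlet===0.26.1\n"
new_pins = "dnspython===2.0.0\neventlet===0.32.0\n"
print(has_fork_reinit_fix(old_pins), has_fork_reinit_fix(new_pins))
```

Comparing versions as integer tuples is enough for plain numeric pins like these; pre-release or local version suffixes would need a proper version parser.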

Changed in starlingx:
status: In Progress → Fix Released