CENGN has been unable to complete a build since March 3.

Bug #2009722 reported by Scott Little
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Dostoievski Albino Batista

Bug Description

Brief Description
-----------------
CENGN has been unable to complete a build since March 3.

Since then, four builds were attempted, and all builds hung within the post-build unit tests of python3.9_3.9.2-1.stx.1.

One build was hung for nearly 48 hours.

The logs are not specific on which test is hanging.

Killing the hung unit test results in the overall build of the python3.9 package build failing, but a retry loop attempts to rebuild, and it again hangs in the unit tests.

Nothing in the change logs directly affects this package. The only build system changes relate to secure boot signing, and since the python package does not require signing, I'm currently discounting those as a cause.

An equivalent build using an internal WindRiver build machine has so far not hit this issue. The main difference is that CENGN uses minikube, while the internal server uses kubernetes directly.

One theory was that something changed upstream that affects the content within the build containers. However, both CENGN and the internal build server rebuild the build containers each time. If there was a change upstream, I would expect both builds to see it.

Most designers are likely using minikube, and so far I've seen no complaints from designers on this topic. Perhaps designers are using a build environment created on or before March 3, and haven't seen it yet.

Severity
--------
Critical

Steps to Reproduce
------------------
build-pkgs

Expected Behavior
------------------
build succeeds

Actual Behavior
----------------
build hangs

Reproducibility
---------------
Reproducible on CENGN only so far

System Configuration
--------------------
N/A

Branch/Pull Time/Commit
-----------------------
March 7

Last Pass
---------
March 3

Timestamp/Logs
--------------
1:24:00 load avg: 0.91 running: test_builtin (1 hour 24 min)
1:24:30 load avg: 0.85 running: test_builtin (1 hour 24 min)
1:25:00 load avg: 0.92 running: test_builtin (1 hour 25 min)
1:25:30 load avg: 0.71 running: test_builtin (1 hour 25 min)
1:26:00 load avg: 0.61 running: test_builtin (1 hour 26 min)
1:26:30 load avg: 0.37 running: test_builtin (1 hour 26 min)
1:27:00 load avg: 0.30 running: test_builtin (1 hour 27 min)
1:27:30 load avg: 0.18 running: test_builtin (1 hour 27 min)
1:28:00 load avg: 0.37 running: test_builtin (1 hour 28 min)
1:28:30 load avg: 0.58 running: test_builtin (1 hour 28 min)
...
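
To pull the stuck test out of a long log like the one above, the progress lines can be scanned mechanically. This is a minimal sketch, assuming the log only contains the single-test progress format shown in this report; the function names are illustrative and not part of any StarlingX tooling.

```python
import re

# Matches progress lines in the form shown in this report, for example:
#   1:24:00 load avg: 0.91 running: test_builtin (1 hour 24 min)
# Assumption: only one test is listed per line.
PROGRESS = re.compile(
    r"^(?P<clock>\d+:\d{2}:\d{2}) load avg: (?P<load>\d+\.\d+) "
    r"running: (?P<test>\w+) \((?P<elapsed>[^)]*)\)$"
)


def elapsed_minutes(text):
    """Convert '1 hour 24 min' or '45 min' into whole minutes."""
    total = 0
    for amount, unit in re.findall(r"(\d+) (hour|min)", text):
        total += int(amount) * (60 if unit == "hour" else 1)
    return total


def longest_running(lines):
    """Return (test_name, minutes) for the longest-running test, or None."""
    worst = None
    for line in lines:
        match = PROGRESS.match(line.strip())
        if not match:
            continue  # skip anything that is not a progress line
        minutes = elapsed_minutes(match.group("elapsed"))
        if worst is None or minutes > worst[1]:
            worst = (match.group("test"), minutes)
    return worst


sample = [
    "1:24:00 load avg: 0.91 running: test_builtin (1 hour 24 min)",
    "1:28:30 load avg: 0.58 running: test_builtin (1 hour 28 min)",
    "...",
]
print(longest_running(sample))  # ('test_builtin', 88)
```

Run against the excerpt above, this points at test_builtin, which had been running for well over an hour.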

Test Activity
-------------
Build

Workaround
----------
disable python3.9's unit tests

Scott Little (slittle1) wrote :

Reproducible from a new independent minikube build environment running on the CENGN build server.

Scott Little (slittle1) wrote :

No change to the packages used to satisfy the build dependencies within the chroot.

Scott Little (slittle1) wrote :

Two changes to the packages that contribute to the build containers between March 3 (success) and March 5 (hang) ...

pulpcore_client 3.22.2 -> 3.22.3
charset_normalizer 3.0.1 -> 3.1.0
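
A comparison like this can be scripted when the package lists from two builds are available. This is a minimal sketch, assuming each list has already been loaded into a name-to-version mapping; how the lists are collected is not covered, and `unchanged_pkg` is a made-up entry for illustration only.

```python
def version_changes(before, after):
    """Return {name: (old, new)} for packages whose version differs.

    A package present on only one side is reported with None for the
    missing version.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# The two real entries come from this comment; "unchanged_pkg" is hypothetical.
march_3 = {
    "pulpcore_client": "3.22.2",
    "charset_normalizer": "3.0.1",
    "unchanged_pkg": "1.0",
}
march_5 = {
    "pulpcore_client": "3.22.3",
    "charset_normalizer": "3.1.0",
    "unchanged_pkg": "1.0",
}

for name, (old, new) in version_changes(march_3, march_5).items():
    print(f"{name} {old} -> {new}")
```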

Scott Little (slittle1) wrote :

Briefly worked after a reboot of the build server. Now it is hanging again. Caching effect?

Recommend disabling the Python3.9 unit tests until this issue is better understood.

Scott Little (slittle1)
Changed in starlingx:
importance: Undecided → Critical
assignee: nobody → Dostoievski Albino Batista (dalbinob)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/877955

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/877955
Committed: https://opendev.org/starlingx/integ/commit/4a092412c56f7a65195fe4abaa1649b7714c0e1f
Submitter: "Zuul (22348)"
Branch: master

commit 4a092412c56f7a65195fe4abaa1649b7714c0e1f
Author: Dostoievski Batista <email address hidden>
Date: Mon Mar 20 09:35:17 2023 -0300

    python3.9: disable unit tests

    When building python3.9 the process get stuck on
    running self-test process. Will disable it as we
    investigate further.

    Partial-Bug: 2009722

    Signed-off-by: Dostoievski Batista <email address hidden>
    Change-Id: I868e53fc2aa5b8f769ccea4d4cb14470213cfcf7

OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/878903

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/878903
Committed: https://opendev.org/starlingx/integ/commit/36c011813d0a04ff04e5931cc951d6ef39adce03
Submitter: "Zuul (22348)"
Branch: master

commit 36c011813d0a04ff04e5931cc951d6ef39adce03
Author: Dostoievski Batista <email address hidden>
Date: Wed Mar 29 10:38:36 2023 -0300

    python3.9: disable unit tests

    When building python3.9 the process still getting
    stuck on running unit tests process. This change
    disable test_builtin and test_openpty that have
    been the ones hanging in the latest build runs.

    Test Plan:
    PASS: Build the package with build-pkgs -c -p python3.9

    Partial-Bug: 2009722

    Signed-off-by: Dostoievski Batista <email address hidden>
    Change-Id: I27a6a01f35345bc5353bf8041a45d5f2a500dded

OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/879548

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/879548
Committed: https://opendev.org/starlingx/integ/commit/14affb58705c58873e8e8f1ad8d686da18f91ef3
Submitter: "Zuul (22348)"
Branch: master

commit 14affb58705c58873e8e8f1ad8d686da18f91ef3
Author: Dostoievski Batista <email address hidden>
Date: Wed Apr 5 09:28:25 2023 -0300

    python3.9: Add timeout regrtest

    Even after disabling python3.9 specific test we are
    seeing the build process getting stuck on the test run again.
    On previous run all the test took less then 3 minute, this
    change aim to set a timeout of 4 minutes to every test
    run by regrtest.py.

    Test Plan:
    PASS: Build the package with build-pkgs -c -p python3.9 successfully

    Partial-Bug: 2009722

    Signed-off-by: Dostoievski Batista <email address hidden>
    Change-Id: Ibf286223e5c2cd6616f7cc1c98b6953808d774b1
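
The idea behind a timeout is that a hung test turns into a bounded failure rather than an indefinite stall. The sketch below is not the merged change; it only illustrates the general technique using Python's standard `subprocess` module, with a sleeping child process standing in for a hanging test. Python's regression test runner also accepts a `--timeout` option in seconds; whether the merged change uses that option is not stated in this report.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising,
        # so no stray process is left behind here.
        return None


# Stand-in for a hanging test: a child that sleeps far longer than the limit.
hang = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_timeout(hang, 1))  # None: the child was killed after 1 second
```

A limit of 240 seconds would correspond to the 4 minutes described in the commit message above.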

Ghada Khalil (gkhalil)
tags: added: stx.9.0 stx.build
Scott Little (slittle1) wrote :
Changed in starlingx:
status: In Progress → Fix Released