periodic-tripleo-ci-centos-8-9-multinode-mixed-os is reporting to the wrong DLRN hash

Bug #1990012 reported by Ronelle Landy
This bug affects 1 person
Affects | Status       | Importance | Assigned to    | Milestone
tripleo | Fix Released | Critical   | Marios Andreou |

Bug Description

periodic-tripleo-ci-centos-8-9-multinode-mixed-os is reporting to the wrong DLRN hash.

Consider the following buildset, which reports to "aggregate_hash": "5b62e82eebd71259ed086d27d7a2abae":

https://review.rdoproject.org/zuul/buildset/db137665da154a91a62ac59bd7415a12

However, periodic-tripleo-ci-centos-8-9-multinode-mixed-os reports to a different hash: "aggregate_hash": "164135bf6866a2568241e186f56b903b".

The rr tool reports:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Pending running jobs ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ periodic-tripleo-ci-centos-8-9-multinode-mixed-os │
└──────────────────────────────────────────────────────────────────────────────────┘
but the job ran and passed:

periodic-tripleo-ci-centos-8-9-multinode-mixed-os | openstack/tripleo-ci | master | openstack-periodic-integration-stable1-cs8 | 2 hrs 4 mins 14 secs | 2022-09-16 14:36:38 | SUCCESS
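To illustrate why the rr tool lists the job as pending even though it passed, here is a minimal sketch, assuming DLRN results can be modeled as a mapping from aggregate hash to the set of job names that reported success against it. The `missing_jobs` helper and this data layout are hypothetical, not the rr tool's actual implementation; only the hashes and the job name come from this report.

```python
from typing import Dict, Iterable, List, Set

def missing_jobs(results: Dict[str, Set[str]], agg_hash: str,
                 required: Iterable[str]) -> List[str]:
    """Return the required jobs that have no successful report for agg_hash."""
    reported = results.get(agg_hash, set())
    return [job for job in required if job not in reported]

JOB = "periodic-tripleo-ci-centos-8-9-multinode-mixed-os"

# Hypothetical snapshot: the buildset targets 5b62e8..., but this job
# reported its success to 164135... instead.
results = {
    "5b62e82eebd71259ed086d27d7a2abae": set(),  # other buildset jobs omitted
    "164135bf6866a2568241e186f56b903b": {JOB},
}

# Checking the buildset's hash finds no report from the job, so it shows as pending.
print(missing_jobs(results, "5b62e82eebd71259ed086d27d7a2abae", [JOB]))
# ['periodic-tripleo-ci-centos-8-9-multinode-mixed-os']
```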

Ronelle Landy (rlandy)
Changed in tripleo:
milestone: none → zed-1
importance: Undecided → Critical
status: New → Triaged
tags: added: promotion-blocker
Marios Andreou (marios-b) wrote :

Been digging into this a bit.

This only hits the 8-9 job; the 9-8 job is OK. For example, at [1] for the 9-8 job you can see

2022-09-18 23:46:21.556277 | primary | "aggregate_hash": "93a6958b4b3d2772324fb241e308943a",
and

2022-09-18 18:11:34.531974 | primary | TASK [repo-setup : tripleo-get-hash tripleo-ci-testing for wallaby on centos9 from https://trunk.rdoproject.org] ***
2022-09-18 18:11:34.531997 | primary | Sunday 18 September 2022 18:11:34 -0400 (0:00:00.105) 0:00:08.708 ******
2022-09-18 18:11:36.762907 | primary | ok: [subnode-1]
2022-09-18 18:11:36.781803 | primary |
2022-09-18 18:11:36.781836 | primary | TASK [repo-setup : Set fact dlrn_hash->93a6958b4b3d2772324fb241e308943a] *******

I think the problem is in the conditionals around ansible_distribution_major_version in the pre plays at [2].
We will need a tweak there or in [3].

The DLRN reporting happens in the post plays, in [4].

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-8-multinode-mixed-os/182500d/job-output.txt
[2] https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/infra-setup/roles/get_hash/tasks/get_hash.yaml
[3] https://github.com/rdo-infra/review.rdoproject.org-config/blob/master/playbooks/tripleo-ci-periodic-base/pre.yaml
[4] https://github.com/rdo-infra/review.rdoproject.org-config/blob/master/ci-scripts/tripleo-upstream/dlrnapi_report.sh

Marios Andreou (marios-b) wrote (last edit):

...found some time to dig into this some more.

The problem is definitely at [1], where we choose which DLRN endpoint to query (and fetch the hash from) based on the ansible_distribution_major_version of the node running that code (the undercloud).

In the 8-9 job we need to report for centos8, but the node creating the hash_info and reporting to DLRN (delorean) is centos9.

One relatively simple way to fix it is to override 'distro' at [2]. I have proposed [3] to do that. The problem is that [3] is in a config repo, so we cannot test it before merging.
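The failure mode and the proposed override can be sketched as follows. This is a minimal Python model, not the get_hash role's actual Ansible logic: the `dlrn_api_url` function and its parameters are hypothetical, and only the `https://trunk.rdoproject.org/api-centos<N>-<release>` URL pattern is taken from the job logs quoted in this bug.

```python
from typing import Optional

TRUNK = "https://trunk.rdoproject.org"

def dlrn_api_url(release: str, node_major_version: str,
                 distro_override: Optional[str] = None) -> str:
    """Pick the DLRN API endpoint to fetch the hash from and report to.

    By default the distro is derived from the major version of the node that
    runs the code (standing in for ansible_distribution_major_version). An
    explicit override wins, which is what a mixed-OS job needs when that
    node's OS differs from the OS the job must report for.
    """
    distro = distro_override or f"centos{node_major_version}"
    return f"{TRUNK}/api-{distro}-{release}"

# 8-9 job: the reporting node runs CentOS 9, but results belong to CentOS 8.
print(dlrn_api_url("wallaby", "9"))             # -> https://trunk.rdoproject.org/api-centos9-wallaby
print(dlrn_api_url("wallaby", "9", "centos8"))  # -> https://trunk.rdoproject.org/api-centos8-wallaby
```

Without the override, the endpoint silently follows whichever node happens to run the task, which is exactly the 8-9 symptom described above.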

I propose that we merge [3], and I'll run testproject for those jobs as soon as it lands. Depending on the result, we can keep it or revert it immediately.

[1] https://github.com/rdo-infra/ci-config/blob/43d194a98c0aafe40559706d85a538bdf60e9dff/ci-scripts/infra-setup/roles/get_hash/defaults/main.yml#L51
[2] https://github.com/rdo-infra/review.rdoproject.org-config/blob/cb5636cf022718bc2f612c49a22d51269d4bfaae/playbooks/tripleo-ci-periodic-base/pre.yaml#L7-L9
[3] https://review.rdoproject.org/r/c/config/+/45259

chandan kumar (chkumar246) wrote :
Marios Andreou (marios-b) wrote :

Adding an update here as I was asked on IRC earlier: @Chandan, we still have further work to do here.

Test results from https://review.rdoproject.org/r/c/config/+/45259 are at https://review.rdoproject.org/r/c/testproject/+/44234/11#message-36c8132cdce80a6f80e9147c8b5b22750deec22b and show that the job is not yet reporting as expected.

I posted https://review.rdoproject.org/r/c/config/+/45283 to tweak the conditional (I suspect we cannot use job. in the config repo?).

I'll try to test that in the next couple of days (again, we need to merge first and then test, because it is a trusted repo).

Marios Andreou (marios-b) wrote :

The fix mentioned in comment #4 (https://review.rdoproject.org/r/c/config/+/45283) worked and fixed the 8-9 job (this bug), but it broke the 9-8 job. I'll fix that today before we close out this bug.

Some results from the test at https://review.rdoproject.org/r/c/testproject/+/44234/11#message-7767c70714a4c6da6d546ddaa2a1e142e9571c05: the standalone and 8-9 jobs are working OK, but the 9-8 job fails.

*=* 10:42:34 *=*=*= "standalone "

        * https://logserver.rdoproject.org/34/44234/11/check/periodic-tripleo-ci-centos-9-standalone-wallaby/d998e8c/job-output.txt

"tripleo-ci-testing hash 2c487be725843634ed15a3919dd5eafa "

        * 2022-09-27 11:02:15.103483 | primary | TASK [repo-setup : tripleo-get-hash tripleo-ci-testing for wallaby on centos9 from https://trunk.rdoproject.org] ***
        * 2022-09-27 11:02:17.247124 | primary | TASK [repo-setup : Set fact dlrn_hash->2c487be725843634ed15a3919dd5eafa] *******
        * 2022-09-27 15:59:17.675019 | TASK [Report to DLRN]
        * 2022-09-27 15:59:18.188162 | primary | ++ export FULL_HASH=2c487be725843634ed15a3919dd5eafa

"right hash used for reporting"

        * 2022-09-27 15:59:19.901577 | primary | + dlrnapi --url https://trunk.rdoproject.org/api-centos9-wallaby report-result --agg-hash 2c487be725843634ed15a3919dd5eafa --job-id periodic-tripleo-ci-centos-9-standalone-wallaby --info-url https://logserver.rdoproject.org/34/44234/11/check/periodic-tripleo-ci-centos-9-standalone-wallaby/d998e8c --timestamp 1664294359 --success True

"old task executed new one skipped (correct):"

        * 2022-09-27 14:58:51.291583 | TASK [Create hash_info file for dlrn reporting]
2022-09-27 14:58:51.366373 | primary | ok

        * 2022-09-27 14:58:57.237099 | TASK [Create hash_info file for dlrn reporting - Mixed OS jobs]
2022-09-27 14:58:57.296283 | primary | skipping: Conditional result was False

*=* 10:46:19 *=*=*= " 8-9 job: "

"tripleo-ci-testing hash 68695c2658c36fd91fde9f91365c7246 "

        * https://logserver.rdoproject.org/34/44234/11/check/periodic-tripleo-ci-centos-8-9-multinode-mixed-os/bac52c4/job-output.txt
        * 2022-09-27 10:52:59.627571 | primary | TASK [repo-setup : tripleo-get-hash tripleo-ci-testing for wallaby on centos8 from https://trunk.rdoproject.org] ***
        * 2022-09-27 10:53:01.501557 | primary | TASK [repo-setup : Set fact dlrn_hash->68695c2658c36fd91fde9f91365c7246] *******

"right hash used for reporting"

        * 2022-09-27 16:31:37.704985 | TASK [Report to DLRN]
        * 2022-09-27 16:31:38.250206 | primary | ++ export FULL_HASH=68695c2658c36fd91fde9f91365c7246
        * 2022-09-27 16:31:39.788966 | primary | + dlrnapi --url https://trunk.rdoproject.org/api-centos8-wallaby report-result --agg-hash 68695c2658c36fd91fde9f91365c7246 --job-id periodic-tripleo-ci-centos-8-9-multinode-mixed-os --info-url https://logserver.rdoproject.org/34/44234/11/check/periodic-tripleo-ci-centos-8-9-multinode-mixed-os/bac52c4 --timestamp 1664296299 --success True

"old task skipped new one executed (correct):"

        * 2022-09-27 14:50:30.987650 | TASK [Create hash_info file for dlrn reporting]
2022-09-27 14:50:31.050808 | primary | skipping: Conditional result w...
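The "Report to DLRN" step in both excerpts boils down to a single `dlrnapi report-result` call whose `--url` and `--agg-hash` must refer to the same distro. Below is a minimal sketch, assuming the values come from a hash_info-like record; the `build_report_cmd` helper and its dictionary keys are hypothetical (the real logic lives in dlrnapi_report.sh), while the flags are exactly those visible in the logged commands.

```python
import shlex
from typing import Dict, List

def build_report_cmd(info: Dict[str, str]) -> List[str]:
    """Assemble the dlrnapi argv seen in the 'Report to DLRN' log lines."""
    return [
        "dlrnapi", "--url", info["api_url"],
        "report-result",
        "--agg-hash", info["agg_hash"],
        "--job-id", info["job_id"],
        "--info-url", info["info_url"],
        "--timestamp", info["timestamp"],
        "--success", info["success"],
    ]

# Values copied from the 8-9 job log above.
info = {
    "api_url": "https://trunk.rdoproject.org/api-centos8-wallaby",
    "agg_hash": "68695c2658c36fd91fde9f91365c7246",
    "job_id": "periodic-tripleo-ci-centos-8-9-multinode-mixed-os",
    "info_url": "https://logserver.rdoproject.org/34/44234/11/check/"
                "periodic-tripleo-ci-centos-8-9-multinode-mixed-os/bac52c4",
    "timestamp": "1664296299",
    "success": "True",
}

# shlex.join (Python 3.8+) renders the argv as a shell command line.
print(shlex.join(build_report_cmd(info)))
```

The printed line matches the `+ dlrnapi ...` command logged for the 8-9 job.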


Marios Andreou (marios-b) wrote :

The final patch at https://review.rdoproject.org/r/c/config/+/45323 has merged.

Now testing with https://review.rdoproject.org/r/c/testproject/+/44234/11#message-d6a73429839bbc51ac4798db6746bd8944d93639

It hasn't reported yet, but I'm watching the console and the initial results look good.

Changed in tripleo:
assignee: nobody → Marios Andreou (marios-b)
Marios Andreou (marios-b) wrote (last edit):

We are good now, so I'm moving the bug to Fix Released.

Details on the test results (per comment #6 above) are at https://review.rdoproject.org/r/c/config/+/45323/2/playbooks/tripleo-ci-periodic-base/pre.yaml

[EDIT]: edited to add the correct link to test results

Changed in tripleo:
status: Triaged → Fix Released
Marios Andreou (marios-b) wrote :
Marios Andreou (marios-b) wrote :

Also somewhat related, so adding it for context:

Adds mixed_os_stable_version for component mixed os jobs https://review.rdoproject.org/r/c/rdo-jobs/+/45377
