Reproducer doesn't reproduce job

Bug #1897155 reported by Sergii Golovatiuk
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Critical     yatin

Bug Description

The reproducer script is supposed to reproduce the job, but it doesn't.

How to reproduce
1. Pick a job you want to reproduce (in my case, https://review.opendev.org/#/c/739457/)

2. Copy the logs/reproducer directory to your libvirt server (in my case, https://2302f195cca0bd2fbfcf-1aafb108750144ecd9565eaf1429d9e4.ssl.cf1.rackcdn.com/753807/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/666dcd1/logs/reproducer-quickstart/)

I fetched all of the files using

wget -r -np -nd -R "index.html*" https://2302f195cca0bd2fbfcf-1aafb108750144ecd9565eaf1429d9e4.ssl.cf1.rackcdn.com/753807/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/666dcd1/logs/reproducer-quickstart/

3. Start the reproducer script

bash ./reproducer-zuul-based-quickstart.sh -w /var/tmp/reproduce -l -e @../extra.yaml -e os_autohold_node=true -e zuul_build_sshkey_cleanup=false -e container_mode=docker -e upstream_gerrit_user=holser -e rdo_gerrit_user=holser

The script finishes without any errors and prints the following:

TASK [ansible-role-tripleo-ci-reproducer : Print gerrit info] *********************************************************************************************************************************
task path: /var/tmp/reproduce/roles/ansible-role-tripleo-ci-reproducer/tasks/launch-job.yaml:145
ok: [localhost] => {
    "msg": "change I903a195f7fd5c5d098adfbc8721001432c8f45eb\n project: test1\n branch: stable/ussuri\n id: I903a195f7fd5c5d098adfbc8721001432c8f45eb\n number: 1001\n subject: Add job to launch Depends-On: https://review.opendev.org/753807\n owner:\n name: Administrator\n email: <email address hidden>\n username: admin\n url: http://localhost:8080/c/test1/+/1001\n commitMessage: Add job to launch\n Depends-On: https://review.opendev.org/753807\n \n Change-Id: I903a195f7fd5c5d098adfbc8721001432c8f45eb\n createdOn: 2020-09-24 18:27:42 UTC\n lastUpdated: 2020-09-24 18:32:17 UTC\n open: true\n status: NEW\n comments:\n timestamp: 2020-09-24 18:27:42 UTC\n reviewer:\n name: Administrator\n email: <email address hidden>\n username: admin\n message: Uploaded patch set 1.\n comments:\n timestamp: 2020-09-24 18:32:17 UTC\n reviewer:\n name: Zuul\n username: zuul\n message: Patch Set 1: Verified+1\n \n Build succeeded.\n \n - tripleo-ci-centos-8-standalone-upgrade-ussuri-dlrn-hash-tag http://localhost:8000/01/1001/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri-dlrn-hash-tag/41a4a55/ : FAILURE in 2m 29s (non-voting)\n\ntype: stats\nrowCount: 1\nrunTimeMilliseconds: 3\nmoreChanges: false"
}
META: ran handlers
META: ran handlers

PLAY RECAP ************************************************************************************************************************************************************************************
localhost : ok=154 changed=58 unreachable=0 failed=0 skipped=90 rescued=0 ignored=7
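
Note that the Zuul comment embedded in the output above already records FAILURE for the (non-voting) job, even though the recap shows failed=0 and the change got Verified+1. A minimal sketch for pulling the per-job results out of such a comment follows. It assumes result lines have the shape "- <job> <log-url> : <RESULT> ..." as in the message above; the function names are made up for illustration and are not part of the reproducer.

```python
import re

# Matches Zuul result lines such as:
#   "- <job-name> <log-url> : FAILURE in 2m 29s (non-voting)"
# The line shape is taken from the message above; other Zuul setups may differ.
RESULT_LINE = re.compile(
    r"^\s*-\s+(?P<job>\S+)\s+(?P<url>\S+)\s+:\s+(?P<result>[A-Z_]+)\b(?P<rest>.*)$"
)


def parse_zuul_results(message):
    """Return a list of (job, result, voting) tuples found in a Zuul comment."""
    results = []
    for line in message.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            voting = "(non-voting)" not in match.group("rest")
            results.append((match.group("job"), match.group("result"), voting))
    return results


def non_successful(message):
    """List jobs whose result is anything other than SUCCESS, voting or not."""
    return [entry for entry in parse_zuul_results(message) if entry[1] != "SUCCESS"]


if __name__ == "__main__":
    # Sample text copied from the "Print gerrit info" output in this report.
    sample = (
        "Patch Set 1: Verified+1\n\nBuild succeeded.\n\n"
        "- tripleo-ci-centos-8-standalone-upgrade-ussuri-dlrn-hash-tag "
        "http://localhost:8000/01/1001/1/check/"
        "tripleo-ci-centos-8-standalone-upgrade-ussuri-dlrn-hash-tag/41a4a55/ "
        ": FAILURE in 2m 29s (non-voting)\n"
    )
    for job, result, voting in non_successful(sample):
        print(job, result, "voting" if voting else "non-voting")
```

Checking for any non-SUCCESS result, rather than trusting the Verified vote, catches non-voting failures like this one.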

Actual result
=============

However, the job did not actually run to completion, even though the reproducer reports success. A closer look at job-output.txt
shows the following:

2020-09-24 18:31:45.863152 | LOOP [run-test : Check overridable settings]
2020-09-24 18:31:45.917407 | primary | ERROR: Item: standalone_custom_env_files
2020-09-24 18:31:45.917647 | primary | {
2020-09-24 18:31:45.917699 | primary | "ansible_loop_var": "item",
2020-09-24 18:31:45.917736 | primary | "item": "standalone_custom_env_files",
2020-09-24 18:31:45.917768 | primary | "msg": "ERROR: standalone_custom_env_files is not overridable."
2020-09-24 18:31:45.917798 | primary | }
2020-09-24 18:31:45.918307 | primary | skipping: Conditional result was False
2020-09-24 18:31:45.932273 |
2020-09-24 18:31:45.932340 | PLAY RECAP
2020-09-24 18:31:45.932401 | primary | ok: 2 changed: 0 unreachable: 0 failed: 1 skipped: 5 rescued: 0 ignored: 0
2020-09-24 18:31:45.932441 |
2020-09-24 18:31:46.105925 | RUN END RESULT_NORMAL: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]
2020-09-24 18:31:46.106131 | POST-RUN START: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/post.yaml@master]
2020-09-24 18:31:47.166159 |
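
Since the reproducer's own recap says failed=0, the failure is only visible in the job's recap lines inside job-output.txt. A rough sketch for spotting it automatically is below. It assumes recap lines look like the "primary | ok: 2 changed: 0 unreachable: 0 failed: 1 ..." line above; failed_recaps is a hypothetical helper name, not something the reproducer provides.

```python
import re

# Recap lines in job-output.txt (format as seen in the excerpt above):
#   "<timestamp> | primary | ok: 2 changed: 0 unreachable: 0 failed: 1 ..."
RECAP_LINE = re.compile(
    r"\|\s*(?P<host>\S+)\s*\|\s*ok:\s*\d+.*?"
    r"unreachable:\s*(?P<unreachable>\d+)\s+failed:\s*(?P<failed>\d+)"
)


def failed_recaps(lines):
    """Return (host, unreachable, failed) for each recap line reporting a problem."""
    problems = []
    for line in lines:
        match = RECAP_LINE.search(line)
        if not match:
            continue
        unreachable = int(match.group("unreachable"))
        failed = int(match.group("failed"))
        if unreachable or failed:
            problems.append((match.group("host"), unreachable, failed))
    return problems


if __name__ == "__main__":
    # Two lines copied from the job-output.txt excerpt in this report.
    sample = [
        "2020-09-24 18:31:45.932340 | PLAY RECAP",
        "2020-09-24 18:31:45.932401 | primary | ok: 2 changed: 0 "
        "unreachable: 0 failed: 1 skipped: 5 rescued: 0 ignored: 0",
    ]
    for host, unreachable, failed in failed_recaps(sample):
        print(f"{host}: unreachable={unreachable} failed={failed}")
```

To check a real log, pass an open file object for job-output.txt as the lines argument. The "localhost : ok=154 ..." recap printed by the reproducer playbook itself uses a different format and is deliberately not matched here.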

Expected result
===============
A fully working job.

summary: - Reproducer doesn
+ Reproducer doesn't reproduce job
Changed in tripleo:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Sergii Golovatiuk (sgolovatiuk)
Marios Andreou (marios-b) wrote :

o/ Sergii - thanks for filing this, there is definitely a bug here.

The error can be seen in this file (from your logs in the description): https://2302f195cca0bd2fbfcf-1aafb108750144ecd9565eaf1429d9e4.ssl.cf1.rackcdn.com/753807/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/666dcd1/logs/reproducer-quickstart/featureset-override.yaml

The standalone_custom_env_files: key should not be in there; only standalone_environment_files should be.

The problem comes from somewhere around here: https://opendev.org/openstack/tripleo-ci/src/commit/820e3fe31e6988ae6070c319dbf8372ab1489a3b/roles/run-test/tasks/main.yaml#L43-L62 - perhaps you can dig a bit more with this pointer?

Hope it helps for now.

Marios Andreou (marios-b) wrote :

As a workaround in your environment, if you prefer, you can just manually remove standalone_custom_env_files from featureset-override.yaml; that should work.
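
If you'd rather script that workaround than edit the file by hand, here is a minimal line-based sketch. It assumes standalone_custom_env_files is a top-level key starting in column 0 whose value is either inline or a block of indented lines / "- " list items. It uses only the standard library and does not do real YAML parsing. drop_top_level_key is a hypothetical helper, and the sample content is made up, not the real featureset-override.yaml.

```python
def drop_top_level_key(yaml_text, key):
    """Remove a top-level `key:` entry and its nested block from YAML text.

    Line-based sketch: assumes the key starts in column 0 and that its value
    is either inline or a block of indented lines / "- " list items.
    """
    kept = []
    skipping = False
    for line in yaml_text.splitlines(keepends=True):
        if skipping:
            # Continuation of the removed block: blank lines, indented lines,
            # or list items written at zero indentation.
            if not line.strip() or line[0] in " \t" or line.startswith("- "):
                continue
            skipping = False
        if line.startswith(key + ":"):
            skipping = True
            continue
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    # Made-up sample content for illustration only.
    sample = (
        "standalone_environment_files:\n"
        "  - example-a.yaml\n"
        "standalone_custom_env_files:\n"
        "  - example-b.yaml\n"
        "some_other_setting: true\n"
    )
    print(drop_top_level_key(sample, "standalone_custom_env_files"), end="")
```

To apply it, read featureset-override.yaml, pass its text through drop_top_level_key, and write the result back; keep a copy of the original file first.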

11:43 < marios> o/ holser i had a look earlier and added the comment in https://bugs.launchpad.net/tripleo/+bug/1897155/comments/1 - I
                suspect the issue comes from
https://opendev.org/openstack/tripleo-ci/src/commit/820e3fe31e6988ae6070c319dbf8372ab1489a3b/roles/run-test/tasks/main.yaml#L43-L62
11:43 <@openstack> Launchpad bug 1897155 in tripleo "Reproducer doesn't reproduce job" [Critical,Confirmed] - Assigned to Sergii
                   Golovatiuk (sgolovatiuk)
11:44 < marios> holser: if you change in your environment that featureset-override.yaml it should work fine after that
11:44 < marios> holser: i.e.
https://2302f195cca0bd2fbfcf-1aafb108750144ecd9565eaf1429d9e4.ssl.cf1.rackcdn.com/753807/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/666dcd1/logs/reproducer-quickstart/featureset-override.yaml remove the standalone_custom_env_files:
11:44 < holser> yeah
11:44 < holser> that's what I think
11:44 < holser> marios thanks

Marios Andreou (marios-b) wrote :

13:14 < holser> hey marios
13:14 < holser> https://opendev.org/openstack/tripleo-ci/src/commit/820e3fe31e6988ae6070c319dbf8372ab1489a3b/roles/run-test/tasks/main.yaml#L43-L52
13:14 < holser> I can remove all that
13:15 < holser> but we use standalone_custom_env_files in different places
13:15 < holser> so
https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/roles/standalone/templates/standalone.sh.j2#L9 won't work
13:16 < holser> so I still think that the right way to do is https://review.opendev.org/#/c/752959/2
13:17 < holser> otherwise I will need to refactor a lot of your code
13:17 < holser> the problem is we mess up with standalone_environment_files in one places and standalone_custom_env_files in another one
13:18 < holser> so we should either combine them to one env variable or allow override standalone_custom_env_files
13:19 < holser> just make sure the job returns a success result thus in many cases CI is broken just giving +1 to jobs

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.opendev.org/754363

Changed in tripleo:
assignee: Sergii Golovatiuk (sgolovatiuk) → yatin (yatinkarel)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/754363
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=fa93a6a3bc2727f56bd7c5c5c69e69600e6e15cd
Submitter: Zuul
Branch: master

commit fa93a6a3bc2727f56bd7c5c5c69e69600e6e15cd
Author: yatinkarel <email address hidden>
Date: Fri Sep 25 17:38:18 2020 +0530

    Use job's featureset_override directly

    featureset_override_file_output contain some additional vars
    like standalone_custom_env_files.
    To reproduce a job we want to have same set of override's
    given in a job so let's use job.featureset_override instead, if
    standalone_environment_files is part of featureset_override
    then standalone_custom_env_files will be calculated within
    standalone role.

    Closes-Bug: #1897155
    Change-Id: I7dbd8f1e34cabd41504150c886cbe9ff166b50ea

Changed in tripleo:
status: In Progress → Fix Released