Reproducer is broken when tries to apply jobs changes to itself

Bug #1790611 reported by Sagi (Sergey) Shnaidman
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Fix Released
Critical
Sagi (Sergey) Shnaidman

Bug Description

download and run reproducer of this job:

http://logs.openstack.org/22/597222/3/check/tripleo-ci-centos-7-scenario003-multinode-oooq-container/8199c5f/logs/reproducer-quickstart.sh

+ [[ multinode-1ctlr-featureset018 == *\o\v\b\-\f\a\k\e\h\a\-\c\a\s\e\r\v\e\r* ]]
+ [[ multinode-1ctlr-featureset018 == *\o\v\b* ]]
+ set -x
+ export WORKSPACE
+ export ZUUL_CHANGES
+ mkdir -p /home/sshnaidm/tmp/bugsrepro
+ cd /home/sshnaidm/tmp/bugsrepro
+ rm -rf tripleo-quickstart tripleo-quickstart-extras
+ git clone https://github.com/openstack/tripleo-quickstart
Cloning into 'tripleo-quickstart'...
remote: Counting objects: 13600, done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 13600 (delta 12), reused 21 (delta 2), pack-reused 13558
Receiving objects: 100% (13600/13600), 2.26 MiB | 6.57 MiB/s, done.
Resolving deltas: 100% (8297/8297), done.
+ git clone https://github.com/openstack/tripleo-quickstart-extras
Cloning into 'tripleo-quickstart-extras'...
remote: Counting objects: 12962, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 12962 (delta 1), reused 3 (delta 0), pack-reused 12953
Receiving objects: 100% (12962/12962), 2.77 MiB | 5.06 MiB/s, done.
Resolving deltas: 100% (8200/8200), done.
+ for CR in ${ZUUL_CHANGES//^/ }
+ [[ openstack/tripleo-quickstart-extras:master:refs/changes/25/597525/1 =~ ^openstack/(tripleo-quickstart|tripleo-quickstart-extras):[^:]+:(refs/changes/[[:digit:]]+/[[:digit:]]+/[[:digit:]]+) ]]
+ pushd /home/sshnaidm/tmp/bugsrepro/tripleo-quickstart-extras
+ git fetch https://git.openstack.org/openstack/tripleo-quickstart-extras refs/changes/25/597525/1
remote: Counting objects: 11, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 6 (delta 5), reused 4 (delta 3)
Unpacking objects: 100% (6/6), done.
From https://git.openstack.org/openstack/tripleo-quickstart-extras
 * branch refs/changes/25/597525/1 -> FETCH_HEAD
+ git cherry-pick --allow-empty FETCH_HEAD
error: could not apply 66f3f1e... Log collection: sorted rpm -qa for containers
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

patch https://review.openstack.org/#/c/582963/ introduced a bug in reproducer design. Now when running reproducer script it tries to apply patch changes (like https://git.openstack.org/openstack/tripleo-quickstart-extras refs/changes/25/597525/1 in this case) to itself - to repos when reproducing code is executing from.
Needless to say that this is not required for testing quickstart and extras code in job, we tested it always and cloned everything here: https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/nodepool-setup/tasks/patch_repos.yml
This change is completely redundant and has sense only in cases we want to test especially reproducing code (for example nodepool-setup role) and NOT a job. In this case one can just insert change line after cloning extras, like:

# Clone quickstart and quickstart-extras
git clone https://github.com/openstack/tripleo-quickstart
git clone https://github.com/openstack/tripleo-quickstart-extras
pushd tripleo-quickstart-extras
git fetch https://git.openstack.org/openstack/tripleo-quickstart-extras refs/changes/63/582963/5 && git checkout FETCH_HEAD
popd

In all other cases we MUST not to use changes from job in reproducers code, as it:
a) breaks design of reproducer which is external tool for running builds
b) will cause bugs as above, when we can't reproduce passed job
c) will not allow us to run up to date reproducer in jobs
d) will not allow us to reproduce jobs which were running in time when reproducer was broken, because the patch above will bring all these breakages to reproducer script again.

Tags: quickstart
Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :
Changed in tripleo:
assignee: nobody → Sagi (Sergey) Shnaidman (sshnaidm)
Revision history for this message
Sorin Sbarnea (ssbarnea) wrote :

I am against this because I see the generated reproducer script as a self-contained solution for replicating what happened with a build on CI.

I support this based on the first two paragraphs from https://docs.openstack.org/tripleo-docs/latest/contributor/reproduce-ci.html -- or at least is how I interepret them.

I do remember that in the past when I tried to reproduce a change I discovered that it was *NOT* using the tripleo-quickstart(-extras) changes listed in ZUUL_CHANGES, but zuul was using them.... meaning that reproducer would not match the codebase used by the CI build. To test with same code we had to hack reproducer script and disable the cloning and manually cherry pick those changes.

If for some reason you want not to use the CRs code from the original build, you can easily override ZUUL_CHANGES and remove the CRs affecting these two repos which are cloned by the script.

Also, there is another aspect related to "reproductibility" which is essential as this also being in the name of the script: If I run the same script today or one month from now, I do expect it to use the same code and test the same thing as in the past. If it would point to master it would no longer test the same build, it would be just another execution which would test different code (chasing a moving target).

On the other hand I do understand when Sagi said that at some point in time the reproducing
code inside our repos was broken and preventing the full execution of the script. Well, if this is true these bugs should be fixed and the CI build retriggered in order to generate a new reproducer script.

For these reasons I proposed that instead of reverting the bugfix to implement another *optional*
CLI parameter which would tell reproducer to skip cherry picking changes related to tripleo-quickstart(-extras) after cloning. Having this, user could be able to see what would happen if the
same job would run with more recent code in 3o repos.

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

@Sorin, it does NOT replicate anything in job. It does FAIL trying to reproduce, while job itself passed. Plase see logs above.

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

The job from reproducer passed in CI. So when running reproducer I expect it to run correctly and pass, exactly as in CI. It doesn't happen because this bug in reproducer. That's it. Changing the job itself won't help here at all.

Revision history for this message
Sorin Sbarnea (ssbarnea) wrote :

This fails only because it attempts to cherry pick an older revision of a change that was merged.
597525/3 was merged but it tries to cherry pick 597525/1,... so I am not really sure this is a problem. I am more curious to find out what happens with the one that was merged.

We can alternativbely ignore the failure to cherry pick.

Revision history for this message
Sorin Sbarnea (ssbarnea) wrote :

This fails only because it attempts to cherry pick an older revision of a change that was merged.
597525/3 was merged but it tries to cherry pick 597525/1,... so I am not really sure this is a problem. I am more curious to find out what happens with the one that was merged.

We can alternativbely ignore the failure to cherry pick, and diplays a warning when it happens.

Sagi, would you find this behavior acceptable?

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

@Sorin, I made a list of why it's wrong to do, it's not about a single cherrypick. Jobs code should NOT be applied to reproducer.

Revision history for this message
Sorin Sbarnea (ssbarnea) wrote :
Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.openstack.org/599201
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=b82212cbe5b6dc1829b6d48083642a24075e4dfb
Submitter: Zuul
Branch: master

commit b82212cbe5b6dc1829b6d48083642a24075e4dfb
Author: Sagi Shnaidman <email address hidden>
Date: Sat Sep 1 21:58:38 2018 +0000

    Revert "reproducer: cherry pick zuul changes"

    This reverts commit c348e84ce1a3411707c02ca08ac606e08339277a.

    Local reproducer role should be from master, not from change. If we run
    reproducer from a month ago, it will run reproducer role from a month
    ago and most likely will fail or will have not up to date settings.
    If one want to test changes to reproducer role, it's possible to edit
    reproducer script manually.
    It doens't make sense to use reproducer code from same data that patch
    itself, if reproducer was broken then you just can't run it even now.

    Close-Bug: #1790611
    Change-Id: Ic699159da53ca6676bf4d1d845816d333893e8ec

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.