build-stx-images.sh retry mechanism broken

Bug #1891189 reported by Don Penney
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Don Penney

Bug Description

Brief Description
-----------------
The retry mechanism used by build-stx-images.sh, the "with_retries" function in utils.sh, has been broken by a recent update, due to a pre-existing bug in with_retries:
https://opendev.org/starlingx/root/src/commit/b21cacbffce83179dd6e3c84141cdd1c581329f2/build-tools/build-wheels/utils.sh#L24

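The "let -i" syntax is not supported. While -i is a valid option for "declare" and "local", "let" has no such option: it treats "-i" as an arithmetic expression, the negation of the variable i. If i is unset, this evaluates to 0 and is harmless. If i is set, "let" tries to evaluate its value as an expression.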
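
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not code from utils.sh; the URL is a made-up placeholder and the variable names count_unset and count_url are hypothetical.

```shell
#!/usr/bin/env bash
# "let" has no -i option, so "-i" is parsed as an arithmetic
# expression: unary minus applied to the variable i.

# Case 1: i is unset, so "-i" evaluates to 0 and the increment still runs.
unset i
count_unset=0
let -i count_unset++
echo "i unset:      count=${count_unset}"

# Case 2: i holds a non-numeric string (hypothetical value). "let" fails on
# the first expression and never reaches the increment.
i="wheels_alternate=http://example.invalid/wheels.tar"
count_url=0
let -i count_url++ 2>/dev/null
echo "i set to URL: count=${count_url}"
```

In the second case the counter stays at 0, which matches the symptom seen in with_retries.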

A recent update to build-stx-images.sh added a function that used "i" as a variable in a loop, but did not define it as a "local".
https://review.opendev.org/740920
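
To illustrate the scoping problem, here is a small sketch. The two functions are hypothetical and are not taken from build-stx-images.sh; they only show how a loop variable that is not declared "local" stays set in the caller's scope after the function returns.

```shell
#!/usr/bin/env bash
# Hypothetical functions illustrating variable scope in bash.

function loop_without_local {
    # "i" is not declared local, so it is a global variable.
    for i in "$@"; do
        :   # loop body omitted
    done
}

function loop_with_local {
    local i   # restricts "i" to this function
    for i in "$@"; do
        :   # loop body omitted
    done
}

unset i
loop_with_local "a=1" "b=2"
after_local="${i-<unset>}"

loop_without_local "a=1" "wheels_alternate=http://example.invalid/wheels.tar"
after_global="${i-<unset>}"

echo "after loop_with_local:    i=${after_local}"
echo "after loop_without_local: i=${after_global}"
```

After loop_without_local returns, i still holds the last loop value, so any later "let -i" in the same shell will try to evaluate it.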

As a result, the last value assigned to i bleeds over into the with_retries calls, and we see errors like:

/home/localdisk/designer/jenkins/master-containers/cgcs-root/build-tools/build-docker-images/../build-wheels/utils.sh: line 24: let: wheels_alternate=http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/stx-centos-py2_stable-wheels.tar: syntax error in expression (error token is "://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/stx-centos-py2_stable-wheels.tar")

Because "let" stops at the failed expression, the "attempt" variable in with_retries is never incremented.

If a command then fails in a way that should trigger a retry, with_retries retries indefinitely, since the counter never reaches the limit.
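
For reference, a bounded retry helper that does not depend on "let -i" might look like the sketch below. This is an assumption-laden illustration, not the actual with_retries from utils.sh: the argument order, the messages, the RETRY_DELAY variable and the always_fail test command are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a bounded retry helper; not the real utils.sh implementation.
# RETRY_DELAY is a hypothetical knob so the example runs without waiting.

function with_retries {
    local max_attempts=$1
    local cmd=$2
    shift 2                     # remaining arguments are passed to the command

    local -i attempt=0          # -i is valid for "local"

    while :; do
        let attempt++           # no "-i": a stray global i cannot interfere

        "${cmd}" "$@" && return 0

        echo "Command (${cmd}) failed, attempt ${attempt} of ${max_attempts}."
        if [ "${attempt}" -ge "${max_attempts}" ]; then
            echo "Max command attempts reached. Aborting..."
            return 1
        fi
        sleep "${RETRY_DELAY:-5}"
    done
}

# Usage: a command that always fails, run with a polluted global i.
i="wheels_alternate=http://example.invalid/wheels.tar"
calls=0
function always_fail { calls=$((calls + 1)); return 1; }

RETRY_DELAY=0
with_retries 3 always_fail
rc=$?
echo "rc=${rc} calls=${calls}"
```

With the counter incrementing correctly, the helper gives up after the configured number of attempts even though i is set to a URL.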

Severity
--------
Major

Reproducibility
---------------
Reproducible

Branch/Pull Time/Commit
-----------------------
master, as of July 29, 2020

Test Activity
-------------
Build

Don Penney (dpenney)
Changed in starlingx:
assignee: nobody → Don Penney (dpenney)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to root (master)

Fix proposed to branch: master
Review: https://review.opendev.org/745699

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.build
OpenStack Infra (hudson-openstack) wrote : Fix merged to root (master)

Reviewed: https://review.opendev.org/745699
Committed: https://git.openstack.org/cgit/starlingx/root/commit/?id=f36db7e20726255be58ea7331922e047bda27e73
Submitter: Zuul
Branch: master

commit f36db7e20726255be58ea7331922e047bda27e73
Author: Don Penney <email address hidden>
Date: Tue Aug 11 10:36:43 2020 -0400

    Fix use of 'let -i' in scripts

    Unlike "declare -i" and "local -i", the bash "let" does not support a
    "-i" option. Rather, it takes it as a variable reference. If no "i"
    variable is defined in scope, it does not cause an issue. If "i" has
    been defined somewhere, however, it may cause a syntax issue, as the i
    is evaluated.

    A recent update to build-stx-images.sh added a loop that defines an
    "i" variable without limiting its scope. In a current image build,
    this loop ends with having "i" defined as a URL. As a result, a
    "syntax error in expression" occurs, causing the "with_retries"
    function to fail to increment the counter. Should a build error occur,
    the "with_retries" will never hit the retry limit, looping until it
    has a successful result.

    This update removes the -i from all "let -i" occurrences in the build
    scripts.

    Change-Id: I34ad49f8872a81659ff4caf8087b256ea9fb3d32
    Closes-Bug: 1891189
    Signed-off-by: Don Penney <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to root (r/stx.4.0)

Fix proposed to branch: r/stx.4.0
Review: https://review.opendev.org/746233

Ghada Khalil (gkhalil) wrote :

As per Don Penney, this is an issue in r/stx.4.0 as well as stx master. Tagging for both releases.

Changed in starlingx:
importance: Undecided → High
tags: added: not-yet-in-r-stx40 stx.4.0 stx.5.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to root (r/stx.4.0)

Reviewed: https://review.opendev.org/746233
Committed: https://git.openstack.org/cgit/starlingx/root/commit/?id=b58bd1cc5a2cbd114b166b31dc176e7e3b4af4d6
Submitter: Zuul
Branch: r/stx.4.0

commit b58bd1cc5a2cbd114b166b31dc176e7e3b4af4d6
Author: Don Penney <email address hidden>
Date: Tue Aug 11 10:36:43 2020 -0400

    Fix use of 'let -i' in scripts

    (cherry picked from commit f36db7e20726255be58ea7331922e047bda27e73)
