tools: Dockerfile: yum install silently ignores errors

Bug #1912682 reported by Davlet Panech
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Davlet Panech
Milestone: (none listed)

Bug Description

Brief Description
-----------------
tb.sh sometimes silently ignores "yum install" errors in its Dockerfile; the symptoms are different depending on which package(s) failed to install. This is caused by a (mis-)feature of rpm/yum: when installing multiple packages in one transaction, it reports success if at least one package installed successfully.

In particular, packages fetched from the CENGN mirror occasionally fail to install when that site is down.
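
A minimal sketch of one way to detect this kind of partial failure, assuming the check runs right after "yum install" in the same RUN step. The helper name verify_installed and the RPM_QUERY override are hypothetical and are not part of tb.sh or its Dockerfile; "rpm -q" is the standard package query command.

```shell
# Hypothetical helper: confirm that every requested package is really
# installed, instead of trusting the exit status of "yum install".
# RPM_QUERY is an assumed override so the check can be exercised without rpm.
verify_installed() {
    missing=0
    for pkg in "$@"; do
        # "rpm -q NAME" exits non-zero when NAME is not installed
        if ! ${RPM_QUERY:-rpm -q} "$pkg" >/dev/null 2>&1; then
            echo "ERROR: package not installed: $pkg" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In a Dockerfile this could follow the install in the same step, for example "yum install -y gcc git && verify_installed gcc git", so that the RUN step fails (and is not cached) when a package is missing. For packages installed from a URL, pass the package name (for example "mock"), not the URL.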

Severity
--------
Minor

Steps to Reproduce
------------------
Run "tb.sh create"

Expected Behavior
------------------
tb.sh succeeds

Actual Behavior
----------------
Depending on which packages failed to install:
- tb.sh seemingly succeeds, but some packages are missing in the container
- tb.sh fails with seemingly unrelated errors stemming from subsequent commands in Dockerfile

Reproducibility
---------------
Intermittent

System Configuration
--------------------
N/A

Branch/Pull Time/Commit
-----------------------
Branch: master
Date: 2021-01-21 13:00:00 -0500

Last Pass
---------
N/A

Timestamp/Logs
--------------
The following docker step reports success even though it did not fully succeed:
...
Step 15/49 : RUN groupadd -g 751 cgts && echo "mock:x:751:root" >> /etc/group && echo "mockbuild:x:9001:" >> /etc/group && yum install -y anaconda anaconda-runtime autoconf-archive autogen automake bc bind bind-utils bison cpanminus createrepo createrepo_c deltarpm docker-client expat-devel flex isomd5sum gcc gettext git libguestfs-tools libtool libxml2 lighttpd lighttpd-fastcgi lighttpd-mod_geoip net-tools mkisofs http://mirror.starlingx.cengn.ca/mirror/centos/epel/dl.fedoraproject.org/pub/epel/7/x86_64/Packages/m/mock-1.4.16-1.el7.noarch.rpm http://mirror.starlingx.cengn.ca/mirror/centos/epel/dl.fedoraproject.org/pub/epel/7/x86_64/Packages/m/mock-core-configs-31.6-1.el7.noarch.rpm mongodb mongodb-server pax perl-CPAN python-deltarpm python-pep8 python-pip python-psutil python2-psutil python36-psutil python3-devel python-sphinx python-subunit python-testrepository python-tox python-yaml python2-ruamel-yaml postgresql qemu-kvm quilt rpm-build rpm-sign rpm-python squashfs-tools sudo systemd syslinux udisks2 vim-enhanced wget
...

followed by this error:

Step 31/49 : RUN useradd -s /sbin/nologin -u 9001 -g 9001 mockbuild && rmdir /var/lib/mock && ln -s /localdisk/loadbuild/mock /var/lib/mock && rmdir /var/cache/mock && ln -s /localdisk/loadbuild/mock-cache /var/cache/mock && echo "config_opts['use_nspawn'] = False" >> /etc/mock/site-defaults.cfg && echo "config_opts['rpmbuild_networking'] = True" >> /etc/mock/site-defaults.cfg && echo >> /etc/mock/site-defaults.cfg
 ---> Running in a5bbe1a983a4
rmdir: failed to remove '/var/lib/mock': No such file or directory

Test Activity
-------------
Build

Workaround
----------
Retry the build

Ghada Khalil (gkhalil)
tags: added: stx.build
removed: build
Ghada Khalil (gkhalil) wrote :

Minor: this bug tracks improving the build's error reporting when the mirror is down.

Changed in starlingx:
importance: Undecided → Low
status: New → Triaged
assignee: nobody → Davlet Panech (dpanech)
Davlet Panech (dpanech) wrote :
Changed in starlingx:
status: Triaged → In Progress
status: In Progress → Fix Committed
Davlet Panech (dpanech) wrote :

This sometimes causes build errors that are difficult to fix. Retrying the build doesn't help, because the failed "yum install" gets cached by docker.

The real workaround is to re-run "docker build" with the "--no-cache" option, but tb.sh doesn't support this.
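
For reference, a sketch of that manual workaround, assuming the image is built by calling "docker build" directly rather than through tb.sh. The function name rebuild_image, the DOCKER and NO_CACHE variables, and the image tag are hypothetical; "--no-cache" is a standard "docker build" option.

```shell
# Hypothetical wrapper: rebuild an image, optionally bypassing the layer
# cache so that a previously cached (and broken) "yum install" step is re-run.
# DOCKER is an assumed override for the docker binary.
rebuild_image() {
    tag=$1
    context=${2:-.}
    # ${NO_CACHE:+--no-cache} expands to the flag only when NO_CACHE is non-empty
    ${DOCKER:-docker} build ${NO_CACHE:+--no-cache} -t "$tag" "$context"
}
```

For example, "NO_CACHE=1 rebuild_image my-builder ." would run "docker build --no-cache -t my-builder .", forcing every RUN step to execute again.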

Other caveats: this command:

   yum install ... http://host/path/to/file.rpm ...

fails only if /etc/yum.conf has "skip_missing_names_on_install=0" --AND-- the reason for the error is HTTP 404. But if "host" doesn't respond at all, it's not considered an error.

So a failed RUN step from the Dockerfile may get cached, and subsequent build attempts will keep failing.
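
One possible way around this caveat, sketched under the assumption that curl is available in the image: download URL-based RPMs explicitly before calling yum, so that any fetch failure (HTTP error or unreachable host) stops the RUN step. The helper name fetch_rpms, the FETCH override and the destination directory are hypothetical; this is not necessarily how the actual fix works.

```shell
# Hypothetical helper: fetch each RPM URL into a local directory and stop
# at the first failure, so "yum install" never sees an unreachable URL.
# FETCH is an assumed override for the download command.
fetch_rpms() {
    dest=$1
    shift
    for url in "$@"; do
        # curl -f makes HTTP errors such as 404 return a non-zero status;
        # connection failures already do so by default
        ${FETCH:-curl -fsSL -o} "$dest/${url##*/}" "$url" || {
            echo "ERROR: failed to download $url" >&2
            return 1
        }
    done
}
```

In a RUN step this might be used as "fetch_rpms /tmp/rpms http://host/path/to/file.rpm && yum install -y /tmp/rpms/*.rpm", after creating /tmp/rpms.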

I suggest increasing the Importance of this bug.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tools (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/tools/+/786140

OpenStack Infra (hudson-openstack) wrote : Fix merged to tools (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/tools/+/786140
Committed: https://opendev.org/starlingx/tools/commit/b1119360cb90b186ea6e3c030c96dd924eaa9c08
Submitter: "Zuul (22348)"
Branch: f/centos8

commit b1119360cb90b186ea6e3c030c96dd924eaa9c08
Author: Davlet Panech <email address hidden>
Date: Wed Jan 27 19:06:41 2021 -0500

    Dockerfile: fail in "yum install" on missing packages

    By default "yum install" ignores packages that can't be downloaded and
    returns 0 to the shell.

    In a Dockerfile such falsely successful commands are cached by
    "docker build", so that the next time we run "docker build" the
    "yum install" is not even attempted:

      # this "succeeds" and gets cached during the first "docker build",
      # even if the URL returns an error (eg DNS error or HTTP 404).
      # During a second "docker build" this brings back the FS layer
      # from the cache and doesn't attempt to re-download the package
      # from that URL
      RUN yum install python3 http://some/url.rpm
      ...

    This patch avoids the problem in case the CENGN mirror is down.

    Closes-Bug: 1912682
    Signed-off-by: Davlet Panech <email address hidden>
    (cherry picked from commit 511859ef18129b3ac830c9f609f0324996e7c432)
    Change-Id: Ia4abcce3e2bc697e01aa194ebbf3146959f827e6

tags: added: in-f-centos8
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tools (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/tools/+/792229

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tools (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/tools/+/792229
Reason: Updated merge coming

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tools (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/tools/+/793627

OpenStack Infra (hudson-openstack) wrote : Fix merged to tools (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/tools/+/793627
Committed: https://opendev.org/starlingx/tools/commit/d701c6f896dfe440566cc942e3dd71be1f19ae5d
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 7b5f3a45e663866a3c0ca3ca86eb3c92bc7f0210
Author: Scott Little <email address hidden>
Date: Wed May 5 09:56:33 2021 -0400

    fix bad flockflock url pt 2

    A stray '}' character found its way into my prior update
    titled 'fix bad flockflock url' after testing. The result was
    the following error

    sed: -e expression #1, char 15: unexpected `}'

    This removes the unwanted '}', restoring the prior update
    to its intended form.

    Closes-bug: 1926987
    Signed-off-by: Scott Little <email address hidden>
    Change-Id: I48f4721ccaf121679916b01747243deedf5836cd

commit ac05493480f6df6f31d071d29380c1b4f35b70a9
Author: Scott Little <email address hidden>
Date: Tue May 4 12:42:36 2021 -0400

    fix git-review within docker build environment

    'tb create' fails to create a build environment since
    upstream git-review was updated on Apr 26.

    Fix is to install/update pbr ahead of git-review.

    Also, to reduce the likelihood of this recurring, lock
    down specific versions of the pypi supplied tools we
    know to work.

    Closes-bug: 1927137
    Signed-off-by: Scott Little <email address hidden>
    Change-Id: Ib9fe6fd33de4d637f254ac421cc0427ee6131b65

commit b96ebc83d859a4a7802a462504817ecec6182a7b
Author: Scott Little <email address hidden>
Date: Mon May 3 13:16:53 2021 -0400

    fix bad flockflock url

    download_mirror.sh fails due to a bad path containing
    ‘stx-tools/centos-mirror-tools/config/centos/flockflock’

    The path is constructed, and the trigger is when an EOL is missing
    from a centos_build_layer.cfg file, causing 'cat' to merge the last
    line of the offending file with the first line of the next file.

    Switch 'cat' to 'grep', which will always ensure an EOL is present.
    Along the way, we can filter out empty lines and comments.

    Closes-bug: 1926987
    Signed-off-by: Scott Little <email address hidden>
    Change-Id: I2404b3415f0f3e2f395c2bcb7a527aa01a488f61

commit 4c3ee114bcbff710c2049626044dd1ddc756cbd9
Author: Joe Slater <email address hidden>
Date: Tue Apr 27 18:50:53 2021 -0400

    screen: fix CVE-2021-26937 segfault

    Advance to screen-4.1.0-0.27.20120314git3c2946.el7_9.x86_64.rpm.

    Closes-bug: 1926372
    Change-Id: I41834e7b1e16153b0632751f59f7ac9f503389da
    Signed-off-by: Joe Slater <email address hidden>

commit e31e0dda7a4c09143d41cd518ab97ea6112d7fb5
Author: Li Zhou <email address hidden>
Date: Tue Apr 13 04:53:50 2021 -0400

    systemd: Upgrade to version 219-78.el7_9.3

    Refer the lst entries to the new version.

    Partial-Bug: #1924691
    Signed-off-by: Li Zhou <email address hidden>
    Change-Id: I557eff6a47f341cc67de02fd59024b28bb6cac84

commit 26db2859dd3a5c060c337b886fd16c4d2d9f93af
Author: Scott Little <email address hidden>
Date: Mon Apr 12 11:21:31 2021 -0400

    Replace basearch references in y...

Ghada Khalil (gkhalil)
Changed in starlingx:
status: Fix Committed → Fix Released