intel-fpga/intel-gpu/intel-qat: docker images build errors

Bug #1927153 reported by Davlet Panech
This bug affects 1 person

Affects     Status        Importance  Assigned to    Milestone
StarlingX   Fix Released  Medium      Davlet Panech

Bug Description

Brief Description
-----------------
The following images fail to build in stx.5.0:

intel-fpga-admissionwebhook
intel-fpga-initcontainer
intel-fpga-plugin
intel-gpu-plugin
intel-qat-plugin

It seems that the SSL certificates in the base Docker image, clearlinux/golang:latest, are expired, so fetching Go modules from proxy.golang.org fails:

<email address hidden>: Get "https://proxy.golang.org/github.com/fsnotify/fsnotify/@v/v1.4.7.mod": x509: certificate signed by unknown authority

It's unclear whether this also affects the master branch.

Severity
--------
Critical

Steps to Reproduce
------------------
Build the Docker images using build-stx-images.sh.

Expected Behavior
-----------------
The build succeeds

Actual Behavior
----------------
The build fails

Reproducibility
---------------
Reproducible

System Configuration
--------------------
N/A

Branch/Pull Time/Commit
-----------------------
stx.5.0/May 4, 2021 10:16:43 AM

Last Pass
---------
stx.5.0/Apr 20, 2021 8:45:29 PM

Timestamp/Logs
--------------

From here: http://mirror.starlingx.cengn.ca/mirror/starlingx/rc/5.0/centos/containers/20210504T133811Z/logs/jenkins-STX_build_docker_flock_images-371.log.html

Running: bash build-intel-device-plugins-image.sh intel-qat-plugin jenkins/intel-qat-plugin:rc-5.0-distroless-stable-build
Sending build context to Docker daemon 17.89MB
Step 1/17 : ARG CLEAR_LINUX_BASE=clearlinux/golang:latest
Step 2/17 : FROM ${CLEAR_LINUX_BASE} as builder
latest: Pulling from clearlinux/golang
Digest: sha256:de04096642acfc3e9836b20e459bfb2fda59ccec4980e9619a6cccc8c5eaf20f
Status: Image is up to date for clearlinux/golang:latest
---> ab0a8596031f
Step 3/17 : ARG CLEAR_LINUX_VERSION=
---> Using cache
---> 7fb7360b2dd8
Step 4/17 : ARG TAGS_KERNELDRV=
---> Using cache
---> 743beab3f19e
Step 5/17 : RUN swupd update --no-boot-update ${CLEAR_LINUX_VERSION}
---> Using cache
---> ea3ca093bb39
Step 6/17 : RUN mkdir /install_root && swupd os-install ${CLEAR_LINUX_VERSION} --path /install_root --statedir /swupd-state --bundles=os-core$(test -z "${TAGS_KERNELDRV}" || echo ",libstdcpp") --no-boot-update && rm -rf /install_root/var/lib/swupd/*
---> Using cache
---> 9cbd239f31a4
Step 7/17 : ARG QAT_DRIVER_RELEASE="qat1.7.l.4.6.0-00025"
---> Using cache
---> 89a2cd242c09
Step 8/17 : RUN test -z "${TAGS_KERNELDRV}" || ( swupd bundle-add wget c-basic && mkdir -p /usr/src/qat && cd /usr/src/qat && wget https://01.org/sites/default/files/downloads/${QAT_DRIVER_RELEASE}.tar.gz && tar xf *.tar.gz && cd /usr/src/qat/quickassist/utilities/adf_ctl && make KERNEL_SOURCE_DIR=/usr/src/qat/quickassist/qat && install -D adf_ctl /install_root/usr/local/bin/adf_ctl )
---> Using cache
---> 69a6d29c2059
Step 9/17 : ARG DIR=/intel-device-plugins-for-kubernetes
---> Using cache
---> 351bf399705e
Step 10/17 : WORKDIR $DIR
---> Using cache
---> 76a71b00d168
Step 11/17 : COPY . .
---> Using cache
---> d2461c5f9106
Step 12/17 : RUN cd cmd/qat_plugin; echo "build tags: ${TAGS_KERNELDRV}"; go install -tags "${TAGS_KERNELDRV}"
---> Running in 202ab9e47742
build tags:
go: <email address hidden>: Get "https://proxy.golang.org/github.com/fsnotify/fsnotify/@v/v1.4.7.mod": x509: certificate signed by unknown authority
The command '/bin/sh -c cd cmd/qat_plugin; echo "build tags: ${TAGS_KERNELDRV}"; go install -tags "${TAGS_KERNELDRV}"' returned a non-zero code: 1
make: *** [intel-qat-plugin] Error 1
Failed to make intel-qat-plugin. Aborting...
Command (bash) failed, attempt 5 of 5.
Max command attempts reached. Aborting...
Failed to build intel-qat-plugin... Aborting

Test Activity
-------------
Build

Workaround
----------
None

Davlet Panech (dpanech) wrote :

There seems to be yet another bug in clearlinux/golang (https://github.com/clearlinux/distribution/issues/2349). Not sure whether it is related to the SSL certificate problem.

Davlet Panech (dpanech) wrote :

Looks like the error goes away when using the clearlinux/golang:1.15.10 Docker image rather than :latest.
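Since the Dockerfile declares ARG CLEAR_LINUX_BASE=clearlinux/golang:latest before its first FROM (step 1/17 in the log), the base image can in principle be overridden at build time instead of edited. A minimal sketch under that assumption, using Docker's standard --build-arg option; the function name and its arguments are hypothetical, and this is not the fix that was merged:

```shell
#!/bin/sh
# Sketch: build an image with the Clear Linux Go base pinned to 1.15.10.
# Assumes the Dockerfile declares "ARG CLEAR_LINUX_BASE" before its first FROM,
# as in the intel-qat-plugin build log. Function name is hypothetical.
set -eu

GOLANG_BASE="clearlinux/golang:1.15.10"   # last tag reported to build cleanly

build_with_pinned_base() {
    dockerfile="$1"   # path to the image's Dockerfile
    context="$2"      # build context directory
    image="$3"        # tag for the resulting image
    # --build-arg overrides the default value of an ARG declared before FROM.
    docker build --build-arg "CLEAR_LINUX_BASE=${GOLANG_BASE}" \
        -f "$dockerfile" -t "$image" "$context"
}
```

For example: build_with_pinned_base path/to/intel-qat-plugin.Dockerfile . local/intel-qat-plugin:test (the path and tag are placeholders).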

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.6.0 stx.build
Ghada Khalil (gkhalil) wrote :

screening: stx.6.0 / medium priority - the failing images do not need to be rebuilt, as there have been no code changes to them since the end of 2019, and the older image tags are used successfully. Therefore, this will not hold up stx.5.0, but it should be fixed in the active branch for the next release in case these images get code changes and need to be rebuilt.

Ghada Khalil (gkhalil)
summary: - intel-fpga: docker images build errors
+ intel-fpga/intel-gpu/intel-qat: docker images build errors
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/789849

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/789849
Committed: https://opendev.org/starlingx/integ/commit/b1ac60470315153dc9bc03f7f0bb1bfb221f6c5d
Submitter: "Zuul (22348)"
Branch: master

commit b1ac60470315153dc9bc03f7f0bb1bfb221f6c5d
Author: Davlet Panech <email address hidden>
Date: Wed May 5 10:42:56 2021 -0400

    Pin clearlinux/golang to v1.15.10 in Dockerfiles

    Upstream Dockerfiles use clearlinux/golang:latest as the base, which is
    broken as of now. Solution: change it to last known working tag before
    building.

    Closes-Bug: 1927153
    Signed-off-by: Davlet Panech <email address hidden>
    Change-Id: Ic13973c0518eeab74ec86884036d08c2b8a4961f
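The commit above rewrites the base image tag in the upstream Dockerfiles before building. The merged change is not reproduced here; the following is a hypothetical shell sketch of the same idea, assuming GNU sed (for in-place editing with sed -i) and that the tag appears literally as clearlinux/golang:latest:

```shell
#!/bin/sh
# Sketch only: pin clearlinux/golang:latest to a fixed tag in a Dockerfile.
# The function name and sed expression are illustrative, not the StarlingX code.
set -eu

pin_golang_base() {
    dockerfile="$1"
    tag="${2:-1.15.10}"   # last known working tag per this bug report
    # Only "clearlinux/golang:latest" is rewritten; other lines are untouched.
    sed -i "s#clearlinux/golang:latest#clearlinux/golang:${tag}#g" "$dockerfile"
}
```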

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Davlet Panech (dpanech)
Davlet Panech (dpanech) wrote :

Ghada: should we cherry pick this to 5.0?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (r/stx.5.0)

Fix proposed to branch: r/stx.5.0
Review: https://review.opendev.org/c/starlingx/integ/+/790074

Ghada Khalil (gkhalil) wrote :

Davlet, yes, please go ahead. Although this does not strictly impact the stx.5.0 release, it would be good to have clean Docker image builds from the release branch.

tags: added: stx.5.0 stx.cherrypickneeded
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (r/stx.5.0)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/790074
Committed: https://opendev.org/starlingx/integ/commit/daf542a9ad22bc65baef1315461782a83f22ea7c
Submitter: "Zuul (22348)"
Branch: r/stx.5.0

commit daf542a9ad22bc65baef1315461782a83f22ea7c
Author: Davlet Panech <email address hidden>
Date: Wed May 5 10:42:56 2021 -0400

    Pin clearlinux/golang to v1.15.10 in Dockerfiles

    Upstream Dockerfiles use clearlinux/golang:latest as the base, which is
    broken as of now. Solution: change it to last known working tag before
    building.

    Closes-Bug: 1927153
    Signed-off-by: Davlet Panech <email address hidden>
    Change-Id: Ic13973c0518eeab74ec86884036d08c2b8a4961f
    (cherry picked from commit b1ac60470315153dc9bc03f7f0bb1bfb221f6c5d)

Bill Zvonar (billzvonar)
tags: added: in-r-stx50
removed: stx.cherrypickneeded
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/integ/+/793754

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/793754
Committed: https://opendev.org/starlingx/integ/commit/a13966754d4e19423874ca31bf1533f057380c52
Submitter: "Zuul (22348)"
Branch: f/centos8

commit b310077093fd567944c6a46b7d0adcabe1f2b4b9
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 18:19:54 2021 +0300

    Fix resize of filesystems in puppet logical_volume

    After system reinstalls there is stale data on the disk
    and puppet fails when resizing, reporting some wrong filesystem
    types. In our case docker-lv was reported as drbd when
    it should have been xfs.

    This problem was solved in some cases e.g:
    when doing a live fs resize we wipe the last 10MB
    at the end of partition:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L146

    Our issue happened here:
    https://opendev.org/starlingx/stx-puppet/src/branch/master/puppet-manifests/src/modules/platform/manifests/filesystem.pp#L65
    Resize can happen at unlock when a bigger size is detected for the
    filesystem and the 'logical_volume' will resize it.
    To fix this we have to wipe the last 10MB of the partition after the
    'lvextend' cmd in the 'logical_volume' module.

    Tested the following scenarios:

    B&R on SX with default sizes of filesystems and cgts-vg.

    B&R on SX with docker-lv of size 50G, backup-lv also 50G and
    cgts-vg with additional physical volumes:

    - name: cgts-vg
      physicalVolumes:
      - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 50
        type: partition
      - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
        size: 30
        type: partition
      - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
        type: disk

    B&R on DX system with backup of size 70G and cgts-vg
    with additional physical volumes:

    physicalVolumes:
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
      size: 50
      type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-1.0
      size: 30
      type: partition
    - path: /dev/disk/by-path/pci-0000:00:0d.0-ata-3.0
      type: disk

    Closes-Bug: 1926591
    Change-Id: I55ae6954d24ba32e40c2e5e276ec17015d9bba44
    Signed-off-by: Mihnea Saracin <email address hidden>
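The fix described above zeroes the tail of the volume after lvextend so that stale data (in the reported case, docker-lv detected as drbd instead of xfs) is not picked up. The actual change is in the puppet logical_volume module and is not shown; below is a hypothetical shell sketch of the wipe step alone, assuming GNU coreutils (stat -c, dd status=none), blockdev from util-linux, and that "10MB" means 10 MiB. It is destructive, so only point it at the volume that was just extended:

```shell
#!/bin/sh
# Sketch: zero the last 10 MiB of a block device or image file.
# Function name is hypothetical; the real fix lives in puppet logical_volume.
set -eu

wipe_tail_10mib() {
    target="$1"
    # Size in bytes: blockdev for block devices, stat for regular files.
    if [ -b "$target" ]; then
        size=$(blockdev --getsize64 "$target")
    else
        size=$(stat -c %s "$target")
    fi
    sectors=$((size / 512))
    wipe=20480   # 10 MiB expressed in 512-byte sectors
    if [ "$sectors" -lt "$wipe" ]; then wipe=$sectors; fi
    # conv=notrunc keeps regular files at their original size.
    dd if=/dev/zero of="$target" bs=512 seek=$((sectors - wipe)) \
        count="$wipe" conv=notrunc status=none
}
```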

commit 3225570530458956fd642fa06b83360a7e4e2e61
Author: Mihnea Saracin <email address hidden>
Date: Thu May 20 14:33:58 2021 +0300

    Execute once the ceph services script on AIO

    The MTC client manages ceph services via ceph.sh which
    is installed on all node types in
    /etc/service.d/{controller,worker,storage}/ceph.sh

    Since the AIO controllers have both controller and worker
    personalities, the MTC client will execute the ceph script
    twice (/etc/service.d/worker/ceph.sh,
    /etc/service.d/controller/ceph.sh).
    This behavior will generate some issues.

    We fix this by exiting the ceph script if it is the one from
    /etc/services.d/worker on AIO systems.

    Closes-Bug: 1928934
    Change-Id: I3e4dc313cc3764f870b8f6c640a60338...
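The commit above makes the worker copy of ceph.sh exit early on AIO controllers, so the script runs only once. How the real script detects an AIO node is not visible in the truncated commit message; in this hypothetical sketch the caller passes the script's own path and the node personalities as a comma-separated string (an assumption), and the path match accepts both the service.d and services.d spellings used above:

```shell
#!/bin/sh
# Sketch of the early-exit guard described in the commit message.
# Assumption: personalities arrive as a comma-separated string,
# e.g. "controller,worker" on an AIO controller.
set -eu

should_skip_ceph_script() {
    script_path="$1"
    personalities="$2"
    # An AIO controller carries both the controller and worker personalities.
    case ",$personalities," in
        *,controller,*) has_controller=1 ;;
        *) has_controller=0 ;;
    esac
    case ",$personalities," in
        *,worker,*) has_worker=1 ;;
        *) has_worker=0 ;;
    esac
    # Only the copy installed under the worker service directory bails out.
    case "$script_path" in
        */worker/ceph.sh) is_worker_copy=1 ;;
        *) is_worker_copy=0 ;;
    esac
    [ "$has_controller" -eq 1 ] && [ "$has_worker" -eq 1 ] && [ "$is_worker_copy" -eq 1 ]
}
```

Inside the script this would be used as: if should_skip_ceph_script "$0" "$personalities"; then exit 0; fi, where personalities is a hypothetical variable.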

tags: added: in-f-centos8