Some stx-built docker images are not properly tagged

Bug #1854869 reported by Ghada Khalil
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Don Penney

Bug Description

Brief Description
-----------------
Some stx-built docker images are not properly tagged in docker hub, which can result in an incompatible image being pulled.

This was reported on the stx-discuss mailing list for the k8s-plugins-sriov-network-device image, but other images should be investigated for the same issue. Any image that the ansible bootstrap references with a "latest" tag instead of a specific versioned tag can be affected.

-----------
From: Chiing-Ting Huang [mailto:<email address hidden>]
Sent: Monday, November 04, 2019 10:17 PM
To: <email address hidden>
Subject: [Starlingx-discuss] sriov-network-device container issue

Dear StarlingX Team

I have installed StarlingX R2.0 in September, and kube-sriov-device-plugin can get VF at that time.

I think the team updated starlingx/k8s-plugins-sriov-network-device docker image in October, so I cannot get VF now.

So I roll back to starlingx/k8s-plugins-sriov-network-device:rc-2.0-centos-stable-20190826T233000Z.0
-----------

Severity
--------
Major - Released stx loads can load an incompatible docker image

Steps to Reproduce
------------------
Install StarlingX 2.0 and configure the k8s sriov plugin

Expected Behavior
------------------
Expect the compatible image to be pulled by the load

Actual Behavior
----------------
In the case of stx.2.0, the wrong image was pulled and had to be rolled back manually.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Seen in stx.2.0, but expected to happen in stx.3.0 and in master.

Last Pass
---------
Not applicable

Timestamp/Logs
--------------
N/A - issue is understood

Test Activity
-------------
Other - reported on the mailing list

Ghada Khalil (gkhalil) wrote :

Adding response from Don Penney:
-------------------------------------
From: Penney, Don [mailto:<email address hidden>]
Sent: Wednesday, November 13, 2019 1:13 PM
To: Chiing-Ting Huang; <email address hidden>
Subject: Re: [Starlingx-discuss] sriov-network-device container issue

There are two references in r/stx.2.0 that configure the image tag to be used for the k8s-plugins-sriov-network-device image:
https://opendev.org/starlingx/ansible-playbooks/src/branch/r/stx.2.0/playbookconfig/src/playbooks/bootstrap/roles/bringup-essential-services/templates/sriov-plugin.yaml.j2#L43
https://opendev.org/starlingx/config/src/branch/r/stx.2.0/puppet-manifests/src/modules/platform/templates/sriovdp-daemonset.yaml.erb#L43

It may only be the ansible-playbooks reference that has any effect, as the puppet-manifests reference may be stale code.

The k8s-cni-sriov and k8s-plugins-sriov-network-device images are built as part of the regular build, and tagged with a versioned tag and a latest tag, as with the other stx images. However, these are loaded as part of the built-in platform-integ-app and therefore use the static image tag that’s in the repo.

What this means is that every load references the master-centos-stable-latest tag when you initially bring up the system, which resolves to the latest weekly build at that time. This may not always work if there is something incompatible in the new image. This is presumably what has happened here with 2.0.

My suggestion would be to treat these images, built from upstream sources, specially and manually tag them with versions as needed, updating the static references to those tags.

In r/stx.2.0, we could update these static tags to the appropriate tag for that build - presumably the one referenced below by Ting.

Going forward, we can manually tag the image in docker hub with whatever format we choose, such as 1.0.0, or maybe whatever version corresponds to the upstream repo at that time (e.g. v3.0.0). Once we’ve manually tagged the image, we push an update into the repo to change the static reference to the new tag.

So for example:

k8s-cni-sriov: Currently building from https://github.com/intel/sriov-cni/commit/365c8f8cc1204df84f3e976ea30f113e733ca665, which looks like it’s tagged as v2.2. We could retag the current build as starlingx/k8s-cni-sriov:v2.2, then update ansible-playbooks with the new tag.

k8s-plugins-sriov-network-device: Building from: https://github.com/intel/sriov-network-device-plugin/commit/000db15405f3ce3b7c2f9feb180c3051aa3f7aea, tagged as v3.1. We could retag the current build as starlingx/k8s-plugins-sriov-network-device:v3.1, then update ansible-playbooks with the new tag.

Whenever there’s new content in these images we want to pick up, we retag the new build, and push updates to the static references.

-------------------------------------
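
The manual retagging suggested in the email above could be sketched roughly as follows. This is only an illustration: the helper name `retag_commands` is hypothetical, the source build tag in the usage example is borrowed from later in this report, and the function only builds the `docker pull` / `docker tag` / `docker push` command lines rather than running them.

```python
from typing import List


def retag_commands(image: str, src_tag: str, new_tag: str) -> List[List[str]]:
    """Build the docker CLI calls that publish new_tag as an alias of src_tag.

    Hypothetical helper for illustration; it does not execute anything.
    """
    # A fixed tag is the whole point, so refuse another moving "latest" tag.
    if new_tag.endswith("latest"):
        raise ValueError("new tag should be a fixed version, not a moving tag")
    src = f"{image}:{src_tag}"
    dst = f"{image}:{new_tag}"
    return [
        ["docker", "pull", src],      # fetch the build to be retagged
        ["docker", "tag", src, dst],  # add the fixed tag locally
        ["docker", "push", dst],      # publish the fixed tag
    ]


if __name__ == "__main__":
    # Example values only; the source build tag is an assumption.
    for cmd in retag_commands(
        "starlingx/k8s-cni-sriov",
        "master-centos-stable-20191203T153530Z.0",
        "v2.2",
    ):
        print(" ".join(cmd))
```

Pushing requires publishing rights on the docker hub repository, so in practice someone with that access would run the resulting commands.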

description: updated
tags: added: stx.config stx.containers
Ghada Khalil (gkhalil) wrote :

Problematic Image tags:
ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/bringup-essential-services/vars/main.yaml:
sriov_cni_img: docker.io/starlingx/k8s-cni-sriov:master-centos-stable-latest
sriov_network_device_img: docker.io/starlingx/k8s-plugins-sriov-network-device:master-centos-stable-latest

Not sure if these images in stx.3.0 pose the same issue:
ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/plugins/templates/intel-fpga-plugin.yaml.j2:
image: "{{ docker_registry.url }}/starlingx/intel-fpga-plugin:master-distroless-stable-latest"

ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/plugins/templates/intel-gpu-plugin.yaml.j2:
image: "{{ docker_registry.url }}/starlingx/intel-gpu-plugin:master-distroless-stable-latest"

ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/plugins/templates/intel-qat-plugin.yaml.j2:
image: "{{ docker_registry.url }}/starlingx/intel-qat-plugin:master-distroless-stable-latest"
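
A search for references like the ones listed above could be sketched as below. This is a minimal illustration, not a tool from the StarlingX repos: the regular expression and the name `find_floating_refs` are assumptions, and the pattern simply flags any `repository:tag` reference whose tag ends in `latest`, allowing a Jinja expression such as `{{ docker_registry.url }}` in the repository part.

```python
import re
from typing import Iterable, List, Tuple

# "<repository>:<tag>" where the tag ends in "latest". The repository may
# start with a Jinja expression, so "{{ ... }}" is accepted as one unit.
FLOATING_REF = re.compile(r"((?:\{\{[^}]*\}\}|[\w./-])+):([\w.-]*latest)\b")


def find_floating_refs(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, repository, tag) for every floating reference."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for match in FLOATING_REF.finditer(line):
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = [
        "sriov_cni_img: docker.io/starlingx/k8s-cni-sriov:master-centos-stable-latest",
        'image: "{{ docker_registry.url }}/starlingx/intel-gpu-plugin:master-distroless-stable-latest"',
        "pinned_img: docker.io/starlingx/k8s-cni-sriov:stx.4.0-v2.2",
    ]
    for lineno, repo, tag in find_floating_refs(sample):
        print(f"line {lineno}: {repo} uses floating tag {tag}")
```

The third sample line uses a fixed tag and is therefore not reported.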

Ghada Khalil (gkhalil) wrote :

Don Penney confirmed that the plugin images should also be updated to use a specific tag instead of -latest.

Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0; this will likely need to be addressed in an upcoming maintenance (mtce) release.
Note: This is also an issue with stx.2.0. It is TBD whether we pursue a fix for that release as well.

tags: added: stx.3.0
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Don Penney (dpenney)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/707462

Changed in starlingx:
status: Triaged → In Progress
Don Penney (dpenney) wrote :

This bug exposed an issue with some of our stx-built images. Certain images are referenced statically, in platform-installed charts or files, by the master-centos-stable-latest tag, which is updated every time CENGN builds images. The managed applications, by contrast, are able to use the specific image versions from build information.

To address this, I’d like to propose a tag management system to complement the image build setup.

Workflow:

We decide we want to use new functionality introduced in a recent build of image Y:
• Create a Launchpad to update the image being used.
• Determine an appropriate version tag for the image. For example, if the main project of the image is based on an upstream repo, check the version associated with the SHA used to build the image. If the upstream commit is tagged as v2.2, the image tag could be stx.4.0-v2.2. Include this info in the Launchpad.
• Update the tag management config file, with Partial-Bug: XXXXXX referencing the Launchpad.
  o Include a reference to the source build tag from loadbuild, using the versioned tag, such as master-centos-stable-20191203T153530Z.0
  o Include a reference to the upstream commit, if appropriate
  o Include the new requested tag
• Once the tag management config file update is reviewed and merged, the tag management utility can be run, possibly as part of the CENGN loadbuild, or perhaps as a separate CENGN job that can be triggered by the merge.
• Once the new tag has been pushed to docker hub, a follow-up commit can be posted to update the chart or other reference to the new image tag, with Closes-Bug in the commit message.

The only charts, manifests, or other files that should still reference the master-centos-stable-latest tag are inputs to the application build, which replaces those tags with the specific versioned tags from the loadbuild. Such files are therefore outside the scope of this procedure.

This ensures a load is locked down to a specific version of such images, rather than floating to use latest, avoiding compatibility issues that may arise (as in the case of LP 1854869).

Example tag management yaml file for stx-4.0:

images:
  - name: docker.io/starlingx/k8s-cni-sriov
    src_build_tag: master-centos-stable-20191203T153530Z.0
    src_ref: https://opendev.org/starlingx/integ/commit/dac417bd31ed36d455e94db4aabe5916367654d4
    # Tag determined based on release tag associated with upstream commit
    tag: stx.4.0-v2.2
  - name: docker.io/starlingx/k8s-plugins-sriov-network-device
    src_build_tag: master-centos-stable-20191203T153530Z.0
    src_ref: https://opendev.org/starlingx/integ/commit/dac417bd31ed36d455e94db4aabe5916367654d4
    # Tag determined based on release tag associated with upstream commit
    tag: stx.4.0-v3.1
  - name: docker.io/starlingx/intel-fpga-admissionwebhook
    src_build_tag: master-distroless-stable-20191203T153530Z.0
    src_ref: https://opendev.org/starlingx/integ/commit/5f72ddb26a38d41fef919060585daaafae677433
    # Version determined by running 'git describe --tags' in clone of upstream repo
    tag: stx.4.0-v0.11.0-103-g4f28657
  - name: docker.io/starlingx/intel-fpga-initcontainer
    src_build_tag: master-dist...
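
Entries shaped like the ones in the example file could be sanity-checked before the tag management utility runs. The sketch below is illustrative only: the two regular expressions are assumptions inferred from the example values in this report (a timestamped build tag and an `stx.<major>.<minor>-<version>` managed tag), the name `check_entry` is hypothetical, and entries are passed as plain dicts rather than parsed from YAML to keep the example dependency-free.

```python
import re
from typing import Dict, List

# Assumed formats, inferred from the example values in this report.
VERSIONED_BUILD_TAG = re.compile(r"^[\w.-]+-\d{8}T\d{6}Z\.\d+$")
MANAGED_TAG = re.compile(r"^stx\.\d+\.\d+-\S+$")


def check_entry(entry: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one tag management entry."""
    problems = []
    for key in ("name", "src_build_tag", "tag"):
        if not entry.get(key):
            problems.append(f"missing {key}")
    src = entry.get("src_build_tag", "")
    # The source must be a specific build, never the moving latest tag.
    if src and not VERSIONED_BUILD_TAG.match(src):
        problems.append(f"src_build_tag is not a versioned build tag: {src}")
    tag = entry.get("tag", "")
    if tag and (tag.endswith("latest") or not MANAGED_TAG.match(tag)):
        problems.append(f"tag is not a fixed release tag: {tag}")
    return problems


if __name__ == "__main__":
    entry = {
        "name": "docker.io/starlingx/k8s-cni-sriov",
        "src_build_tag": "master-centos-stable-20191203T153530Z.0",
        "tag": "stx.4.0-v2.2",
    }
    print(check_entry(entry) or "entry looks consistent")
```

A check of this kind would reject an entry whose source is itself a moving tag, which is the situation this bug is about.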


OpenStack Infra (hudson-openstack) wrote : Fix proposed to root (master)

Fix proposed to branch: master
Review: https://review.opendev.org/707465

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/707462
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=f4baa947f7643d0d437e63c83725c4c9c24f27c8
Submitter: Zuul
Branch: master

commit f4baa947f7643d0d437e63c83725c4c9c24f27c8
Author: Don Penney <email address hidden>
Date: Wed Feb 12 14:48:15 2020 -0500

    Replace image stable-latest tags with static versions

    In order to avoid compatibility issues from using a moving
    stable-latest tag on referenced images, the playbooks are updated to
    use static version tags for those images built by CENGN.

    Change-Id: I336ec16ec6c8d7581f547efa77bd75f2c1b4726b
    Partial-Bug: 1854869
    Signed-off-by: Don Penney <email address hidden>
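
The kind of substitution this commit describes, replacing a moving stable-latest tag with a static version, could be sketched as follows. This is not the actual change: the function name `pin_references` is hypothetical and the pinned tag in the usage example is taken from the example tag management file above purely for illustration.

```python
import re
from typing import Dict


def pin_references(text: str, pins: Dict[str, str]) -> str:
    """Rewrite '<repo>:<...latest>' to '<repo>:<pinned tag>' for each repo in pins."""
    for repo, tag in pins.items():
        # Only a tag ending in "latest" directly after this exact repo name
        # is replaced; already pinned references are left untouched.
        pattern = re.compile(re.escape(repo) + r":[\w.-]*latest\b")
        text = pattern.sub(lambda m, r=repo, t=tag: f"{r}:{t}", text)
    return text


if __name__ == "__main__":
    before = (
        "sriov_cni_img: docker.io/starlingx/k8s-cni-sriov:"
        "master-centos-stable-latest\n"
    )
    # Illustrative pin only; the real tags are whatever the commit chose.
    print(pin_references(before, {"starlingx/k8s-cni-sriov": "stx.4.0-v2.2"}), end="")
```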

OpenStack Infra (hudson-openstack) wrote : Fix merged to root (master)

Reviewed: https://review.opendev.org/707465
Committed: https://git.openstack.org/cgit/starlingx/root/commit/?id=921cb526ffd9fe9d6dddcf7253bf0812d7299350
Submitter: Zuul
Branch: master

commit 921cb526ffd9fe9d6dddcf7253bf0812d7299350
Author: Don Penney <email address hidden>
Date: Wed Feb 12 14:51:52 2020 -0500

    Add tag-management utility

    This commit introduces a tag management utility for images to allow
    retagging of specific images. This allows the StarlingX ansible
    playbooks and packaged charts to use a static image tag rather than the
    moving stable-latest tag.

    This utility can only be run by a StarlingX build team member with
    publishing capability on the docker hub, to allow for updated tags to
    be available publicly, or as part of a CENGN build job.

    Change-Id: I7659695c701ccf3bfe8255b24ad0b1896b59cc23
    Partial-Bug: 1854869
    Signed-off-by: Don Penney <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix proposed to root (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716131

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716133

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/716133
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=ddcb11f4b773f4b3190663defe3ba0f3ec4201c8
Submitter: Zuul
Branch: f/centos8

commit bf103f3c54eb45c26d52a43c35339d1d863a42de
Author: Mihnea Saracin <email address hidden>
Date: Fri Mar 27 18:19:02 2020 +0200

    Fix B&R when the controller needs to be unlocked

    After running the restore playbook, all the applications
    should be in an uploaded state. But they are in an
    applied state instead, making the controller-0
    unable to unlock.

    Closes-Bug: 1869403
    Change-Id: I8bd9c51e250969cc334d52b78c616f9ad082afd8
    Signed-off-by: Mihnea Saracin <email address hidden>

commit 6e875971afeaf1378c2c8aeb845359459838ce30
Author: Stefan Dinescu <email address hidden>
Date: Sat Mar 21 16:57:57 2020 +0200

    Fix Netapp port conflict

    By default, the Trident Netapp service opens port 8443 for
    HTTPS REST api usage. This conflicts with the port the
    Horizon dashboard uses on an HTTPS enabled setup (the port
    is also 8443).

    In order to fix this, we change the default port from 8443
    to 8678, but also make it configurable through ansible
    overrides.

    The Trident service also opens port 8001 for metrics usage.
    While that doesn't currently conflict with any other service
    on the system, I also made that configurable through
    ansible overrides, in case such a conflict appears in the
    future.

    Change-Id: I08db939acac6082f82b9e12e932d8289c7cecdeb
    Closes-bug: 1868382
    Signed-off-by: Stefan Dinescu <email address hidden>

commit 5a9ba6786e393f2cd93bfae8c3a8f09f0cf9eb26
Author: Robert Church <email address hidden>
Date: Thu Mar 19 19:08:17 2020 -0400

    Upversion Multus to 3.4

    Updates the Multus configuration to align with version 3.4

    Change-Id: Ifc236ccbbe4e559987d7ef522902f638062348ca
    Depends-On: https://review.opendev.org/#/c/714024/
    Story: 2006999
    Task: 39110
    Signed-off-by: Robert Church <email address hidden>

commit 6a261463f9ac0f81d9c7f054dd3cb10a51934d4a
Author: Robert Church <email address hidden>
Date: Wed Mar 18 22:01:03 2020 -0400

    Upversion Calico from 3.6 to 3.12

    Updates the Calico configuration to align with version 3.12. This
    introduces support for a Flex Volume Driver which requires enabling the
    --volume-plugin-dir option for kubelet, the --flex-volume-plugin-dir
    option for kube-controller-manager, and pulling the pod2daemon-flexvol
    image used by calico-node pods.

    Change-Id: I74bc5c53ffcb16c8e3c06cebf20eac296b9ccc65
    Story: 2006999
    Task: 39109
    Depends-On: https://review.opendev.org/#/c/714023
    Signed-off-by: Robert Church <email address hidden>

commit b35387f8bc40714e9633e6191267284b8af8ccee
Author: Stefan Dinescu <email address hidden>
Date: Thu Mar 19 18:13:26 2020 +0200

    Netapp: Fix handling of IPv6 addresses

    Using bash process substitution to pass the file parameter
    to the "create backend" command doesn't work as the bash
    variable expansion...

tags: added: in-f-centos8
OpenStack Infra (hudson-openstack) wrote : Fix merged to root (f/centos8)

Reviewed: https://review.opendev.org/716131
Committed: https://git.openstack.org/cgit/starlingx/root/commit/?id=e5faf8bd5cebfa275823de8cc10be1abbdc2f8f8
Submitter: Zuul
Branch: f/centos8

commit b31cd4b01ee0e6c02cc35094b17aef0edd4f6071
Author: Don Penney <email address hidden>
Date: Wed Mar 25 15:19:07 2020 -0400

    Update tag for stx-oidc-client to stx.4.0-v1.0.2

    Change-Id: I6deab3e81b2210968c93f82d667003e5805a1136
    Partial-Bug: 1868895
    Signed-off-by: Don Penney <email address hidden>

commit bd0d9015fa095948443e29f40a701bc3f81c328f
Author: Don Penney <email address hidden>
Date: Wed Mar 18 10:56:08 2020 -0400

    Add managed tag for stx-platformclients image

    Change-Id: Id37f2eaf5bacaad941a7abaf835d3c7204d7ca97
    Story: 2006711
    Task: 39096
    Signed-off-by: Don Penney <email address hidden>

commit f5985aa83ad6c5275f97db53db9a095a1f4a1bb1
Author: Davlet Panech <email address hidden>
Date: Wed Mar 18 16:29:40 2020 -0400

    Use private yum cache dir in build scripts

    YUM & friends use a cache directory named /tmp/yum-$USER-xxx or
    similar, even when used concurrently e.g. when building unrelated
    source trees. This causes interference between CI pipelines.

    This patch sets TMPDIR=$MY_WORKSPACE/tmp when calling repoquery,
    etc.

    Change-Id: Ieeb4d6fd9447f1c2988380c3427975893be365a5
    Closes-Bug: 1867817
    Signed-off-by: Davlet Panech <email address hidden>

commit 7dc6b23b517d4d14cad38f29d47af64e66e105b9
Author: Stefan Dinescu <email address hidden>
Date: Wed Mar 18 16:23:59 2020 +0200

    Support custom docker registries for remote cli

    The build script for remote cli was receiving as parameters
    only the desired tag for the clients docker images; the docker
    images were always pulled from docker.io/starlingx.

    Now the build script takes the full link of the docker image
    as parameter, so we can use any docker registry as source
    for the client images.

    Change-Id: Ibc4584a9401c1ff30f8e657ca27d31b35c4e94d5
    Depends-On: https://review.opendev.org/#/c/713665/
    Story: 2006711
    Task: 39095
    Signed-off-by: Stefan Dinescu <email address hidden>

commit 3b0e2dd4ea92b4fe5b53b1fd4b38a5e673e9c92c
Author: Davlet Panech <email address hidden>
Date: Mon Mar 9 14:54:10 2020 -0400

    Fixed broken setuptools import

    pyScss tries to import a module called "Feature" from setuptools,
    which was removed from setuptools 46.0.0 (mar 8 2020). This causes
    compilation errors. The fix updates "pip install" in Dockerfile
    to install setuptools < 46.0.0

    Change-Id: Ib5c00aafb934efaf1413e72ede30638f2dc35230
    Signed-off-by: Davlet Panech <email address hidden>
    Closes-Bug: 1866699

commit 3a95ab163f4cb4371fa4c6a94f5f64e27b1980f2
Author: Don Penney <email address hidden>
Date: Wed Mar 4 10:59:37 2020 -0500

    Add tag for rvmc image

    Update tag management file to publish a static tag for the rvmc image.

    Change-Id: I459e4f676ffa0a28c2c1fca5d61df16f499ded93
    Story: 2006980
    Task: 38935
    Signed-off-by: Don Penney <don.penney...

Ghada Khalil (gkhalil) wrote :

This has been addressed in stx master through the image tagging mechanism introduced by Don Penney. Therefore, this LP should be marked as Fix Released.

The only follow-up is to check whether the required changes have been ported to the r/stx.3.0 branch.

Changed in starlingx:
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (r/stx.3.0)

Fix proposed to branch: r/stx.3.0
Review: https://review.opendev.org/746616

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (r/stx.3.0)

Reviewed: https://review.opendev.org/746616
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=99bee502b39311ca0f036b834d8a7accd7086261
Submitter: Zuul
Branch: r/stx.3.0

commit 99bee502b39311ca0f036b834d8a7accd7086261
Author: Don Penney <email address hidden>
Date: Wed Feb 12 14:48:15 2020 -0500

    Replace image stable-latest tags with static versions

    In order to avoid compatibility issues from using a moving
    stable-latest tag on referenced images, the playbooks are updated to
    use static version tags for those images built by CENGN.

    Conflicts:
            playbookconfig/src/playbooks/roles/common/push-docker-images/vars/k8s-v1.16.2/system-images.yml

    Change-Id: I4add70fbc5b621e84d06044ef30f0a0bff6ca3f8
    Partial-Bug: 1854869
    Signed-off-by: Don Penney <email address hidden>
    (cherry picked from commit f4baa947f7643d0d437e63c83725c4c9c24f27c8)

Don Penney (dpenney)
Changed in starlingx:
status: Fix Committed → Fix Released
Ghada Khalil (gkhalil)
tags: added: in-r-stx30