Some stx-built docker images are not properly tagged

Bug #1854869 reported by Ghada Khalil on 2019-12-02
This bug affects 1 person
Affects: StarlingX
Importance: High
Assigned to: Don Penney

Bug Description

Brief Description
-----------------
Some stx-built docker images are not properly tagged in Docker Hub, which can result in an incompatible image being pulled.

This was reported on the stx-discuss mailing list for the k8s-plugins-sriov-network-device image, but other images should be investigated for the same issue. Any image that the ansible bootstrap references through a "latest" tag, instead of a specific versioned tag, can be affected.
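To make the failure mode concrete, the toy model below treats an image repository as a map from tag to image digest. This is a simplified illustration, not how Docker Hub is implemented. Each build adds its own versioned tag and moves the floating master-centos-stable-latest tag, while a manually added versioned tag such as v3.1 keeps pointing at the digest it was created for. The build tags ("weekly-build-1", "weekly-build-2") and digests are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict

FLOATING_TAG = "master-centos-stable-latest"


@dataclass
class Repository:
    """Toy model of one image repository: tag name -> image digest."""
    tags: Dict[str, str] = field(default_factory=dict)

    def push_build(self, build_tag: str, digest: str) -> None:
        # Each build gets its own versioned tag, and the floating tag moves to it.
        self.tags[build_tag] = digest
        self.tags[FLOATING_TAG] = digest

    def retag(self, existing_tag: str, new_tag: str) -> None:
        # A manual tag points at whatever digest existing_tag resolves to right now.
        self.tags[new_tag] = self.tags[existing_tag]

    def resolve(self, tag: str) -> str:
        return self.tags[tag]


repo = Repository()
repo.push_build("weekly-build-1", "sha256:1111")  # hypothetical build and digest
repo.retag("weekly-build-1", "v3.1")
pulled_at_release = repo.resolve(FLOATING_TAG)    # what an install at release time gets

repo.push_build("weekly-build-2", "sha256:2222")  # a later, possibly incompatible build
pulled_later_floating = repo.resolve(FLOATING_TAG)
pulled_later_pinned = repo.resolve("v3.1")
```

An install that references the floating tag silently picks up weekly-build-2, while one that references v3.1 keeps getting the digest that was validated with the release.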

-----------
From: Chiing-Ting Huang [mailto:<email address hidden>]
Sent: Monday, November 04, 2019 10:17 PM
To: <email address hidden>
Subject: [Starlingx-discuss] sriov-network-device container issue

Dear StarlingX Team

I installed StarlingX R2.0 in September, and kube-sriov-device-plugin could get VFs at that time.

I think the team updated the starlingx/k8s-plugins-sriov-network-device docker image in October, and now I cannot get VFs.

So I rolled back to starlingx/k8s-plugins-sriov-network-device:rc-2.0-centos-stable-20190826T233000Z.0
-----------

Severity
--------
Major - Released stx loads can load an incompatible docker image

Steps to Reproduce
------------------
Install StarlingX 2.0 and configure the k8s sriov plugin

Expected Behavior
------------------
The load pulls an image that is compatible with it.

Actual Behavior
----------------
In the case of stx.2.0, the wrong image was pulled and had to be rolled back manually.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Seen in stx.2.0, but expected to happen in stx.3.0 and in master.

Last Pass
---------
Not applicable

Timestamp/Logs
--------------
N/A - issue is understood

Test Activity
-------------
Other - reported on the mailing list

Ghada Khalil (gkhalil) wrote :

Adding response from Don Penney:
-------------------------------------
From: Penney, Don [mailto:<email address hidden>]
Sent: Wednesday, November 13, 2019 1:13 PM
To: Chiing-Ting Huang; <email address hidden>
Subject: Re: [Starlingx-discuss] sriov-network-device container issue

There are two references in r/stx.2.0 that configure the image tag to be used for the k8s-plugins-sriov-network-device image:
https://opendev.org/starlingx/ansible-playbooks/src/branch/r/stx.2.0/playbookconfig/src/playbooks/bootstrap/roles/bringup-essential-services/templates/sriov-plugin.yaml.j2#L43
https://opendev.org/starlingx/config/src/branch/r/stx.2.0/puppet-manifests/src/modules/platform/templates/sriovdp-daemonset.yaml.erb#L43

It may only be the ansible-playbooks reference that has any effect, as the puppet-manifests reference may be stale code.

The k8s-cni-sriov and k8s-plugins-sriov-network-device images are built as part of the regular build and, as with the other stx images, tagged with both a versioned tag and a latest tag. However, these images are loaded as part of the built-in platform-integ-app and therefore use the static image tag that's in the repo.

This means that every load references the master-centos-stable-latest tag when the system is first brought up, which resolves to whatever the latest weekly build is at that time. If something in the new image is incompatible with the load, this breaks, which is presumably what happened here with 2.0.

My suggestion would be to treat these images, which are built from upstream sources, as a special case: manually tag them with versions as needed and update the static references to use those tags.

In r/stx.2.0, we could update these static tags to the appropriate tag for that build, presumably the rc-2.0 build tag Ting referenced (rc-2.0-centos-stable-20190826T233000Z.0).

Going forward, we can manually tag the image in Docker Hub with whatever format we choose, such as 1.0.0, or the version that corresponds to the upstream repo at that time (e.g. v3.0.0). Once we've manually tagged the image, we push an update to the repo so that the static reference uses the new tag.

So for example:

k8s-cni-sriov: Currently building from https://github.com/intel/sriov-cni/commit/365c8f8cc1204df84f3e976ea30f113e733ca665, which looks like it’s tagged as v2.2. We could retag the current build as starlingx/k8s-cni-sriov:v2.2, then update ansible-playbooks with the new tag.

k8s-plugins-sriov-network-device: Currently building from https://github.com/intel/sriov-network-device-plugin/commit/000db15405f3ce3b7c2f9feb180c3051aa3f7aea, which is tagged as v3.1. We could retag the current build as starlingx/k8s-plugins-sriov-network-device:v3.1, then update ansible-playbooks with the new tag.

Whenever there’s new content in these images we want to pick up, we retag the new build, and push updates to the static references.

-------------------------------------
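The proposal above has two steps: retag an existing build with a versioned tag in Docker Hub, then update the static reference in ansible-playbooks to that tag. The following is a minimal Python sketch of both steps. It assumes the standard docker CLI (`docker pull`, `docker tag`, `docker push`) is installed and logged in with push rights to the starlingx namespace. `BUILD_TAG`, the helper names, and the dry-run behaviour are illustrative assumptions, not existing StarlingX tooling.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple


def split_image_ref(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'registry[:port]/path/name[:tag]' into (name, tag)."""
    if "@" in ref:
        raise ValueError(f"digest references are not handled: {ref}")
    # A tag exists only if the last ':' comes after the last '/';
    # this keeps a registry port ('host:9001/...') out of the tag.
    colon = ref.rfind(":")
    if colon > ref.rfind("/"):
        return ref[:colon], ref[colon + 1:]
    return ref, None


def pin_image_tag(ref: str, new_tag: str) -> str:
    """Return the same image reference with its tag replaced by new_tag."""
    name, _old_tag = split_image_ref(ref)
    return f"{name}:{new_tag}"


def retag_commands(repo: str, build_tag: str, release_tag: str) -> List[List[str]]:
    """Docker CLI steps that give an existing build a stable, versioned tag."""
    source = f"{repo}:{build_tag}"
    target = f"{repo}:{release_tag}"
    return [
        ["docker", "pull", source],         # fetch the exact build being released
        ["docker", "tag", source, target],  # add the versioned tag locally
        ["docker", "push", target],         # publish the new tag (needs push rights)
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them in order and stop on the first failure."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    repo = "docker.io/starlingx/k8s-plugins-sriov-network-device"
    # BUILD_TAG is a hypothetical placeholder for the versioned tag of the build being released.
    run(retag_commands(repo, "BUILD_TAG", "v3.1"))
    # The static reference in ansible-playbooks would then change to the pinned form:
    print(pin_image_tag(f"{repo}:master-centos-stable-latest", "v3.1"))
```

Run as a script, it only prints the docker commands and the rewritten reference; passing `dry_run=False` to `run()` would execute them. Splitting on the last ':' that follows the last '/' keeps a registry port, as in a hypothetical `registry.local:9001/...` reference, from being mistaken for a tag.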

description: updated
tags: added: stx.config stx.containers
Ghada Khalil (gkhalil) wrote :

Problematic image tags:
ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/bringup-essential-services/vars/main.yaml:
sriov_cni_img: docker.io/starlingx/k8s-cni-sriov:master-centos-stable-latest
sriov_network_device_img: docker.io/starlingx/k8s-plugins-sriov-network-device:master-centos-stable-latest

It is not yet clear whether these images in stx.3.0 pose the same issue:
ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/plugins/templates/intel-fpga-plugin.yaml.j2:
image: "{{ docker_registry.url }}/starlingx/intel-fpga-plugin:master-distroless-stable-latest"

ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/plugins/templates/intel-gpu-plugin.yaml.j2
image: "{{ docker_registry.url }}/starlingx/intel-gpu-plugin:master-distroless-stable-latest"

ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/plugins/templates/intel-qat-plugin.yaml.j2
image: "{{ docker_registry.url }}/starlingx/intel-qat-plugin:master-distroless-stable-latest"
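To follow up on which other images use a floating tag, a quick scan of the ansible-playbooks (and config) repos can list every image reference whose tag ends in "latest". The sketch below is a heuristic: it assumes references look like namespace/name:tag on a single line, reports only the last two path segments, and ignores untagged or digest references. The file patterns in `scan_tree` are assumptions about where such references live.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple, Tuple

# 'namespace/name:tag' where the tag ends in "latest" and is not followed by more tag characters.
FLOATING_REF = re.compile(
    r"(?P<image>[\w.\-]+/[\w.\-]+):(?P<tag>[\w.\-]*latest)(?![\w.\-])"
)


class Finding(NamedTuple):
    line_no: int
    image: str
    tag: str


def find_floating_tags(text: str) -> Iterator[Finding]:
    """Yield every image reference in text whose tag ends in 'latest'."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in FLOATING_REF.finditer(line):
            yield Finding(line_no, match["image"], match["tag"])


def scan_tree(root: str, patterns=("*.yaml", "*.j2", "*.erb")) -> Iterator[Tuple[Path, Finding]]:
    """Scan YAML, Jinja2, and ERB files under root for floating image tags."""
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            text = path.read_text(encoding="utf-8", errors="replace")
            for finding in find_floating_tags(text):
                yield path, finding
```

For example, running `for path, hit in scan_tree("ansible-playbooks"): print(path, hit)` from a workspace root should report the vars/main.yaml entries and plugin templates quoted above, while a pinned reference such as `docker.io/starlingx/k8s-cni-sriov:v2.2` is not reported.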

Ghada Khalil (gkhalil) wrote :

Don Penney confirmed that the plugin images should also be updated to use a specific versioned tag instead of a -latest tag.

Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0; will likely need to be addressed in an upcoming mtce release.
Note: This is also an issue with stx.2.0. TBD whether we pursue a fix for that release as well.

tags: added: stx.3.0
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Don Penney (dpenney)