kube-upgrade-download-images fails for k8s 1.18.1 -> 1.19.13

Bug #1943438 reported by Mihnea Saracin
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Mihnea Saracin
Milestone: (none listed)

Bug Description

Brief Description
-----------------
The kube-upgrade-download-images step fails because some of the images required for 1.19.13 are not present in the local registry.

Severity
--------
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------
system kube-upgrade-start v1.19.13
system kube-upgrade-download-images

Expected Behavior
------------------
kube-upgrade-download-images succeeds

Actual Behavior
----------------
kube-upgrade-download-images fails

Reproducibility
---------------
100%

System Configuration
--------------------
All setups

Branch/Pull Time/Commit
-----------------------
stx master build on 2021-09-10 06:00:51

Last Pass
---------
N/A

Timestamp/Logs
--------------

# puppet.log

2021-09-08T13:50:14.473 Debug: 2021-09-08 13:50:13 +0000 Executing: '/bin/umount /usr/local/kubernetes/current/stage1'
2021-09-08T13:50:14.476 Debug: 2021-09-08 13:50:13 +0000 Executing: '/bin/mount /usr/local/kubernetes/current/stage1'
2021-09-08T13:50:14.485 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Bindmounts/Mount[/usr/local/kubernetes/current/stage1]: Triggered 'refresh' from 1 events
2021-09-08T13:50:14.489 Debug: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Bindmounts/Mount[/usr/local/kubernetes/current/stage1]: The container Class[Platform::Kubernetes::Bindmounts] will propagate my refresh event
2021-09-08T13:50:14.492 Info: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Bindmounts/Mount[/usr/local/kubernetes/current/stage1]: Scheduling refresh of Mount[/usr/local/kubernetes/current/stage1]
2021-09-08T13:50:14.497 Debug: 2021-09-08 13:50:13 +0000 Class[Platform::Kubernetes::Bindmounts]: The container Stage[main] will propagate my refresh event
2021-09-08T13:50:14.499 Debug: 2021-09-08 13:50:13 +0000 Exec[pre pull images](provider=posix): Executing 'kubeadm --kubeconfig=/etc/kubernetes/admin.conf config images list --kubernetes-version v1.19.13 --image-repository=registry.local:9001/k8s.gcr.io | xargs -i crictl pull --creds sysinv:yvm+_BKe4cuY5pIk {}'
2021-09-08T13:50:14.505 Debug: 2021-09-08 13:50:13 +0000 Executing: 'kubeadm --kubeconfig=/etc/kubernetes/admin.conf config images list --kubernetes-version v1.19.13 --image-repository=registry.local:9001/k8s.gcr.io | xargs -i crictl pull --creds sysinv:yvm+_BKe4cuY5pIk {}'
2021-09-08T13:50:14.508 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Pre_pull_control_plane_images/Exec[pre pull images]/returns: W0908 13:50:13.140075 147144 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
2021-09-08T13:50:14.510 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Pre_pull_control_plane_images/Exec[pre pull images]/returns: Image is up to date for sha256:76696340d79934d9706c5e243689b1e894dd51c43df11c2719e3cb9ba409c3f3
2021-09-08T13:50:14.517 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Pre_pull_control_plane_images/Exec[pre pull images]/returns: Image is up to date for sha256:90f4ff69a0bf9a3949399ddcfde362dee089069bf7329f4441abf3fa36202df2
2021-09-08T13:50:14.522 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Pre_pull_control_plane_images/Exec[pre pull images]/returns: Image is up to date for sha256:35036a0cd23a85559f3b6662bd6c608c09e9496ce1c041838457999168ab76c7
2021-09-08T13:50:14.529 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Pre_pull_control_plane_images/Exec[pre pull images]/returns: Image is up to date for sha256:046ec6b49f0b9677ebb8dc25d372a4f7306531a49e22ac8f300cf7b9340d906d
2021-09-08T13:50:14.531 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Pre_pull_control_plane_images/Exec[pre pull images]/returns: Image is up to date for sha256:80d28bedfe5dec59da9ebf8e6260224ac9008ab5c11dbbe16ee3ba3e4439ac2c
2021-09-08T13:50:14.539 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Pre_pull_control_plane_images/Exec[pre pull images]/returns: time="2021-09-08T13:50:13Z" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.local:9001/k8s.gcr.io/etcd:3.4.13-0\": failed to resolve reference \"registry.local:9001/k8s.gcr.io/etcd:3.4.13-0\": registry.local:9001/k8s.gcr.io/etcd:3.4.13-0: not found"
2021-09-08T13:50:14.546 Notice: 2021-09-08 13:50:13 +0000 /Stage[main]/Platform::Kubernetes::Pre_pull_control_plane_images/Exec[pre pull images]/returns: time="2021-09-08T13:50:13Z" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.local:9001/k8s.gcr.io/coredns:1.7.0\": failed to resolve reference \"registry.local:9001/k8s.gcr.io/coredns:1.7.0\": registry.local:9001/k8s.gcr.io/coredns:1.7.0: not found"
2021-09-08T13:50:14.549 Error: 2021-09-08 13:50:13 +0000 kubeadm --kubeconfig=/etc/kubernetes/admin.conf config images list --kubernetes-version v1.19.13 --image-repository=registry.local:9001/k8s.gcr.io | xargs -i crictl pull --creds sysinv:yvm+_BKe4cuY5pIk {} returned 123 instead of one of [0]
2021-09-08T13:50:14.550 /usr/share/ruby/vendor_ruby/puppet/util/errors.rb:106:in `fail'
2021-09-08T13:50:14.552 /usr/share/ruby/vendor_ruby/puppet/type/exec.rb:160:in `sync'
2021-09-08T13:50:14.555 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:236:in `sync'
2021-09-08T13:50:14.557 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:134:in `sync_if_needed'
2021-09-08T13:50:14.562 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:88:in `block in perform_changes'
2021-09-08T13:50:14.576 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:87:in `each'
2021-09-08T13:50:14.578 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:87:in `perform_changes'
2021-09-08T13:50:14.588 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:21:in `evaluate'
2021-09-08T13:50:14.592 /usr/share/ruby/vendor_ruby/puppet/transaction.rb:230:in `apply'
19 +0000 kubeadm --kubeconfig=/etc/kubernetes/admin.conf config images list --kubernetes-version v1.19.13 --image-repository=registry.local:9001/k8s.gcr.io | xargs -i crictl pull --creds sysinv:RFVDlz2uRVgQA*q= {} returned 123 instead of one of [0]
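
When triaging a log like the one above, it can help to extract only the image references that failed to pull. The following is a minimal sketch in Python, not part of StarlingX: the function name `missing_images` and the regular expression are assumptions written for this illustration, and the sample text is the message portion of the two fatal lines from the puppet.log excerpt.

```python
import re

# Hypothetical helper for this illustration; not a StarlingX or puppet API.
# The log escapes the quotes around the image reference as \" so the pattern
# accepts an optional backslash before each quote.
MISSING_IMAGE = re.compile(r'failed to pull and unpack image \\?"([^"\\]+)\\?"')


def missing_images(log_text):
    """Return the unique image references that failed to pull, in order of first appearance."""
    found = []
    for match in MISSING_IMAGE.finditer(log_text):
        image = match.group(1)
        if image not in found:
            found.append(image)
    return found


# Message portion of the two fatal lines from the puppet.log excerpt.
sample = r'''
time="2021-09-08T13:50:13Z" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.local:9001/k8s.gcr.io/etcd:3.4.13-0\": failed to resolve reference \"registry.local:9001/k8s.gcr.io/etcd:3.4.13-0\": registry.local:9001/k8s.gcr.io/etcd:3.4.13-0: not found"
time="2021-09-08T13:50:13Z" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.local:9001/k8s.gcr.io/coredns:1.7.0\": failed to resolve reference \"registry.local:9001/k8s.gcr.io/coredns:1.7.0\": registry.local:9001/k8s.gcr.io/coredns:1.7.0: not found"
'''

for image in missing_images(sample):
    print(image)
```

On the sample, this yields the etcd:3.4.13-0 and coredns:1.7.0 references, the two images missing from registry.local:9001. The exit status 123 in the Error line is consistent with this: GNU xargs exits with 123 when any invocation of the command exits with a status in the range 1-125, so a single failed crictl pull is enough to fail the whole Exec.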

Test Activity
-------------
Feature Testing

Changed in starlingx:
assignee: nobody → Mihnea Saracin (msaracin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/808723

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/808723
Committed: https://opendev.org/starlingx/config/commit/fc14db87633827b237c93c049dd77d6a28e15420
Submitter: "Zuul (22348)"
Branch: master

commit fc14db87633827b237c93c049dd77d6a28e15420
Author: Mihnea Saracin <email address hidden>
Date: Mon Sep 13 13:07:02 2021 +0300

    Fix kube-upgrade-download-images step

    When running the kube-upgrade-download-images step, the images needed
    for the new k8s version are first downloaded by the
    https://opendev.org/starlingx/ansible-playbooks/src/branch/master/playbookconfig/src/playbooks/push_k8s_images.yml
    playbook.

    The list of images is computed by the task at
    https://opendev.org/starlingx/ansible-playbooks/src/branch/master/playbookconfig/src/playbooks/roles/common/load-images-information/tasks/main.yml#L18
    which produced the following output:

    ok: [localhost] => {
        "download_images_list": [
            "k8s.gcr.io/kube-apiserver:v1.19.13",
            "k8s.gcr.io/kube-controller-manager:v1.19.13",
            "k8s.gcr.io/kube-scheduler:v1.19.13",
            "k8s.gcr.io/kube-proxy:v1.19.13",
            "k8s.gcr.io/pause:3.2",
            "k8s.gcr.io/etcd:3.4.3-0",
            "k8s.gcr.io/coredns:1.6.7"
        ]
    }
    The etcd and coredns image tags are wrong; they should have been
    etcd:3.4.13-0 and coredns:1.7.0. This happens because, when the
    playbook runs, the kubeadm binary still points to version 1.18.1
    and returns the wrong image tags.

    The fix is to update the bindmounts before running the
    'push_k8s_images.yml' playbook, so that the kubeadm binary points
    to the newer version of k8s (1.19.13 in our case).

    Closes-Bug: 1943438
    Change-Id: Id73370fb8a148a3680dcb4f8291fddd42b7a271f
    Signed-off-by: Mihnea Saracin <email address hidden>
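
The mismatch described in the commit message can be reproduced with a small comparison. This is a minimal sketch in Python, not the code from the fix: the helper names `split_image` and `tag_mismatches` are hypothetical, the `computed` list is the `download_images_list` quoted in the commit message, and the `required` list assumes that the other five entries were already correct, as the commit message implies by naming only etcd and coredns.

```python
def split_image(ref):
    """Split 'repository/name:tag' into ('repository/name', 'tag') at the last colon."""
    name, _, tag = ref.rpartition(":")
    return name, tag


def tag_mismatches(computed, required):
    """Map each required image name to (computed_tag, required_tag) where the two differ."""
    computed_tags = dict(split_image(ref) for ref in computed)
    mismatches = {}
    for ref in required:
        name, tag = split_image(ref)
        if computed_tags.get(name) != tag:
            mismatches[name] = (computed_tags.get(name), tag)
    return mismatches


# List produced while the kubeadm binary still pointed to 1.18.1 (from the commit message).
computed = [
    "k8s.gcr.io/kube-apiserver:v1.19.13",
    "k8s.gcr.io/kube-controller-manager:v1.19.13",
    "k8s.gcr.io/kube-scheduler:v1.19.13",
    "k8s.gcr.io/kube-proxy:v1.19.13",
    "k8s.gcr.io/pause:3.2",
    "k8s.gcr.io/etcd:3.4.3-0",
    "k8s.gcr.io/coredns:1.6.7",
]

# Assumed required list: same entries, with the two corrected tags named in the commit message.
required = computed[:5] + [
    "k8s.gcr.io/etcd:3.4.13-0",
    "k8s.gcr.io/coredns:1.7.0",
]

for name, (got, wanted) in tag_mismatches(computed, required).items():
    print(f"{name}: pushed {got}, needed {wanted}")
```

Only etcd and coredns are reported, which matches the two "not found" errors in puppet.log: the older tags were pushed to the local registry, while the 1.19.13 pre-pull asked for the newer ones.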

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.6.0 stx.containers