SRIOV and Multus images not pulled from local mirror registry

Bug #1829299 reported by Cristopher Lemus
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Mingyuan Qi
Milestone: (none)

Bug Description

Brief Description
-----------------
The nfvpe/multus:v3.2 and nfvpe/sriov-cni:latest images are not being pulled from the local (mirror) registry; the pods keep trying to pull them from the public registry.

Severity
--------
Major: All pods that require these two images are not able to start.

Steps to Reproduce
------------------
Follow the procedure to configure StarlingX with a local (mirror) registry. Once the hosts are unlocked, the pods try to start and fail because they cannot pull the images.

Expected Behavior
------------------
Pods should pull the images from the mirror registry, like all other Docker images required by kube-system (e.g. calico, coredns).

Actual Behavior
----------------
The pod tries to pull the image from the public registry:

  Normal   Pulling  4m51s (x41 over 3h10m)  kubelet, controller-0  pulling image "nfvpe/multus:v3.2"
  Normal   BackOff  0s (x840 over 3h10m)    kubelet, controller-0  Back-off pulling image "nfvpe/multus:v3.2"

Reproducibility
---------------
100% on configurations with local (mirror) registry.

System Configuration
--------------------
All four configurations that use a local (mirror) registry.

Branch/Pull Time/Commit
-----------------------
Currently present on CENGN Image: 20190515T013000Z

Last Pass
---------
These are new images. According to the logs, the issue has been present since Monday, May 13th.

Timestamp/Logs
--------------
Running kubectl describe pod shows that the images are not being pulled from the mirror registry:

kubectl -n kube-system describe pod kube-multus-ds-amd64-9n44g
.
.
.

Events:
  Type     Reason   Age                     From                   Message
  ----     ------   ----                    ----                   -------
  Warning  Failed   49m (x617 over 3h10m)   kubelet, controller-0  Error: ImagePullBackOff
  Normal   Pulling  4m51s (x41 over 3h10m)  kubelet, controller-0  pulling image "nfvpe/multus:v3.2"
  Normal   BackOff  0s (x840 over 3h10m)    kubelet, controller-0  Back-off pulling image "nfvpe/multus:v3.2"

Test Activity
-------------
Sanity.

Cristopher Lemus (cjlemusc) wrote :

Something that might help: in /etc/kubernetes/multus.yaml and /etc/kubernetes/sriov-cni.yaml, the Docker image is defined without the mirror registry IP address. For example, in multus.yaml:

image: nfvpe/multus:v3.2

For other pods whose images are pulled correctly, the manifest includes the mirror registry IP. For example, calico:

image: 192.168.100.60/calico/cni:v3.6.1
image: 192.168.100.60/calico/cni:v3.6.1

192.168.100.60 <- Our mirror registry

As a workaround, we did the following:
1. Manually edited /etc/kubernetes/multus.yaml and /etc/kubernetes/sriov-cni.yaml to prefix the image names with the registry IP.
2. Updated the config with:
    kubectl --kubeconfig=/etc/kubernetes/admin.conf replace -f /etc/kubernetes/sriov-cni.yaml
    kubectl --kubeconfig=/etc/kubernetes/admin.conf replace -f /etc/kubernetes/multus.yaml
3. Deleted the old pods so that new ones were launched.

With that, the pods started successfully:

controller-0:~$ kubectl get pods -n kube-system | egrep "multus|sriov"
kube-multus-ds-amd64-dd2s6      1/1     Running   0          76m
kube-multus-ds-amd64-r99qz      1/1     Running   0          76m
kube-multus-ds-amd64-rdgxr      1/1     Running   0          76m
kube-multus-ds-amd64-ssfxb      1/1     Running   0          76m
kube-sriov-cni-ds-amd64-8nrbz   1/1     Running   0          75m
kube-sriov-cni-ds-amd64-g999n   1/1     Running   0          75m
kube-sriov-cni-ds-amd64-j9mlh   1/1     Running   0          75m
kube-sriov-cni-ds-amd64-r4gms   1/1     Running   0          76m

Is something missing from the templates? See: https://opendev.org/starlingx/config/src/branch/master/puppet-manifests/src/modules/platform/templates/multus.yaml.erb#L134
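The manual edit in step 1 of the workaround can be scripted. The following is a minimal Python sketch, not part of StarlingX; the function names are hypothetical, and it assumes the mirror keeps the Docker Hub repository path (as the calico entries above suggest, e.g. 192.168.100.60/calico/cni:v3.6.1). An image reference is treated as already qualified when its first path component contains '.' or ':' or is 'localhost', following Docker's convention for recognizing a registry host, so running it twice does not add a second prefix.

```python
import re

# Matches an "image:" field (optionally a YAML list item, optionally quoted).
# Group 1 is everything up to the image reference; group 2 is the reference.
IMAGE_FIELD = re.compile(r"""^([ \t]*(?:-[ \t]*)?image:[ \t]*["']?)([^\s"'#]+)""",
                         re.MULTILINE)


def is_registry_qualified(ref: str) -> bool:
    """True if the first path component looks like a registry host.

    Follows Docker's convention: a component containing '.' or ':',
    or the literal 'localhost', is treated as a registry host.
    """
    first, sep, _ = ref.partition("/")
    if not sep:
        return False  # e.g. "busybox": no registry component at all
    return "." in first or ":" in first or first == "localhost"


def add_registry_prefix(manifest: str, registry: str) -> str:
    """Prefix every unqualified image reference in `manifest` with `registry`."""
    registry = registry.rstrip("/")

    def repl(match):
        lead, ref = match.group(1), match.group(2)
        if is_registry_qualified(ref):
            return match.group(0)  # already points at a registry; leave it alone
        return f"{lead}{registry}/{ref}"

    return IMAGE_FIELD.sub(repl, manifest)


if __name__ == "__main__":
    sample = (
        "      containers:\n"
        "      - name: kube-multus\n"
        "        image: nfvpe/multus:v3.2\n"
        "        image: 192.168.100.60/calico/cni:v3.6.1\n"
    )
    print(add_registry_prefix(sample, "192.168.100.60"), end="")
```

The rewritten text would then be applied with the kubectl replace commands from step 2. Because these files are rendered from the puppet templates linked above, an edit like this is only a stopgap until the templates themselves emit the registry.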

Ghada Khalil (gkhalil)
description: updated
Ghada Khalil (gkhalil) wrote :

These images are available in a public location (similar to other images used in StarlingX):
nfvpe/sriov-cni:latest
nfvpe/multus:v3.2

Assigning to Mingyuan Qi to investigate whether this is related to the proxy setup.

Changed in starlingx:
assignee: nobody → Mingyuan Qi (myqi)
summary: - SRIOV and Multus images not pulled from mirror registry
+ SRIOV and Multus images not pulled from local mirror registry
Changed in starlingx:
importance: Undecided → High
tags: added: stx.2.0 stx.containers
Ghada Khalil (gkhalil) wrote :

Marking as release gating / high priority as the issue is affecting sanity with local mirror registry.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/659596

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/659596
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=9508aa3e83677e60d56f615c184ef7e25108c504
Submitter: Zuul
Branch: master

commit 9508aa3e83677e60d56f615c184ef7e25108c504
Author: Mingyuan Qi <email address hidden>
Date: Thu May 16 16:17:03 2019 +0800

    device plugin images can be pulled from private registry

    Check user specified docker registry and pull device plugin from
    it if exists. And fix 2 issues related to private registry.

    1. pause image redirection is not needed in k8s 1.13
    2. missing double quotes in /etc/docker/daemon.json

    Closes-bug: 1829299

    Change-Id: I71074056009544abd6c91b10716b3dd5bf7b9e89
    Signed-off-by: Mingyuan Qi <email address hidden>
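The commit message describes two mechanisms: pull the device-plugin images from the user-specified registry when one is configured, and write /etc/docker/daemon.json with correct quoting. The actual change lives in the StarlingX puppet manifests; the Python below is only an illustrative sketch of the same logic. Both helper names are hypothetical, and since the commit does not say which daemon.json entry lost its quotes, the insecure-registries key (a real dockerd option) is used purely as an example.

```python
import json
from typing import List, Optional


def device_plugin_image(default_ref: str, user_registry: Optional[str]) -> str:
    """Return the image reference to put in a device-plugin manifest.

    Hypothetical helper: if the user configured a registry, pull from it;
    otherwise fall back to the public reference unchanged.
    """
    if user_registry:
        return f"{user_registry.rstrip('/')}/{default_ref}"
    return default_ref


def render_daemon_json(insecure_registries: List[str]) -> str:
    """Render content for /etc/docker/daemon.json.

    Building the file with json.dumps instead of string templating means
    every string value is double-quoted, so the output is always valid JSON.
    The key used here is only an example (see the lead-in).
    """
    config = {"insecure-registries": insecure_registries}
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    for ref in ("nfvpe/multus:v3.2", "nfvpe/sriov-cni:latest"):
        print(device_plugin_image(ref, "192.168.100.60"))
    print(render_daemon_json(["192.168.100.60"]), end="")
```

Serializing a data structure rather than filling a text template is the design point here: a hand-written template can silently drop a quote, while a JSON serializer cannot.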

Changed in starlingx:
status: In Progress → Fix Released
Cristopher Lemus (cjlemusc) wrote :

Just to confirm: with ISO 20190521T132734Z, the images are now pulled correctly. I confirmed that the corresponding YAML files now include the mirror registry IP:

calico.yaml: image: 192.168.100.60/calico/cni:v3.6.1
calico.yaml: image: 192.168.100.60/calico/cni:v3.6.1
calico.yaml: image: 192.168.100.60/calico/node:v3.6.1
calico.yaml: image: 192.168.100.60/calico/kube-controllers:v3.6.1
multus.yaml: image: 192.168.100.60/nfvpe/multus:v3.2
sriov-cni.yaml: image: 192.168.100.60/nfvpe/sriov-cni:latest
sriovdp-daemonset.yaml: image: 192.168.100.60/nfvpe/sriov-device-plugin:latest

Thanks a lot!
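A check like the one in the verification above could be automated. This sketch assumes the input is grep-style output in the same "file: image: ref" form as the listing (the report does not say how the listing was produced), and the function name is hypothetical. It flags any image that does not start with the mirror registry followed by a slash; requiring the slash keeps, for example, 192.168.100.6 from matching 192.168.100.60.

```python
from typing import List, Tuple


def unmirrored_images(listing: str, registry: str) -> List[Tuple[str, str]]:
    """Return (file, image) pairs whose image is not served by `registry`.

    Expects lines such as "multus.yaml: image: 192.168.100.60/nfvpe/multus:v3.2".
    The file part may be empty if the listing comes from a single file.
    """
    prefix = registry.rstrip("/") + "/"
    offenders = []
    for line in listing.splitlines():
        if "image:" not in line:
            continue  # ignore lines that carry no image field
        before, after = line.split("image:", 1)
        filename = before.strip().rstrip(":").strip()
        ref = after.strip().strip("\"'")  # tolerate quoted references
        if not ref.startswith(prefix):
            offenders.append((filename, ref))
    return offenders


if __name__ == "__main__":
    listing = (
        "multus.yaml: image: 192.168.100.60/nfvpe/multus:v3.2\n"
        "sriov-cni.yaml: image: nfvpe/sriov-cni:latest\n"
    )
    for filename, ref in unmirrored_images(listing, "192.168.100.60"):
        print(f"{filename}: {ref} is not pulled from the mirror")
```

An empty result for the listing above would match what was observed on ISO 20190521T132734Z.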
