SRIOV and Multus images not pulled from local mirror registry

Bug #1829299 reported by Cristopher Lemus
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Mingyuan Qi

Bug Description

Brief Description
-----------------
The nfvpe/multus:v3.2 and nfvpe/sriov-cni:latest images are not being pulled from the local (mirror) registry. The kubelet keeps trying to pull them from the public registry.

Severity
--------
Major: All pods that require these two images are not able to start.

Steps to Reproduce
------------------
Follow the procedure to configure StarlingX. Once the systems are unlocked, the pods try to start and fail because they cannot pull the images.

Expected Behavior
------------------
Pods should pull the images from the mirror registry, like all other Docker images required for kube-system (e.g. calico, coredns).

Actual Behavior
----------------
The pod tries to pull the image from the public registry:

  Normal Pulling 4m51s (x41 over 3h10m) kubelet, controller-0 pulling image "nfvpe/multus:v3.2"
  Normal BackOff 0s (x840 over 3h10m) kubelet, controller-0 Back-off pulling image "nfvpe/multus:v3.2"

Reproducibility
---------------
100% on configurations with local (mirror) registry.

System Configuration
--------------------
All 4 configs using local (mirror) registry.

Branch/Pull Time/Commit
-----------------------
Currently present on CENGN Image: 20190515T013000Z

Last Pass
---------
These are new images. According to the logs, the issue has been present since Monday, May 13th.

Timestamp/Logs
--------------
Using kubectl describe pod, we found that the images are not being pulled from the mirror registry:

kubectl -n kube-system describe pod kube-multus-ds-amd64-9n44g
.
.
.

Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning Failed 49m (x617 over 3h10m) kubelet, controller-0 Error: ImagePullBackOff
  Normal Pulling 4m51s (x41 over 3h10m) kubelet, controller-0 pulling image "nfvpe/multus:v3.2"
  Normal BackOff 0s (x840 over 3h10m) kubelet, controller-0 Back-off pulling image "nfvpe/multus:v3.2"

Test Activity
-------------
Sanity.

Revision history for this message
Cristopher Lemus (cjlemusc) wrote :

Something that might help: in /etc/kubernetes/multus.yaml and /etc/kubernetes/sriov-cni.yaml, the Docker image is defined without the mirror registry IP address. For example, multus:

image: nfvpe/multus:v3.2

In other pods where the image is pulled correctly, the manifest includes the IP of the mirror registry. For example, calico:

image: 192.168.100.60/calico/cni:v3.6.1
image: 192.168.100.60/calico/cni:v3.6.1

192.168.100.60 <- Our mirror registry
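The difference between the two manifests comes down to whether the image reference starts with a registry host. As a rough illustration (not part of the original report), the Python sketch below applies Docker's usual convention, where the first path component counts as a registry host only if it contains a dot or a colon or is "localhost", and lists the image lines that would fall back to the public registry. The function names are hypothetical, and the line-based scan assumes simple "image:" lines rather than doing full YAML parsing.

```python
from typing import List


def has_registry_host(image: str) -> bool:
    """Return True if the first path component looks like a registry host."""
    first, sep, _rest = image.partition("/")
    if not sep:
        # No slash at all (e.g. "busybox:latest"): there is no registry host.
        return False
    return "." in first or ":" in first or first == "localhost"


def images_missing_registry(manifest_text: str) -> List[str]:
    """Collect image references in a manifest that name no registry host."""
    missing = []
    for line in manifest_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            # Handle list items written as "- image: ...".
            stripped = stripped[2:].lstrip()
        if not stripped.startswith("image:"):
            continue
        ref = stripped[len("image:"):].strip().strip("\"'")
        if ref and not has_registry_host(ref):
            missing.append(ref)
    return missing


if __name__ == "__main__":
    sample = (
        "        image: nfvpe/multus:v3.2\n"
        "        image: 192.168.100.60/calico/cni:v3.6.1\n"
    )
    print(images_missing_registry(sample))
```

Run against the contents of multus.yaml or sriov-cni.yaml from an affected system, a check like this would be expected to flag the nfvpe images and leave the calico ones alone.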

As a workaround we tried:
1. Manually edit /etc/kubernetes/multus.yaml and /etc/kubernetes/sriov-cni.yaml and add the registry IP to the image references.
2. Update the config with:
    kubectl --kubeconfig=/etc/kubernetes/admin.conf replace -f /etc/kubernetes/sriov-cni.yaml
    kubectl --kubeconfig=/etc/kubernetes/admin.conf replace -f /etc/kubernetes/multus.yaml
3. Remove old pods to launch new ones
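
Step 1 of this workaround could also be scripted. The following is only a minimal Python sketch of the manual edit, not the fix that was eventually merged: the function name is hypothetical, it assumes unquoted "image:" lines, and it naively prefixes every image that does not already start with the given registry (including ones that name some other registry host). The kubectl replace commands from step 2 would still be needed afterwards.

```python
import re

# Matches lines such as "    image: nfvpe/multus:v3.2" or "  - image: foo/bar".
# Quoted references are deliberately not matched and are left untouched.
IMAGE_LINE = re.compile(
    r"""^([ \t]*(?:-[ \t]+)?image:[ \t]*)([^\s"']+)[ \t]*$""",
    re.MULTILINE,
)


def add_registry_prefix(manifest_text: str, registry: str) -> str:
    """Prefix every unquoted image reference with `registry` (idempotent)."""

    def repl(match: "re.Match[str]") -> str:
        prefix, ref = match.group(1), match.group(2)
        if ref.startswith(registry + "/"):
            # Already points at the mirror registry: keep the line as it is.
            return match.group(0)
        return f"{prefix}{registry}/{ref}"

    return IMAGE_LINE.sub(repl, manifest_text)


if __name__ == "__main__":
    before = "      containers:\n      - name: kube-multus\n        image: nfvpe/multus:v3.2\n"
    print(add_registry_prefix(before, "192.168.100.60"))
```

Because the substitution skips lines that already carry the prefix, running it twice on the same file should leave the second pass as a no-op.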

With that, the pods started successfully:

controller-0:~$ kubectl get pods -n kube-system |egrep "multus|sriov"
kube-multus-ds-amd64-dd2s6 1/1 Running 0 76m
kube-multus-ds-amd64-r99qz 1/1 Running 0 76m
kube-multus-ds-amd64-rdgxr 1/1 Running 0 76m
kube-multus-ds-amd64-ssfxb 1/1 Running 0 76m
kube-sriov-cni-ds-amd64-8nrbz 1/1 Running 0 75m
kube-sriov-cni-ds-amd64-g999n 1/1 Running 0 75m
kube-sriov-cni-ds-amd64-j9mlh 1/1 Running 0 75m
kube-sriov-cni-ds-amd64-r4gms 1/1 Running 0 76m

Is something missing from the templates? https://opendev.org/starlingx/config/src/branch/master/puppet-manifests/src/modules/platform/templates/multus.yaml.erb#L134

Ghada Khalil (gkhalil)
description: updated
Ghada Khalil (gkhalil) wrote :

These images are available in a public location (similar to other images used in starlingx):
nfvpe/sriov-cni:latest
nfvpe/multus:v3.2

Assigning to Mingyuan Qi to investigate if this has something to do with the proxy setup

Changed in starlingx:
assignee: nobody → Mingyuan Qi (myqi)
summary: - SRIOV and Multus images not pulled from mirror registry
+ SRIOV and Multus images not pulled from local mirror registry
Changed in starlingx:
importance: Undecided → High
tags: added: stx.2.0 stx.containers
Ghada Khalil (gkhalil) wrote :

Marking as release gating / high priority as the issue is affecting sanity with local mirror registry.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/659596

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/659596
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=9508aa3e83677e60d56f615c184ef7e25108c504
Submitter: Zuul
Branch: master

commit 9508aa3e83677e60d56f615c184ef7e25108c504
Author: Mingyuan Qi <email address hidden>
Date: Thu May 16 16:17:03 2019 +0800

    device plugin images can be pulled from private registry

    Check user specified docker registry and pull device plugin from
    it if exists. And fix 2 issues related to private registry.

    1. pause image redirection is not needed in k8s 1.13
    2. missing double quotes in /etc/docker/daemon.json

    Closes-bug: 1829299

    Change-Id: I71074056009544abd6c91b10716b3dd5bf7b9e89
    Signed-off-by: Mingyuan Qi <email address hidden>
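
The second item in the commit message (missing double quotes in /etc/docker/daemon.json) is the kind of problem a plain JSON parse would catch. The report does not show the actual file contents, so the snippet below is only a hedged Python sketch: the helper name is hypothetical and the sample strings are invented for illustration, using Docker's "insecure-registries" daemon option as an assumed example key.

```python
import json
from typing import Optional


def check_daemon_json(text: str) -> Optional[str]:
    """Return None if `text` is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


if __name__ == "__main__":
    # Hypothetical examples; the real file contents are not in the report.
    quoted = '{"insecure-registries": ["192.168.100.60"]}'
    unquoted = '{"insecure-registries": [192.168.100.60]}'
    print(check_daemon_json(quoted))
    print(check_daemon_json(unquoted))
```

An address written without quotes is not a valid JSON value, so the second sample is rejected by the parser while the first is accepted.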

Changed in starlingx:
status: In Progress → Fix Released
Cristopher Lemus (cjlemusc) wrote :

Just to confirm: with ISO 20190521T132734Z, the images are now pulled properly. I confirmed that the corresponding YAML files now include the mirror registry IP:

calico.yaml: image: 192.168.100.60/calico/cni:v3.6.1
calico.yaml: image: 192.168.100.60/calico/cni:v3.6.1
calico.yaml: image: 192.168.100.60/calico/node:v3.6.1
calico.yaml: image: 192.168.100.60/calico/kube-controllers:v3.6.1
multus.yaml: image: 192.168.100.60/nfvpe/multus:v3.2
sriov-cni.yaml: image: 192.168.100.60/nfvpe/sriov-cni:latest
sriovdp-daemonset.yaml: image: 192.168.100.60/nfvpe/sriov-device-plugin:latest

Thanks a lot!
