Containers: Worker nodes are pulling from the external registry instead of from the internal registry
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Angie Wang |
Bug Description
Brief Description
-----------------
After installing and unlocking the worker nodes, they were stuck at ContainerCreating for an extended amount of time (> 40 minutes). While investigating this issue, it was determined that the worker nodes were trying to pull from the external registry instead of from the internal registry.
Severity
--------
Minor
Steps to Reproduce
------------------
- Install and configure controller-0
- Install controller-1 and worker nodes from controller-0, and unlock them
Expected Behavior
------------------
- Worker nodes should pull images from internal registry
Actual Behavior
----------------
- Worker nodes were trying to pull images from the external registry and got stuck at NotReady / ContainerCreating for more than 40 minutes
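Which registry a kubelet pulls from is determined by the host prefix of each pod's image reference: a reference like `quay.io/calico/node:v3.3.2` goes to the external registry, while one rewritten to point at an internal registry (the `registry.local:9001` address below is illustrative, not taken from this bug's logs) stays on the management network. A minimal sketch of that prefix logic:

```shell
# Sketch: extract the registry host from a container image reference.
# A leading path component counts as a registry host only if it contains
# a dot or a colon, or is "localhost"; otherwise the default registry
# (docker.io) is implied.
extract_registry() {
  local image=$1
  case $image in
    */*)
      local first=${image%%/*}
      case $first in
        *.*|*:*|localhost) echo "$first" ;;
        *) echo "docker.io" ;;
      esac
      ;;
    *) echo "docker.io" ;;   # no slash at all, e.g. "nginx:latest"
  esac
}

extract_registry "quay.io/calico/node:v3.3.2"                 # -> quay.io
extract_registry "registry.local:9001/calico/node:v3.3.2"     # -> registry.local:9001
extract_registry "nginx:latest"                               # -> docker.io
```

Inspecting the image references in the pod specs (or the pull events in `kubectl describe pod`) with this rule in mind shows whether a node is going external or internal.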
Reproducibility
---------------
Intermittent
System Configuration
--------------------
Multi-node system
Branch/Pull Time/Commit
-----------------------
f/stein as of 2019-02-25
Timestamp/Logs
--------------
# All nodes unlocked and available:
[2019-02-26 03:12:48,748] 262 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-list'
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | controller-1 | controller  | unlocked       | enabled     | available    |
| 3  | compute-0    | worker      | unlocked       | enabled     | available    |
| 4  | compute-1    | worker      | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@
# worker nodes stuck for 45 minutes
NAME STATUS ROLES AGE VERSION
compute-0 NotReady <none> 45m v1.12.3
compute-1 NotReady <none> 44m v1.12.3
controller-0 Ready master 117m v1.12.3
controller-1 Ready master 65m v1.12.3
[wrsroot@
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-system calico-node-m6znx 0/2 ContainerCreating 0 41m 192.168.204.91 compute-0 <none>
kube-system calico-node-w9nlk 0/2 ContainerCreating 0 40m 192.168.204.185 compute-1 <none>
kube-system kube-proxy-66j88 0/1 ContainerCreating 0 40m 192.168.204.185 compute-1 <none>
kube-system kube-proxy-86jn4 0/1 ContainerCreating 0 41m 192.168.204.91 compute-0 <none>
# seems to be pulling images from external repo
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned kube-system/
Warning FailedCreatePod
# Note that the workers eventually recovered automatically and reached Ready status. However, the proper behaviour, which minimizes time spent accessing the external registry, is for worker nodes to pull from the internal registry.
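One common way to keep pulls local is to configure the container runtime with a pull-through mirror, so that image requests are served from an internal registry and only fall back to the external one on a miss. This is a sketch only; the actual StarlingX mechanism may differ, and the mirror URL is a hypothetical internal registry address. For Docker it would go in `/etc/docker/daemon.json`:

```json
{
  "registry-mirrors": ["https://registry.local:9001"]
}
```

After editing the file, the Docker daemon must be restarted for the mirror configuration to take effect.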
tags: added: stx.containers
Changed in starlingx:
assignee: nobody → Angie Wang (angiewang)
tags: added: stx.2.0 removed: stx.2019.05
Changed in starlingx:
assignee: Bruce Jones (brucej) → Abraham Arce (xe1gyq)
Changed in starlingx:
assignee: Abraham Arce (xe1gyq) → Erich Cordoba (ericho)
tags: added: stx.3.0 removed: stx.2.0
Marking as release gating. The worker nodes should be pulling images from an internal registry on the controller instead of going to the external registry every time. Medium priority, as this is a performance optimization.