[Containers] Pods not running on computes (Standard 2+2)

Bug #1817723 reported by Jose Perez Carranza
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Mingyuan Qi

Bug Description

Title
-----
Some pods are not running correctly on computes in a Standard Non-Storage configuration.

Brief Description
-----------------
After finishing provisioning and applying the application, some pods do not run properly on the compute nodes; they stay in "ContainerCreating", "Init", or "Pending" status.

Severity
--------
Critical

Steps to Reproduce
------------------
1. Complete the installation of a Standard Non-Storage configuration as described at:
   - https://wiki.openstack.org/wiki/StarlingX/Containers/InstallationOnStandard
2. Execute the command below to verify that all pods are Running or Completed (an illustrative polling variant is sketched after the steps):
   $ kubectl get pods --all-namespaces -o wide
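
   (Illustrative sketch only, not part of the original steps: rather than re-running the command by hand, a small polling loop can wait until no pod is outside the Running/Completed states; the 10-minute timeout is an arbitrary assumption.)

   $ for i in $(seq 1 60); do
   >   BAD=$(kubectl get pods --all-namespaces --no-headers | grep -cv -e Running -e Completed)
   >   [ "$BAD" -eq 0 ] && echo "all pods Running/Completed" && break
   >   sleep 10
   > done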

Expected Behavior
------------------
All Pods should be Running or Completed

Actual Behavior
----------------
Some pods are neither Running nor Completed:

controller-0:~$ kubectl get pods --all-namespaces -o wide |grep -v -e "Running" -e "Completed"
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-system calico-node-4bppq 0/2 ContainerCreating 0 6h39m 192.168.204.173 compute-1 <none>
kube-system calico-node-4gtcs 0/2 ContainerCreating 0 6h44m 192.168.204.245 compute-0 <none>
kube-system kube-proxy-jfjv7 0/1 ContainerCreating 0 6h39m 192.168.204.173 compute-1 <none>
kube-system kube-proxy-tw264 0/1 ContainerCreating 0 6h44m 192.168.204.245 compute-0 <none>
openstack nova-cell-setup-z2fgz 0/1 Init:0/2 0 142m 172.16.0.43 controller-0 <none>
openstack osh-openstack-garbd-garbd-5d495764d9-6xlrd 0/1 Pending 0 153m <none> <none> <none>

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Configuration: Standard Non Storage
Environment: Virtual and BM

Branch/Pull Time/Commit
-----------------------
Branch: Master
ISO: http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/20190223T060000Z/outputs/iso/bootimage.iso

Timestamp/Logs
--------------
http://paste.openstack.org/show/746163/

Revision history for this message
Jose Perez Carranza (jgperezc) wrote :
Ada Cabrales (acabrale)
tags: added: stx.containers
Revision history for this message
Cristopher Lemus (cjlemusc) wrote :

Tried using ISO from stein branch (2019-02-25):

[wrsroot@controller-1 ~(keystone_admin)]$ cat /etc/build.info
###
### StarlingX
### Release 19.01
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="f/stein"

JOB="STX_build_stein_master"
BUILD_BY="<email address hidden>"
BUILD_NUMBER="54"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-02-25 19:13:50 +0000"

The same pods are not properly running:

[wrsroot@controller-1 ~(keystone_admin)]$ kubectl get pods --all-namespaces -o wide |egrep -vi "running|completed"
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-system calico-node-9hfr4 0/2 ContainerCreating 0 163m 10.10.54.100 compute-1 <none>
kube-system calico-node-dqbzh 0/2 ContainerCreating 0 164m 10.10.54.192 compute-0 <none>
kube-system kube-proxy-krmkz 0/1 ContainerCreating 0 164m 10.10.54.192 compute-0 <none>
kube-system kube-proxy-tkxwf 0/1 ContainerCreating 0 163m 10.10.54.100 compute-1 <none>
openstack heat-engine-cleaner-1551185700-7c66t 0/1 PodInitializing 0 5s 172.16.0.67 controller-0 <none>
openstack nova-cell-setup-bl4br 0/1 Init:0/2 0 42m 172.16.1.49 controller-1 <none>
openstack osh-openstack-garbd-garbd-74d7ff4f-t4gsw 0/1 Pending 0 88m <none> <none> <none>

Revision history for this message
Bob Church (rchurch) wrote :

We are not seeing this in a non-proxy environment. Can you gather some more information from these pods?

See some debugging tips: https://wiki.openstack.org/wiki/StarlingX/Containers/FAQ#What_should_I_do_if_I_see_a_pod_is_not_in_a_Running_state

Basically you want to use "kubectl describe" at this point and check the events.

It seems like those pods are having trouble pulling their images or perhaps there is a long delay in getting the docker images for those containers.
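
For example (a generic sketch, not from this report), the events for one of the stuck pods listed above can be checked with:

$ kubectl describe pod -n kube-system calico-node-4bppq
$ kubectl get events -n kube-system --sort-by=.lastTimestamp | tail -n 20

If it is an image pull problem, the events usually show "Failed to pull image" or "ErrImagePull"; pulling the image manually on the affected compute with "docker pull <image>" can confirm whether the registry/proxy is reachable.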

Revision history for this message
Cristopher Lemus (cjlemusc) wrote :
Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

Same behavior on a 2+2+2 configuration.

--------------------------------------------------------------------------------------------
controller-0:~$ kubectl get pods --all-namespaces -o wide |grep -v -e "Running" -e "Completed"
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-system calico-node-jd2gd 0/2 ContainerCreating 0 3h51m 192.168.204.172 compute-0 <none>
kube-system calico-node-nlt9h 0/2 ContainerCreating 0 3h50m 192.168.204.133 compute-1 <none>
kube-system kube-proxy-7gkrj 0/1 ContainerCreating 0 3h50m 192.168.204.133 compute-1 <none>
kube-system kube-proxy-fpk2z 0/1 ContainerCreating 0 3h51m 192.168.204.172 compute-0 <none>
openstack nova-cell-setup-krck7 0/1 Init:0/2 0 158m 172.16.0.34 controller-0 <none>
openstack osh-openstack-garbd-garbd-5d495764d9-zrddq 0/1 Pending 0 3h22m <none> <none> <none>
--------------------------------------------------------------------------------------------
controller-0:~$ kubectl describe pods -n openstack osh-openstack-garbd-garbd-5d495764d9-zrddq
Name: osh-openstack-garbd-garbd-5d495764d9-zrddq
Namespace: openstack
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: application=garbd
                    component=server
                    pod-template-hash=5d495764d9
                    release_group=osh-openstack-garbd
Annotations: configmap-bin-hash: e9eadebcdb0d47224b47dfa165edeca8ac773b0939b1672ee5d4a438c2341f5a
Status: Pending
IP:
Controlled By: ReplicaSet/osh-openstack-garbd-garbd-5d495764d9
Init Containers:
  init:
    Image: 192.168.204.2:9001/quay.io/stackanetes/kubernetes-entrypoint:v0.3.1
    Port: <none>
    Host Port: <none>
    Command:
      kubernetes-entrypoint
    Environment:
      POD_NAME: osh-openstack-garbd-garbd-5d495764d9-zrddq (v1:metadata.name)
      NAMESPACE: openstack (v1:metadata.namespace)
      INTERFACE_NAME: eth0
      PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/
      DEPENDENCY_SERVICE:
      DEPENDENCY_DAEMONSET:
      DEPENDENCY_CONTAINER:
      DEPENDENCY_POD_JSON:
      COMMAND: echo done
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from osh-openstack-garbd-garbd-token-8x7p6 (ro)
Containers:
  garbd:
    Image: 192.168.204.2:9001/starlingx/stx-mariadb:dev-centos-pike-latest
    Port: <none>
    Host Port: <none>
    Command:
      /tmp/garbd.sh
    Environment:
      GROUP_NAME: mariadb-server_openstack
      GROUP_ADDRESS: gcomm://mariadb-server-0.mariadb-discovery.openst...


Cindy Xie (xxie1)
Changed in starlingx:
assignee: nobody → Austin Sun (sunausti)
Mingyuan Qi (myqi)
Changed in starlingx:
assignee: Austin Sun (sunausti) → Mingyuan Qi (myqi)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/639619

Changed in starlingx:
status: New → In Progress
Frank Miller (sensfan22)
Changed in starlingx:
importance: Undecided → High
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; high priority as it affects configurations where a proxy is used

summary: - [Containers] Pods not running on computes (Standar 2+2)
+ [Containers] Pods not running on computes (Standard 2+2)
tags: added: stx.2019.05
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/639619
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=1004ae82fd0290e5040542f05f851fc204956973
Submitter: Zuul
Branch: master

commit 1004ae82fd0290e5040542f05f851fc204956973
Author: Mingyuan Qi <email address hidden>
Date: Wed Feb 27 17:45:02 2019 +0800

    Fix k8s firewall blocks docker proxy port

    Allow docker proxy port(s) in k8s firewall. It unblocks worker
    node to access proxy via controller. As a result, worker node can
    successfully pull k8s/calico images through proxy.

    Duplex: docker proxy port correctly added in iptables SNAT rule
    2+2: docker proxy port correctly added in iptables SNAT rule
    2+2 without proxy: pass, no regression issue

    Closes-Bug: #1817723
    Change-Id: I7a8093a1fdce0089e5d0a9483a5c58184d1e213e
    Signed-off-by: Mingyuan Qi <email address hidden>
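
(Not part of the review; a hedged way to verify the fix on a proxied setup, assuming the docker proxy listens on port 8080, is to check that the proxy port now appears in the controller's NAT rules and that a worker can pull an image through it:)

controller-0:~$ sudo iptables -t nat -S | grep 8080
compute-0:~$ sudo docker pull k8s.gcr.io/pause:3.1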

Changed in starlingx:
status: In Progress → Fix Released
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

I believe we are having problems with pods not running, like LP https://bugs.launchpad.net/starlingx/+bug/1817723, but this time on a 2+2+2 cluster.

[wrsroot@controller-0 ~(keystone_admin)]$ kubectl get pods --all-namespaces | grep 0/1
kube-system calico-node-5kw4w 0/1 Init:0/2 0 6h4m
kube-system calico-node-whrbc 0/1 Init:0/2 0 6h4m
kube-system ceph-pools-audit-1558479000-zcmmr 0/1 Completed 0 11m
kube-system ceph-pools-audit-1558479300-dxj65 0/1 Completed 0 6m10s
kube-system ceph-pools-audit-1558479600-cm9rs 0/1 Completed 0 69s
kube-system kube-multus-ds-amd64-rglr7 0/1 ContainerCreating 0 6h4m
kube-system kube-multus-ds-amd64-wsqvv 0/1 ContainerCreating 0 6h4m
kube-system kube-proxy-kh8rs 0/1 ContainerCreating 0 6h4m
kube-system kube-proxy-pcxgc 0/1 ContainerCreating 0 6h4m
openstack osh-openstack-garbd-garbd-85564795f6-9sj6v 0/1 Pending 0 5h56m

[wrsroot@controller-0 ~(keystone_admin)]$ kubectl get pods -n openstack
NAME READY STATUS RESTARTS AGE
ingress-7bf7c8458f-dtr7t 1/1 Running 0 5h59m
ingress-7bf7c8458f-grw8h 1/1 Running 0 5h59m
ingress-error-pages-cf8cf7ccd-lw8sr 1/1 Running 0 5h59m
ingress-error-pages-cf8cf7ccd-s6hfg 1/1 Running 0 5h59m
mariadb-ingress-66c7f9964b-d4t9p 1/1 Running 0 5h59m
mariadb-ingress-66c7f9964b-jnhws 1/1 Running 0 5h59m
mariadb-ingress-error-pages-749cf64f44-dcr5h 1/1 Running 0 5h59m
mariadb-server-0 1/1 Running 0 5h59m
mariadb-server-1 1/1 Running 0 5h59m
osh-openstack-garbd-garbd-85564795f6-9sj6v 0/1 Pending 0 5h57m

[wrsroot@controller-0 ~(keystone_admin)]$ kubectl describe pod osh-openstack-garbd-garbd-85564795f6-9sj6v -n openstack
Name: osh-openstack-garbd-garbd-85564795f6-9sj6v
Namespace: openstack
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: application=garbd
                    component=server
                    pod-template-hash=85564795f6
                    release_group=osh-openstack-garbd
Annotations: configmap-bin-hash: e9eadebcdb0d47224b47dfa165edeca8ac773b0939b1672ee5d4a438c2341f5a
Status: Pending
IP:
Controlled By: ReplicaSet/osh-openstack-garbd-garbd-85564795f6
Init Containers:
  init:
    Image: 10.10.58.2:9001/quay.io/stackanetes/kubernetes-entrypoint:v0.3.1
    Port: <none>
    Host Port: <none>
    Command:
      kubernetes-entrypoint
    Environment:
      POD_NAME: osh-openstack-garbd-garbd-85564...


Revision history for this message
Mingyuan Qi (myqi) wrote :

@Fernando I don't think it's the same issue as the previous one; the log doesn't show that the containers are failing to run because of an image pull issue. I think you could file another LP and add more detailed logs, such as "kubectl describe node" output, and describe all the failed pods in the kube-system namespace.
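
(An illustrative sketch of the suggested commands; the node names are the ones used earlier in the report, and the loop simply describes every kube-system pod that is not Running or Completed:)

$ kubectl describe node compute-0 > compute-0-describe.txt
$ kubectl describe node compute-1 > compute-1-describe.txt
$ for p in $(kubectl get pods -n kube-system --no-headers | awk '$3!="Running" && $3!="Completed" {print $1}'); do
>   kubectl describe pod -n kube-system "$p"
> done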
