Kubernetes Pods remain in Pending state

Bug #1900639 reported by svanschalkwyk
This bug affects 1 person
Affects: Charmed Kubernetes Bundles
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

Single server running Ubuntu 18.04 LTS.
LXD 4.7
Juju 2.8.1-bionic-amd64
kubectl 1.19.2
```
lxc profile show default
config:
  limits.cpu: "2"
description: Default LXD profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by:
- /1.0/instances/juju-128992-0
- /1.0/instances/juju-ce2bda-0
- /1.0/instances/juju-ce2bda-1
- /1.0/instances/juju-ce2bda-2
- /1.0/instances/juju-ce2bda-6
- /1.0/instances/juju-ce2bda-4
- /1.0/instances/juju-ce2bda-3
- /1.0/instances/juju-ce2bda-5
- /1.0/instances/juju-ce2bda-7
- /1.0/instances/juju-ce2bda-9
- /1.0/instances/juju-ce2bda-8
```
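Note that limits.cpu: "2" caps every container at two CPUs, which bounds what each Kubernetes node can offer the scheduler. A quick check of what the nodes actually advertise (a sketch with standard kubectl; it only reports once nodes register):
```
# Pending pods are often a resource problem: compare pod requests against
# what each node reports as schedulable.
kubectl describe nodes | grep -A 6 'Allocatable'
```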
```
lxc list
+---------------+---------+-----------------------+------+-----------+-----------+
|     NAME      |  STATE  |         IPV4          | IPV6 |   TYPE    | SNAPSHOTS |
+---------------+---------+-----------------------+------+-----------+-----------+
| bionic-maas   | RUNNING | 10.242.109.146 (eth0) |      | CONTAINER | 0         |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-128992-0 | RUNNING | 10.242.109.209 (eth0) |      | CONTAINER | 0         |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-0 | RUNNING | 10.242.109.240 (eth0) |      | CONTAINER | 0         |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-1 | RUNNING | 10.242.109.210 (eth0) |      | CONTAINER | 0         |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-2 | RUNNING | 10.242.109.33 (eth0)  |      | CONTAINER | 0         |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-3 | RUNNING | 10.242.109.148 (eth0) |      | CONTAINER | 0         |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-4 | RUNNING | 10.242.109.251 (eth0) |      | CONTAINER | 0         |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-5 | RUNNING | 10.242.109.11 (eth0)  |      | CONTAINER | 0         |
|               |         | 10.1.32.0 (flannel.1) |      |           |           |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-6 | RUNNING | 10.242.109.191 (eth0) |      | CONTAINER | 0         |
|               |         | 10.1.53.0 (flannel.1) |      |           |           |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-7 | RUNNING | 10.242.109.247 (eth0) |      | CONTAINER | 0         |
|               |         | 10.1.40.0 (flannel.1) |      |           |           |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-8 | RUNNING | 10.242.109.252 (eth0) |      | CONTAINER | 0         |
|               |         | 10.1.72.0 (flannel.1) |      |           |           |
+---------------+---------+-----------------------+------+-----------+-----------+
| juju-ce2bda-9 | RUNNING | 10.242.109.162 (eth0) |      | CONTAINER | 0         |
|               |         | 10.1.56.0 (flannel.1) |      |           |           |
+---------------+---------+-----------------------+------+-----------+-----------+
```
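Note that the flannel.1 interface appears only on the five containers hosting kubernetes-master and kubernetes-worker, matching the flannel scale of 5 in the status below. For a compact re-check of the same data (column shorthands are standard lxc list flags):
```
# Name, state, and IPv4 columns only, in CSV form.
lxc list -c ns4 --format csv
```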
```
juju status --color
Model  Controller  Cloud/Region         Version  SLA          Timestamp
k8s    sydney      localhost/localhost  2.8.1    unsupported  21:35:15-05:00

App                    Version  Status   Scale  Charm                  Store       Rev  OS      Notes
containerd             1.3.3    active       5  containerd             jujucharms   94  ubuntu
easyrsa                3.0.1    active       1  easyrsa                jujucharms  333  ubuntu
etcd                   3.4.5    active       3  etcd                   jujucharms  540  ubuntu
flannel                0.11.0   active       5  flannel                jujucharms  506  ubuntu
kubeapi-load-balancer  1.18.0   active       1  kubeapi-load-balancer  jujucharms  747  ubuntu  exposed
kubernetes-master      1.19.2   waiting      2  kubernetes-master      jujucharms  891  ubuntu
kubernetes-worker      1.19.2   active       3  kubernetes-worker      jujucharms  704  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   0        10.242.109.240                  Certificate Authority connected.
etcd/0                    active    idle   1        10.242.109.210  2379/tcp        Healthy with 3 known peers
etcd/1                    active    idle   2        10.242.109.33   2379/tcp        Healthy with 3 known peers
etcd/2*                   active    idle   3        10.242.109.148  2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle   4        10.242.109.251  443/tcp         Loadbalancer ready.
kubernetes-master/0*      waiting   idle   5        10.242.109.11   6443/tcp        Waiting for 3 kube-system pods to start
  containerd/2            active    idle            10.242.109.11                   Container runtime available
  flannel/2               active    idle            10.242.109.11                   Flannel subnet 10.1.32.1/24
kubernetes-master/1       waiting   idle   6        10.242.109.191  6443/tcp        Waiting for 3 kube-system pods to start
  containerd/1            active    idle            10.242.109.191                  Container runtime available
  flannel/1               active    idle            10.242.109.191                  Flannel subnet 10.1.53.1/24
kubernetes-worker/0       active    idle   7        10.242.109.247  80/tcp,443/tcp  Kubernetes worker running.
  containerd/0*           active    idle            10.242.109.247                  Container runtime available
  flannel/0*              active    idle            10.242.109.247                  Flannel subnet 10.1.40.1/24
kubernetes-worker/1*      active    idle   8        10.242.109.252  80/tcp,443/tcp  Kubernetes worker running.
  containerd/3            active    idle            10.242.109.252                  Container runtime available
  flannel/3               active    idle            10.242.109.252                  Flannel subnet 10.1.72.1/24
kubernetes-worker/2       active    idle   9        10.242.109.162  80/tcp,443/tcp  Kubernetes worker running.
  containerd/4            active    idle            10.242.109.162                  Container runtime available
  flannel/4               active    idle            10.242.109.162                  Flannel subnet 10.1.56.1/24

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.242.109.240  juju-ce2bda-0  focal       Running
1        started  10.242.109.210  juju-ce2bda-1  focal       Running
2        started  10.242.109.33   juju-ce2bda-2  focal       Running
3        started  10.242.109.148  juju-ce2bda-3  focal       Running
4        started  10.242.109.251  juju-ce2bda-4  focal       Running
5        started  10.242.109.11   juju-ce2bda-5  focal       Running
6        started  10.242.109.191  juju-ce2bda-6  focal       Running
7        started  10.242.109.247  juju-ce2bda-7  focal       Running
8        started  10.242.109.252  juju-ce2bda-8  focal       Running
9        started  10.242.109.162  juju-ce2bda-9  focal       Running
```
AGE seems to reset without RESTARTS incrementing.
```
k get pods -A
NAMESPACE                         NAME                                                       READY   STATUS    RESTARTS   AGE
ingress-nginx-kubernetes-worker   default-http-backend-kubernetes-worker-6494cbc7fd-4zls7    0/1     Pending   0          18m
ingress-nginx-kubernetes-worker   nginx-ingress-controller-kubernetes-worker-bd5d6           0/1     Pending   0          6m9s
ingress-nginx-kubernetes-worker   nginx-ingress-controller-kubernetes-worker-rrjgn           0/1     Pending   0          5m59s
kube-system                       coredns-7bb4d77796-h5vv9                                   0/1     Pending   0          19m
kube-system                       kube-state-metrics-6f586bb967-2dnnq                        0/1     Pending   0          19m
kube-system                       metrics-server-v0.3.6-59bc9c775c-mh4j6                     0/2     Pending   0          19m
kubernetes-dashboard              dashboard-metrics-scraper-74757fb5b7-sscwd                 0/1     Pending   0          19m
kubernetes-dashboard              kubernetes-dashboard-64f87676d4-vtx7c                      0/1     Pending   0          19m
```
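For pods stuck in Pending, the scheduler normally records its reason under Events; a sketch using one of the pod names above:
```
# The Events section typically explains the scheduling failure,
# e.g. "0/5 nodes are available: ..." with a per-node reason.
kubectl -n kube-system describe pod coredns-7bb4d77796-h5vv9 | tail -n 20
```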

Revision history for this message
svanschalkwyk (step-o) wrote :

Also using this profile:
```
lxc profile show juju-k8s
config:
  boot.autostart: "true"
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
  raw.lxc: |
    lxc.apparmor.profile=unconfined
    lxc.mount.auto=proc:rw sys:rw
    lxc.cap.drop=
  security.nesting: "true"
  security.privileged: "true"
  user.user-data: |
    #cloud-config
    ssh_authorized_keys:
      - ssh-rsa AAA******ynQHxTBncb ubuntu@sydney
description: ""
devices:
  aadisable:
    path: /sys/module/nf_conntrack/parameters/hashsize
    source: /dev/null
    type: disk
  aadisable1:
    path: /sys/module/apparmor/parameters/enabled
    source: /dev/null
    type: disk
name: juju-k8s
used_by:
- /1.0/instances/juju-ce2bda-0
- /1.0/instances/juju-ce2bda-1
- /1.0/instances/juju-ce2bda-2
- /1.0/instances/juju-ce2bda-6
- /1.0/instances/juju-ce2bda-4
- /1.0/instances/juju-ce2bda-3
- /1.0/instances/juju-ce2bda-5
- /1.0/instances/juju-ce2bda-7
- /1.0/instances/juju-ce2bda-9
- /1.0/instances/juju-ce2bda-8
```
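This profile supplies what Kubernetes needs inside LXD: the netfilter/overlay kernel modules, nesting, and privileged mode. A minimal sketch (instance name taken from the list above) to confirm a container picked it up:
```
# Confirm the profile is attached to the container...
lxc config show juju-ce2bda-5 | grep -A 3 'profiles:'
# ...and that the listed kernel modules are visible inside it.
lxc exec juju-ce2bda-5 -- lsmod | grep -E 'ip_tables|nf_nat|overlay'
```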

Revision history for this message
George Kraft (cynerva) wrote :

Please share output of:

kubectl describe po --all-namespaces
kubectl describe nodes
juju ssh kubernetes-worker/0 journalctl -o cat -u snap.kubelet.daemon
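One way to capture these into files for attaching (file names are only suggestions):
```
kubectl describe po --all-namespaces > describe-po.txt
kubectl describe nodes > describe-nodes.txt
juju ssh kubernetes-worker/0 journalctl -o cat -u snap.kubelet.daemon > kubelet-daemon.txt
```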

Changed in charmed-kubernetes-bundles:
status: New → Incomplete
Revision history for this message
svanschalkwyk (step-o) wrote :

Added describe-nodes.txt, describe-po.txt, and kubelet-daemon.txt (partly, as it runs to many lines).

Revision history for this message
svanschalkwyk (step-o) wrote :

```
tail -300 /var/log/juju/unit-kubernetes-master-0.log
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "upgrade-check-flag" manifold worker started at 2020-10-20 01:26:01.295303721 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "migration-fortress" manifold worker started at 2020-10-20 01:26:01.306568767 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "migration-minion" manifold worker started at 2020-10-20 01:26:01.317095239 +0000 UTC
2020-10-20 01:26:01 INFO juju.worker.migrationminion worker.go:140 migration phase is now: NONE
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "meter-status" manifold worker started at 2020-10-20 01:26:01.329110796 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "api-address-updater" manifold worker started at 2020-10-20 01:26:01.329221345 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "proxy-config-updater" manifold worker started at 2020-10-20 01:26:01.329295099 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "metric-spool" manifold worker started at 2020-10-20 01:26:01.329421973 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.logger logger.go:64 initial log config: "<root>=DEBUG"
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "logging-config-updater" manifold worker started at 2020-10-20 01:26:01.329524516 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "charm-dir" manifold worker started at 2020-10-20 01:26:01.33004534 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "leadership-tracker" manifold worker started at 2020-10-20 01:26:01.330076168 +0000 UTC
2020-10-20 01:26:01 INFO juju.worker.logger logger.go:118 logger worker started
2020-10-20 01:26:01 DEBUG juju.worker.leadership tracker.go:125 kubernetes-master/0 making initial claim for kubernetes-master leadership
2020-10-20 01:26:01 DEBUG juju.worker.dependency engine.go:564 "hook-retry-strategy" manifold worker started at 2020-10-20 01:26:01.33159668 +0000 UTC
2020-10-20 01:26:01 DEBUG juju.worker.logger logger.go:92 reconfiguring logging from "<root>=DEBUG" to "<root>=WARNING"
2020-10-20 04:18:36 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: api connection broken unexpectedly
2020-10-20 17:23:05 INFO juju.cmd supercommand.go:91 running jujud [2.8.1 0 16439b3d1c528b7a0e019a16c2122ccfcf6aa41f gc go1.14.4]
2020-10-20 17:23:05 DEBUG juju.cmd supercommand.go:92 args: []string{"/var/lib/juju/tools/unit-kubernetes-master-0/jujud", "unit", "--data-dir", "/var/lib/juju", "--unit-name", "kubernetes-master/0", "--debug"}
2020-10-20 17:23:05 DEBUG juju.agent agent.go:583 read agent config, format "2.0"
2020-10-20 17:23:05 INFO juju.cmd.jujud agent.go:138 setting logging config to "<root>=WARNING"
2020-10-20 17:23:07 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [623655] "unit-kubernetes-master-0" cannot open api: unable to connect to API: dial tcp 10.242.109.209:17070: connect: connection refused
2020-10-20 17:23:12 ERROR juju.worker...
```
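For what it's worth, the refused connection targets 10.242.109.209:17070, which the lxc list above identifies as juju-128992-0, i.e. the Juju controller container. A quick reachability sketch (assumes nc is available in the unit):
```
# Can the master unit reach the controller's API port?
juju ssh kubernetes-master/0 -- nc -vz 10.242.109.209 17070
```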

Revision history for this message
George Kraft (cynerva) wrote :

From kubectl get nodes, it looks like node status is Unknown and the kubelets are crashing/restarting frequently.

> kubelet-daemon.txt (partly, as it runs to many lines)

The snippet you attached looks to be pretty early in the logs, before the charm has had a chance to configure Kubelet. I need to see output from later in the logs. Can you grab the last 100 lines or so at least? e.g.

juju ssh kubernetes-worker/0 journalctl -o cat -u snap.kubelet.daemon | tail -n 100
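Equivalently, journalctl can do the trimming itself:
```
juju ssh kubernetes-worker/0 journalctl -o cat -u snap.kubelet.daemon -n 100
```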

Revision history for this message
svanschalkwyk (step-o) wrote :

Output of juju ssh kubernetes-worker/0 journalctl -o cat -u snap.kubelet.daemon is attached.

Revision history for this message
svanschalkwyk (step-o) wrote :

Looks as if it is looking for an AWS Credential Provider?
This is on bare metal.
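If it helps, a sketch for pulling just those lines out of the kubelet journal (the grep patterns are guesses at the message text):
```
# Filter kubelet journal lines mentioning AWS or a cloud provider.
juju ssh kubernetes-worker/0 journalctl -o cat -u snap.kubelet.daemon \
  | grep -iE 'aws|cloud.provider' | tail -n 20
```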

Revision history for this message
George Kraft (cynerva) wrote :

It looks like the attachment on #6 is the same, incomplete kubelet-daemon.txt. I need to see more of the kubelet logs - can you try again?
