Installation crash -> Waiting for 3 kube-system pods to start

Bug #1918160 reported by Cezary Wagner
This bug affects 1 person
Affects                     Status  Importance  Assigned to  Milestone
Charmed Kubernetes Bundles          Undecided   Unassigned

Bug Description

https://askubuntu.com/questions/1322084/installing-ubuntu-kubernates-1-20-on-localhost-documentation-lead-to-error

I created a fresh Ubuntu 20.04 VM on VirtualBox, then started the installation following the documentation. The machine has a Quadro RTX card (maybe that has an impact?).

The documentation is messy because it describes many ways to install. I tried almost all of them and the results are very bad: none of them ever led to success. Can you help with this installation, or does Ubuntu Kubernetes simply not work, so that this is a waste of time?

Are any extra logs needed?

Documentation pages that led to these errors:

    https://ubuntu.com/kubernetes/install#multi-node
    https://ubuntu.com/kubernetes/docs/install-manual
    https://ubuntu.com/kubernetes/docs/install-local

First of all, I tried:

sudo snap install lxd

sudo lxd init
# all defaults except:
#   storage backend: dir
#   IPv6 address: none

sudo snap install juju --classic
juju bootstrap localhost

juju add-model k8s
juju deploy charmed-kubernetes

The result is 10 machines and masters that do not work :)

kubernetes-master/0*      waiting   idle   5        10.184.167.240  6443/tcp        Waiting for 3 kube-system pods to start
  containerd/2            active    idle            10.184.167.240                  Container runtime available
  flannel/2               active    idle            10.184.167.240                  Flannel subnet 10.1.1.1/24
kubernetes-master/1       waiting   idle   6        10.184.167.89   6443/tcp        Waiting for 3 kube-system pods to start
  containerd/3            active    idle            10.184.167.89                   Container runtime available
  flannel/3               active    idle            10.184.167.89                   Flannel subnet 10.1.43.1/24

Full status is:

juju status --color
Model           Controller           Cloud/Region         Version  SLA          Timestamp
k8s-production  localhost-localhost  localhost/localhost  2.8.9    unsupported  17:00:24+01:00

App                    Version  Status   Scale  Charm                  Store       Rev  OS      Notes
containerd             1.3.3    active       5  containerd             jujucharms  102  ubuntu
easyrsa                3.0.1    active       1  easyrsa                jujucharms  345  ubuntu
etcd                   3.4.5    active       3  etcd                   jujucharms  553  ubuntu
flannel                0.11.0   active       5  flannel                jujucharms  518  ubuntu
kubeapi-load-balancer  1.18.0   active       1  kubeapi-load-balancer  jujucharms  757  ubuntu  exposed
kubernetes-master      1.20.4   waiting      2  kubernetes-master      jujucharms  955  ubuntu
kubernetes-worker      1.20.4   active       3  kubernetes-worker      jujucharms  726  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   0        10.184.167.48                   Certificate Authority connected.
etcd/0*                   active    idle   1        10.184.167.23   2379/tcp        Healthy with 3 known peers
etcd/1                    active    idle   2        10.184.167.180  2379/tcp        Healthy with 3 known peers
etcd/2                    active    idle   3        10.184.167.194  2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle   4        10.184.167.106  443/tcp         Loadbalancer ready.
kubernetes-master/0*      waiting   idle   5        10.184.167.240  6443/tcp        Waiting for 3 kube-system pods to start
  containerd/2            active    idle            10.184.167.240                  Container runtime available
  flannel/2               active    idle            10.184.167.240                  Flannel subnet 10.1.1.1/24
kubernetes-master/1       waiting   idle   6        10.184.167.89   6443/tcp        Waiting for 3 kube-system pods to start
  containerd/3            active    idle            10.184.167.89                   Container runtime available
  flannel/3               active    idle            10.184.167.89                   Flannel subnet 10.1.43.1/24
kubernetes-worker/0*      active    idle   7        10.184.167.52   80/tcp,443/tcp  Kubernetes worker running.
  containerd/0*           active    idle            10.184.167.52                   Container runtime available
  flannel/0*              active    idle            10.184.167.52                   Flannel subnet 10.1.20.1/24
kubernetes-worker/1       active    idle   8        10.184.167.226  80/tcp,443/tcp  Kubernetes worker running.
  containerd/4            active    idle            10.184.167.226                  Container runtime available
  flannel/4               active    idle            10.184.167.226                  Flannel subnet 10.1.6.1/24
kubernetes-worker/2       active    idle   9        10.184.167.158  80/tcp,443/tcp  Kubernetes worker running.
  containerd/1            active    idle            10.184.167.158                  Container runtime available
  flannel/1               active    idle            10.184.167.158                  Flannel subnet 10.1.12.1/24

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.184.167.48   juju-c4f295-0  focal       Running
1        started  10.184.167.23   juju-c4f295-1  focal       Running
2        started  10.184.167.180  juju-c4f295-2  focal       Running
3        started  10.184.167.194  juju-c4f295-3  focal       Running
4        started  10.184.167.106  juju-c4f295-4  focal       Running
5        started  10.184.167.240  juju-c4f295-5  focal       Running
6        started  10.184.167.89   juju-c4f295-6  focal       Running
7        started  10.184.167.52   juju-c4f295-7  focal       Running
8        started  10.184.167.226  juju-c4f295-8  focal       Running
9        started  10.184.167.158  juju-c4f295-9  focal       Running
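
In a status listing this long, the units that are not yet active are easy to miss. A minimal sketch of a filter for the plain-text output follows. The function name non_active_units is made up for illustration, and it assumes the default tabular format without --color, since colour escape codes would confuse the field comparison.

```shell
# Hypothetical helper: print "<unit> <workload state>" for every unit
# in `juju status` text output whose workload state is not "active".
non_active_units() {
  awk '
    $1 == "Unit" { in_units = 1; next }   # header row opens the unit table
    NF == 0      { in_units = 0 }         # a blank line closes it
    in_units && $2 != "active" { print $1, $2 }
  '
}
```

It could be used as: juju status | non_active_units. Subordinate units (the indented containerd and flannel rows) are checked as well, because awk ignores leading whitespace when splitting fields.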

The other command sequences from the documentation give the same result: none of them leads anywhere. Maybe I am doing something wrong, or maybe Ubuntu Kubernetes just does not work on localhost.

juju add-model k8s-production
juju deploy cs:bundle/charmed-kubernetes-596

Even this small deployment does not work (the installation never finishes).

juju add-model k8s-development
juju deploy cs:bundle/kubernetes-core-1200

Revision history for this message
George Kraft (cynerva) wrote :

I think you might be hitting https://bugs.launchpad.net/charm-kubernetes-worker/+bug/1903566

Can you please check the kubelet logs:

juju ssh kubernetes-worker/0 -- journalctl -o cat -u snap.kubelet.daemon

and look for this error?

Nov 06 10:28:17 juju-431bee-1-lxd-2 kubelet.daemon[253561]: F1106 16:28:17.063216 253561 kubelet.go:1296] Failed to start ContainerManager [invalid kernel flag: kernel/panic, expected value: 10, actual value: 0, invalid kernel flag: kernel/panic_on_oops, expected value: 1, actual value: 0, invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0]
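
The error packs three kernel flag mismatches into one long line. As a reading aid, here is a small sketch that splits such a line into one row per flag. The function name parse_kernel_flags is hypothetical, and the pattern assumes the message format is exactly as in the line quoted above.

```shell
# Hypothetical helper: read kubelet log lines on stdin and print
# "<flag> expected=<n> actual=<n>" for each kernel flag mismatch.
parse_kernel_flags() {
  grep -o 'invalid kernel flag: [^,]*, expected value: [0-9]*, actual value: [0-9]*' |
    sed -E 's/invalid kernel flag: ([^,]*), expected value: ([0-9]*), actual value: ([0-9]*)/\1 expected=\2 actual=\3/'
}
```

For example, the journalctl command above could be piped through it: juju ssh kubernetes-worker/0 -- journalctl -o cat -u snap.kubelet.daemon | parse_kernel_flags | sort -u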

If you see that error, you can work around it with:

juju config kubernetes-worker kubelet-extra-config='{protectKernelDefaults: false}'
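
To confirm that the option was stored and to see the kernel values the kubelet was complaining about, something like the following sketch might help. The function name show_worker_kubelet_settings is made up for illustration; the three sysctl keys are the ones named in the error message, written in dotted form.

```shell
# Hypothetical helper: show the stored charm option and the three
# kernel flags from the error as seen inside one worker unit.
show_worker_kubelet_settings() {
  unit=${1:-kubernetes-worker/0}
  # Value of the charm option as currently stored by Juju.
  juju config kubernetes-worker kubelet-extra-config
  # Current kernel values inside the unit (read-only query).
  juju ssh "$unit" -- sysctl kernel.panic kernel.panic_on_oops vm.overcommit_memory
}
```

Usage: show_worker_kubelet_settings kubernetes-worker/0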

Changed in charmed-kubernetes-bundles:
status: New → Incomplete
Cezary Wagner (cezary-wagner) wrote :

The pattern is found, but the solution does not work.

cezary@ubuntu64:~$ juju ssh kubernetes-worker/0 -- journalctl --no-pager -o cat -u snap.kubelet.daemon | grep "Failed to start ContainerManager" | wc
Connection to 10.251.0.85 closed.
    319 12122 103675

After running "juju config kubernetes-worker kubelet-extra-config='{protectKernelDefaults: false}'", the "Failed to start ContainerManager" errors stopped, but the deployment still does not work and shows a strange message.

Ubuntu was restarted before the test.

0309 17:32:29.088367 12174 kubelet.go:1368] Failed to start ContainerManager [invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0, invalid kernel flag: kernel/panic, expected value: 10, actual value: 0, invalid kernel flag: kernel/panic_on_oops, expected value: 1, actual value: 0]
Connection to 10.251.0.85 closed.
cezary@ubuntu64:~$ date -u
wto, 9 mar 2021, 17:40:13 UTC
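
A count over the whole journal also includes entries written before the workaround was applied, so it cannot show whether the error is still occurring. A sketch that counts only recent matches is given below. The function name recent_container_manager_errors is hypothetical; it relies on journalctl's --since option with a relative time and writes it without spaces so that it survives being passed through juju ssh.

```shell
# Hypothetical helper: count "Failed to start ContainerManager" lines
# logged by the kubelet on one unit since a relative point in time.
recent_container_manager_errors() {
  unit=${1:-kubernetes-worker/0}
  since=${2:--10min}   # relative time understood by journalctl
  juju ssh "$unit" -- journalctl --no-pager -o cat -u snap.kubelet.daemon "--since=$since" |
    grep -c 'Failed to start ContainerManager'
}
```

Usage: recent_container_manager_errors kubernetes-worker/0 -10min. A printed count of 0 (grep then exits non-zero) would suggest the error has stopped.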

The strange message is "Waiting for 2 kube-system pods to start", shown on only one master.

Full output:
cezary@ubuntu64:~$ juju status
Model       Controller           Cloud/Region         Version  SLA          Timestamp
kubernetes  localhost-localhost  localhost/localhost  2.8.9    unsupported  18:44:17+01:00

App                    Version  Status   Scale  Charm                  Store       Rev  OS      Notes
containerd             1.3.3    active       5  containerd             jujucharms  102  ubuntu
easyrsa                3.0.1    active       1  easyrsa                jujucharms  345  ubuntu
etcd                   3.4.5    active       3  etcd                   jujucharms  553  ubuntu
flannel                0.11.0   active       5  flannel                jujucharms  518  ubuntu
kubeapi-load-balancer  1.18.0   active       1  kubeapi-load-balancer  jujucharms  757  ubuntu  exposed
kubernetes-master      1.20.4   waiting      2  kubernetes-master      jujucharms  955  ubuntu
kubernetes-worker      1.20.4   active       3  kubernetes-worker      jujucharms  726  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   0        10.251.0.162                    Certificate Authority connected.
etcd/0*                   active    idle   1        10.251.0.160    2379/tcp        Healthy with 3 known peers
etcd/1                    active    idle   2        10.251.0.124    2379/tcp        Healthy with 3 known peers
etcd/2                    active    idle   3        10.251.0.203    2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle   4        10.251.0.126    443/tcp         Loadbalancer ready.
kubernetes-master/0*      waiting   idle   5        10.251.0.74     6443/tcp        Waiting for 2 kube-system pods to start
  containerd/2            active    idle            10.251.0.74                     Container runtime available
  flannel/2               active    idle            10.251.0.74                     Flannel subnet 10.1.77.1/24
kubernetes-master/1       active    idle   6        10.251.0.86     6443/tcp        Kubernetes master running.
  containerd/3            active    idle            10.251.0.86                     Container runtime available
  f...


Cezary Wagner (cezary-wagner) wrote :

There is still a strange message on one master only, now for 2 pods instead of 3: "Waiting for 2 kube-system pods to start". Why?

Cezary Wagner (cezary-wagner) wrote :

Magic: after some time all masters work. I have not tested everything yet, but it is progress :)

cezary@ubuntu64:~$ juju status
Model       Controller           Cloud/Region         Version  SLA          Timestamp
kubernetes  localhost-localhost  localhost/localhost  2.8.9    unsupported  18:48:31+01:00

App                    Version  Status   Scale  Charm                  Store       Rev  OS      Notes
containerd             1.3.3    active       5  containerd             jujucharms  102  ubuntu
easyrsa                3.0.1    active       1  easyrsa                jujucharms  345  ubuntu
etcd                   3.4.5    active       3  etcd                   jujucharms  553  ubuntu
flannel                0.11.0   active       5  flannel                jujucharms  518  ubuntu
kubeapi-load-balancer  1.18.0   active       1  kubeapi-load-balancer  jujucharms  757  ubuntu  exposed
kubernetes-master      1.20.4   active       2  kubernetes-master      jujucharms  955  ubuntu
kubernetes-worker      1.20.4   active       3  kubernetes-worker      jujucharms  726  ubuntu  exposed

Unit                      Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                active    idle   0        10.251.0.162                    Certificate Authority connected.
etcd/0*                   active    idle   1        10.251.0.160    2379/tcp        Healthy with 3 known peers
etcd/1                    active    idle   2        10.251.0.124    2379/tcp        Healthy with 3 known peers
etcd/2                    active    idle   3        10.251.0.203    2379/tcp        Healthy with 3 known peers
kubeapi-load-balancer/0*  active    idle   4        10.251.0.126    443/tcp         Loadbalancer ready.
kubernetes-master/0*      active    idle   5        10.251.0.74     6443/tcp        Kubernetes master running.
  containerd/2            active    idle            10.251.0.74                     Container runtime available
  flannel/2               active    idle            10.251.0.74                     Flannel subnet 10.1.77.1/24
kubernetes-master/1       active    idle   6        10.251.0.86     6443/tcp        Kubernetes master running.
  containerd/3            active    idle            10.251.0.86                     Container runtime available
  flannel/3               active    idle            10.251.0.86                     Flannel subnet 10.1.88.1/24
kubernetes-worker/0       active    idle   7        10.251.0.85     80/tcp,443/tcp  Kubernetes worker running.
  containerd/0*           active    idle            10.251.0.85                     Container runtime available
  flannel/0*              active    idle            10.251.0.85                     Flannel subnet 10.1.64.1/24
kubernetes-worker/1*      active    idle   8        10.251.0.19     80/tcp,443/tcp  Kubernetes worker running.
  containerd/4            active    idle            10.251.0.19                     Container runtime available
  flannel/4               active    idle            10.251.0.19                     Flannel subnet 10.1.52.1/24
kubernetes-worker/2       active    idle   9        10.251.0.16     80/tcp,443/tcp  Kubernetes worker running.
  containerd/1 ...
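
Because the masters here became active on their own after a few minutes, polling the status may be more informative than a single snapshot. A rough sketch follows. The function name wait_until_settled, the default timeout and interval, and the list of workload states treated as "not settled" are all assumptions made for illustration, and the match on plain-text status output is deliberately crude.

```shell
# Hypothetical helper: poll `juju status` until no line shows a
# waiting, maintenance, blocked or error state, or the timeout expires.
wait_until_settled() {
  timeout_s=${1:-1800}   # give up after this many seconds
  interval_s=${2:-30}    # pause between polls
  waited=0
  while juju status | grep -Eq ' (waiting|maintenance|blocked|error) '; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "still not settled after ${waited}s" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
}
```

Usage: wait_until_settled 1800 30 && juju status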


George Kraft (cynerva) wrote :

Marking this as a duplicate of bug 1903566.
