Comment 1 for bug 1884469

Yang Liu (yliu12) wrote :

# The system looked healthy when this happened.
The system was alarm-free, all nodes were unlocked-enabled according to system host-list, and all nodes were Ready according to kubectl get nodes -n deployment.

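# The second rbd-provisioner pod stayed Pending because it could not be scheduled on controller-1, which was tainted with node-role.kubernetes.io/master:NoSchedule for an unknown reason. App apply worked after the taint was removed.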

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get pods --all-namespaces -o wide | grep rbd
kube-system rbd-provisioner-77bfb6dbb-k98vh 0/1 Pending 0 26m <none> <none> <none> <none>
kube-system rbd-provisioner-77bfb6dbb-vm242 1/1 Running 1 26m dead:beef::8e22:765f:6121:eb5b controller-0 <none> <none>

Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedScheduling 36s (x21 over 26m) default-scheduler 0/4 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match node selector.
[sysadmin@controller-0 ~(keystone_admin)]$

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl describe node controller-1 | grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
[sysadmin@controller-0 ~(keystone_admin)]$ kubectl describe node controller-0 | grep Taints
Taints: <none>
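
The same check can be scripted rather than read off by eye. The following is a minimal sketch, not part of the original report: the helper name has_master_noschedule_taint is hypothetical, and it assumes the taint appears in kubectl describe node output exactly as shown above (key and effect, no value).

```shell
# Hypothetical helper: reads `kubectl describe node <name>` output on stdin and
# succeeds (exit 0) if the node carries the master NoSchedule taint.
# Assumes the taint is printed as "node-role.kubernetes.io/master:NoSchedule",
# either on the "Taints:" line or on an indented continuation line.
has_master_noschedule_taint() {
    grep -Eq '(^|[[:space:]])node-role\.kubernetes\.io/master:NoSchedule[[:space:]]*$'
}
```

For example, kubectl describe node controller-1 | has_master_noschedule_taint && echo "controller-1 is tainted" would flag the controller seen in this report.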

# Workaround:
kubectl taint node controller-1 node-role.kubernetes.io/master:NoSchedule-
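
If the workaround has to be repeated on other nodes, it may be safer to print the command first and review it before running it. This is only a sketch under that assumption: the function name untaint_master_cmd is hypothetical, and it emits the same kubectl taint command used above without executing it.

```shell
# Hypothetical helper: print (do not run) the command that removes the
# master NoSchedule taint from each node name given as an argument.
untaint_master_cmd() {
    for node in "$@"; do
        printf 'kubectl taint node %s node-role.kubernetes.io/master:NoSchedule-\n' "$node"
    done
}
```

Running untaint_master_cmd controller-1 prints the workaround line above; the output can be piped to sh once it has been checked.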

The following app apply attempts had already failed before the workaround was applied:
[sysadmin@controller-0 ~(keystone_admin)]$ ls -lrt /var/log/armada/platform-integ-apps-apply_2020-06-2*
-rw-r--r-- 1 1000 users 21784 Jun 20 16:00 /var/log/armada/platform-integ-apps-apply_2020-06-20-15-30-07.log
-rw-r--r-- 1 1000 users 21784 Jun 20 16:30 /var/log/armada/platform-integ-apps-apply_2020-06-20-16-00-39.log
-rw-r--r-- 1 1000 users 21784 Jun 20 17:01 /var/log/armada/platform-integ-apps-apply_2020-06-20-16-31-11.log
-rw-r--r-- 1 1000 users 21784 Jun 20 17:31 /var/log/armada/platform-integ-apps-apply_2020-06-20-17-01-43.log
-rw-r--r-- 1 1000 users 21784 Jun 20 18:02 /var/log/armada/platform-integ-apps-apply_2020-06-20-17-32-15.log
-rw-r--r-- 1 1000 users 21987 Jun 21 20:17 /var/log/armada/platform-integ-apps-apply_2020-06-21-19-47-22.log
-rw-r--r-- 1 1000 users 22122 Jun 21 20:47 /var/log/armada/platform-integ-apps-apply_2020-06-21-20-17-54.log
-rw-r--r-- 1 1000 users 22122 Jun 21 21:18 /var/log/armada/platform-integ-apps-apply_2020-06-21-20-48-26.log
-rw-r--r-- 1 1000 users 9154 Jun 21 21:19 /var/log/armada/platform-integ-apps-apply_2020-06-21-21-18-58.log

[sysadmin@controller-0 ~(keystone_admin)]$ ls -lrt /var/log/armada/oidc-auth-apps-apply_2020-*
-rw-r--r-- 1 1000 users 19243 Jun 20 21:22 /var/log/armada/oidc-auth-apps-apply_2020-06-20-20-52-52.log