Comment 0 for bug 1887438

Andrew Vaillancourt (availlancourt) wrote: Controller-0 Not Ready after force rebooting active controller (Controller-1)

Brief Description
-----------------
After force rebooting controller-1, controller-0 did not reach 'Ready' status.

Severity
--------
Major

Steps to Reproduce
------------------
Force reboot the active controller (controller-1).

Expected Behavior
------------------
After the active controller is force rebooted, the standby controller takes over and reaches 'Ready' status, and the system pods, applications, and any test pods are up and running.

Actual Behavior
----------------
After force rebooting controller-1, controller-0 did not reach 'Ready' status.

The following pods never reached a healthy status:
NAMESPACE     NAME                                               READY  STATUS       RESTARTS  AGE
cert-manager  cm-cert-manager-856678cfb7-mmbzn                   0/1    Pending      0         4h23m
cert-manager  cm-cert-manager-cainjector-85849bd97-7trcg         0/1    Pending      0         4h22m
cert-manager  cm-cert-manager-webhook-5745478cbc-8k2m7           0/1    Pending      0         4h22m
kube-system   coredns-78d9fd7cb9-7bdw9                           0/1    Pending      0         4h22m
kube-system   ic-nginx-ingress-default-backend-5ffcfd7744-zr4wj  0/1    Terminating  0         4h23m
kube-system   rbd-provisioner-77bfb6dbb-7pglp                    0/1    Pending      0         4h22m
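
For triage, a listing like the one above can be filtered automatically. The following is a minimal Python sketch, not part of the test automation: the function name unhealthy_pods and the six-column layout (namespace, name, ready, status, restarts, age) are assumptions based on the listing shown here. It flags any pod whose status is not Running or whose ready count is below its total, so it would also flag Completed pods.

```python
# Pod listing copied from this report (namespace, name, ready, status, restarts, age).
POD_LISTING = """\
cert-manager cm-cert-manager-856678cfb7-mmbzn 0/1 Pending 0 4h23m
cert-manager cm-cert-manager-cainjector-85849bd97-7trcg 0/1 Pending 0 4h22m
cert-manager cm-cert-manager-webhook-5745478cbc-8k2m7 0/1 Pending 0 4h22m
kube-system coredns-78d9fd7cb9-7bdw9 0/1 Pending 0 4h22m
kube-system ic-nginx-ingress-default-backend-5ffcfd7744-zr4wj 0/1 Terminating 0 4h23m
kube-system rbd-provisioner-77bfb6dbb-7pglp 0/1 Pending 0 4h22m
"""


def unhealthy_pods(listing):
    """Return (namespace, name, status) for pods that are not fully ready."""
    flagged = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Skip header rows or anything that is not a six-column pod row.
        if len(fields) != 6 or "/" not in fields[2]:
            continue
        namespace, name, ready, status = fields[:4]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            flagged.append((namespace, name, status))
    return flagged


if __name__ == "__main__":
    for namespace, name, status in unhealthy_pods(POD_LISTING):
        print(f"{namespace}/{name}: {status}")
```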

Reproducibility
---------------
Unknown

System Configuration
--------------------
Standard System
2 Controllers and 3 Computes
LAB: WCP_71_75

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2020-07-13_00-00-00"
BUILD_DATE="2020-07-13 00:05:40 -0400"

Last Pass
---------
N/A

Timestamp/Logs
--------------
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedScheduling <unknown> default-scheduler 0/5 nodes are available: 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate, 1 node(s) had taint {services: disabled}, that the pod didn't tolerate, 3 node(s) didn't match node selector.
  Warning FailedScheduling <unknown> default-scheduler 0/5 nodes are available: 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate, 1 node(s) had taint {services: disabled}, that the pod didn't tolerate, 3 node(s) didn't match node selector.
  Warning FailedScheduling <unknown> default-scheduler 0/5 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate, 3 node(s) didn't match node selector.
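
The scheduler messages above pack several per-node reasons into one line. As a reading aid, here is a small Python sketch that splits such a message into (count, reason) pairs. The function name parse_failed_scheduling and both regular expressions are illustrative assumptions that match only the message shape seen in this report; they are not a Kubernetes API.

```python
import re

# First FailedScheduling message from this report, copied verbatim.
EVENT = (
    "0/5 nodes are available: "
    "1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate, "
    "1 node(s) had taint {services: disabled}, that the pod didn't tolerate, "
    "3 node(s) didn't match node selector."
)

HEADER_RE = re.compile(r"(\d+)/(\d+) nodes are available")
# One "<count> node(s) <reason>" clause; it ends at the next
# ", <count> node(s) " or at the final period of the message.
CLAUSE_RE = re.compile(r"(\d+) node\(s\) (.+?)(?=, \d+ node\(s\) |\.$)")


def parse_failed_scheduling(message):
    """Return (available, total, [(count, reason), ...]) for one message."""
    message = message.strip()
    header = HEADER_RE.match(message)
    if header is None:
        raise ValueError("not a FailedScheduling availability message")
    rest = message[header.end():]
    clauses = [(int(count), reason) for count, reason in CLAUSE_RE.findall(rest)]
    return int(header.group(1)), int(header.group(2)), clauses


if __name__ == "__main__":
    available, total, clauses = parse_failed_scheduling(EVENT)
    print(f"{available}/{total} nodes available")
    for count, reason in clauses:
        print(f"  {count} node(s): {reason}")
```

For the first two events the per-reason counts add up to the 5 nodes in the system (1 unreachable, 1 with services disabled, 3 excluded by node selector). For the third event they add up to 6, which suggests that one node is counted under more than one reason.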

Test Activity
-------------
System Test Automation Development

Workaround
----------
N/A