Add service checks for nodes/system pods
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Kubernetes Control Plane Charm | Fix Released | Wishlist | Mike Wilson |
Bug Description
There are currently no monitors that check whether Kubernetes workers/nodes are active in the cluster.
We recently found that a snap-based kubelet had not restarted properly and had dropped out of the Kubernetes cluster, causing several kube-system pods to become unavailable due to CA issues.
Example states found in kubectl get nodes and kubectl get pods --namespace kube-system:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-172-
ip-172-
ip-172-
ip-172-25-11-29.xyz Ready <none> 62d v1.12.5
$ kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
calico-
heapster-
heapster-
kube-dns-
kubernetes-
metrics-
monitoring-
tiller-
tiller-
In this case, the node 172.25.11.152 should raise an alert for being in a NotReady state. This is analogous to an OpenStack nova-compute service being disabled/down, and it should generate an alert in the same way.
This incident happened after a kubelet snap update from 1.12.4 to 1.12.5 failed to restart the kubelet service properly. The existing checks for the kubelet service still reported it as running OK, so node status needs to be checked at the API layer, not just the process layer.
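As a rough illustration of what an API-layer check could look like, here is a minimal sketch in the style of a Nagios plugin. It assumes kubectl is installed and configured on the monitoring host, and the function names (not_ready_nodes, check_nodes, fetch_nodes) are hypothetical and not part of any charm. It reads the Ready condition from the output of kubectl get nodes -o json rather than looking at the kubelet process.

```python
import json
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def not_ready_nodes(node_list):
    """Return names of nodes whose Ready condition is not "True".

    node_list is the parsed JSON from `kubectl get nodes -o json`.
    """
    bad = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition is treated the same as not ready.
        if ready is None or ready.get("status") != "True":
            bad.append(node["metadata"]["name"])
    return bad


def check_nodes(node_list):
    """Map a node list to an (exit_code, message) pair in Nagios style."""
    bad = not_ready_nodes(node_list)
    if bad:
        return CRITICAL, "CRITICAL: nodes not Ready: " + ", ".join(sorted(bad))
    return OK, "OK: all nodes Ready"


def fetch_nodes():
    """Ask the API server (via kubectl) for the current node list."""
    out = subprocess.check_output(["kubectl", "get", "nodes", "-o", "json"])
    return json.loads(out)
```

A plugin wrapper would call check_nodes(fetch_nodes()), print the message, and exit with the returned code. Because the result comes from the API server, a kubelet whose process is up but which has dropped out of the cluster would still be reported.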
We'd also like alerts for pods in a configurable set of namespaces whose status is anything other than "Running", such as the tiller/heapster pods shown in the Unknown status above.
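The pod side of this request could be sketched along the following lines. This is only an assumption about how such a check might be structured: the function name unhealthy_pods and its parameters are hypothetical, and the input is taken to be the parsed JSON from kubectl get pods --all-namespaces -o json. Note that the STATUS column printed by kubectl is derived from several fields and does not always equal status.phase, so a real check may need to inspect more than the phase.

```python
def unhealthy_pods(pod_list, namespaces, ok_phases=("Running",)):
    """Return (namespace, name, phase) for pods outside the accepted phases.

    pod_list   -- parsed JSON from `kubectl get pods --all-namespaces -o json`
    namespaces -- the configurable set of namespaces to monitor
    ok_phases  -- phases that should not alert (hypothetical default)
    """
    bad = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        if meta.get("namespace") not in namespaces:
            continue  # namespace is not monitored
        # A pod that reports no phase at all is treated as Unknown.
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ok_phases:
            bad.append((meta.get("namespace"), meta.get("name"), phase))
    return bad
```

For example, unhealthy_pods(pods, {"kube-system"}) would list every kube-system pod that is not in the Running phase. Passing ok_phases=("Running", "Succeeded") would keep completed jobs from alerting.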
In my opinion, kube-system pods should be monitored as part of the undercloud, since this is where services installed by the CDK charms (dashboard, tiller, heapster, etc.) run.
Changed in charm-kubernetes-master:
status: New → Triaged
importance: Undecided → Wishlist
assignee: nobody → Mike Wilson (knobby)
tags: added: monitoring
Changed in charm-kubernetes-master:
status: Triaged → In Progress
milestone: none → 1.17
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
Where would you like to see the pod monitoring? Are you using Prometheus to monitor the cluster, or just Nagios? If just Nagios, how would you suggest handling something in Nagios that is not tied to a single host?