Prometheus2 charm

Bug #1899706
Comment #2

Drew Freiberger (afreiberger) wrote on 2020-10-14:

When I looked at the code from a charm-prometheus2 support perspective, my take was that the kubernetes-master charm sets up the prometheus manual jobs on leader-elected [1], but the demoted leader does not stop advertising its jobs.

charm-prometheus2 is architected to monitor each unit of a related application individually. That is why each request is tagged with a request UUID: it ensures that every unit announcing itself to be monitored is monitored separately. Imagine 100 units of telegraf, each with an additional manual-job metric for a per-unit counter.

In the case of the kubernetes-master charm providing details for monitoring a single pod (per manual-job) behind the 3 HA k8s-master units, only one monitoring endpoint is needed for kubernetes-master (announced to be monitored through the kubeapi-load-balancer, IIRC). But because multiple units advertise themselves to prometheus to be monitored, the data is duplicated.
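To make the duplication concrete, here is a toy model in Python. It is not the prometheus2 charm's code: build_scrape_jobs, the job dict layout and the lb.example address are all made up for illustration, and the UUID is generated inside the function purely to keep the sketch short. It only shows that keying jobs per unit and per request yields three scrape jobs for one shared endpoint.

```python
import uuid


def build_scrape_jobs(advertised):
    """Return one scrape job per (unit, request UUID) pair.

    `advertised` maps a unit name to the list of manual jobs that unit
    publishes on the relation. Hypothetical structure, for illustration only.
    """
    jobs = {}
    for unit, unit_jobs in advertised.items():
        for job in unit_jobs:
            request_id = str(uuid.uuid4())  # each request is tracked separately
            jobs[(unit, request_id)] = job
    return jobs


# Three HA kubernetes-master units all advertise the same load-balanced target.
shared_job = {"job_name": "k8s-api", "targets": ["lb.example:443"]}
advertised = {f"kubernetes-master/{i}": [dict(shared_job)] for i in range(3)}

jobs = build_scrape_jobs(advertised)
distinct_targets = {tuple(job["targets"]) for job in jobs.values()}
print(len(jobs), "scrape jobs for", len(distinct_targets), "distinct target")
```

For telegraf this per-unit behaviour is exactly what you want; for a single endpoint behind a load balancer it just scrapes the same data three times.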

I believe the solution is to add a function, decorated with @when_not('leadership.is_leader'), that removes the prometheus manual scrape job details from the relation. Basically, the opposite of this function, run when leadership is lost:

[1]: https://github.com/charmed-kubernetes/charm-kubernetes-master/blob/master/reactive/kubernetes_master.py#L3128
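A rough sketch of the proposed behaviour, again as a self-contained toy model rather than real charm code. ManualJobsRelation, register_job, clear_jobs and on_leadership_changed are hypothetical names; in the actual charm the cleanup branch would live in a handler guarded by @when_not('leadership.is_leader'), and how the relation data gets cleared depends on the interface layer, which I have not checked.

```python
import uuid


class ManualJobsRelation:
    """Toy stand-in for the manual-job data units publish to prometheus2."""

    def __init__(self):
        self.published = {}  # unit name -> {request UUID: job dict}

    def register_job(self, unit, job):
        request_id = str(uuid.uuid4())
        self.published.setdefault(unit, {})[request_id] = job
        return request_id

    def clear_jobs(self, unit):
        # Withdraw everything this unit advertised; no-op if nothing was published.
        self.published.pop(unit, None)

    def scrape_jobs(self):
        return [
            (unit, request_id, job)
            for unit, unit_jobs in self.published.items()
            for request_id, job in unit_jobs.items()
        ]


def on_leadership_changed(relation, unit, is_leader, job):
    """Advertise the job only while leader; withdraw it once leadership is lost."""
    if is_leader:
        if unit not in relation.published:
            relation.register_job(unit, job)
    else:
        relation.clear_jobs(unit)


relation = ManualJobsRelation()
job = {"job_name": "k8s-api", "targets": ["lb.example:443"]}

# Unit 0 is elected, then leadership moves to unit 1.
on_leadership_changed(relation, "kubernetes-master/0", True, job)
on_leadership_changed(relation, "kubernetes-master/1", True, job)
jobs_before_cleanup = len(relation.scrape_jobs())  # old leader still advertising

# The missing step: the demoted leader withdraws its jobs.
on_leadership_changed(relation, "kubernetes-master/0", False, job)
jobs_after_cleanup = len(relation.scrape_jobs())
```

With the cleanup step in place only the current leader's job remains on the relation, so prometheus scrapes the endpoint once.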