prometheus_client relation stopped working in revision 65

Bug #2011618 reported by Guillermo Gonzalez
Affects         Status   Importance  Assigned to  Milestone
Telegraf Charm  Triaged  High        Unassigned

Bug Description

After upgrading telegraf from revision 62 to 65 (stable channel), the prometheus_client relation stopped working: new units related to telegraf were no longer added to the prometheus config.

Steps to reproduce:

$ juju deploy prometheus2
Located charm "prometheus2" in charm-hub, revision 48
Deploying "prometheus2" from charm-hub charm "prometheus2", revision 48 in channel stable on focal

$ juju deploy telegraf
Located charm "telegraf" in charm-hub, revision 65
Deploying "telegraf" from charm-hub charm "telegraf", revision 65 in channel stable on focal

$ juju add-relation telegraf:prometheus-client prometheus2:target

$ juju deploy ubuntu
Located charm "ubuntu" in charm-hub, revision 21
Deploying "ubuntu" from charm-hub charm "ubuntu", revision 21 in channel stable on focal

$ juju add-relation telegraf ubuntu

Wait until everything is deployed, then check the status:

$ juju status prometheus2 ubuntu telegraf
Model    Controller  Cloud/Region         Version  SLA          Timestamp
default  local-lxd   localhost/localhost  2.9.35   unsupported  15:19:59-03:00

App          Version  Status  Scale  Charm        Channel  Rev  Exposed  Message
prometheus2           active      1  prometheus2  stable    48  no       Ready
telegraf              active      1  telegraf     stable    65  no       Monitoring ubuntu/0 (source version/commit 23.01)
ubuntu       20.04    active      1  ubuntu       stable    21  no

Unit            Workload  Agent  Machine  Public address  Ports               Message
prometheus2/0*  active    idle   14       10.149.216.236  9090/tcp,12321/tcp  Ready
ubuntu/0*       active    idle   15       10.149.216.49
  telegraf/0*   active    idle            10.149.216.49   9103/tcp            Monitoring ubuntu/0 (source version/commit 23.01)

Machine  State    Address         Inst id         Series  AZ  Message
14       started  10.149.216.236  juju-92a939-14  focal       Running
15       started  10.149.216.49   juju-92a939-15  focal       Running

$ juju ssh prometheus2/0
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 6.2.0-76060200-generic x86_64)
...
ubuntu@juju-92a939-14:~$ sudo cat /var/snap/prometheus/current/prometheus.yml
# my global config
global:
  scrape_interval: 15s # default scrape_interval
  evaluation_interval: 15s # default evaluation_interval
  scrape_timeout: 15s # default scrape_timeout, must be <= scrape_interval

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
      monitor: prometheus2-monitor

rule_files:
    - /var/snap/prometheus/current/generic.rules

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    metrics_path: '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['10.149.216.236:9090']

# services related to the 'scrape' endpoint

# static-targets

# related services (eg collectd)

# related manual jobs

# config manual jobs

# federate jobs
ubuntu@juju-92a939-14:~$

Result: a bare prometheus config file that contains no targets other than prometheus itself.

Expected: an entry for the telegraf unit that was just related to prometheus.
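
The check above can be scripted rather than read by eye. The following is a minimal sketch, assuming the config keeps the one-line "job_name: '<name>'" layout shown in this report; has_scrape_job is a hypothetical helper name, not something provided by the charm or by prometheus.

```shell
#!/bin/sh
# has_scrape_job FILE JOB
# Hypothetical helper: succeeds (exit 0) if FILE contains a line of the form
#   - job_name: 'JOB'
# Assumes JOB is a plain name without regex metacharacters.
has_scrape_job() {
    # Optional "- " list marker, then job_name:, then the optionally quoted name.
    grep -Eq "^[[:space:]]*(-[[:space:]]+)?job_name:[[:space:]]*['\"]?$2['\"]?[[:space:]]*\$" "$1"
}

# Example (on the prometheus2 unit, as root, since the file is read with sudo above):
#   has_scrape_job /var/snap/prometheus/current/prometheus.yml telegraf \
#       && echo "telegraf job present" || echo "telegraf job missing"
```

With revision 65 the example would be expected to print "telegraf job missing", and with revision 62 "telegraf job present", matching the two config dumps in this report.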

If telegraf is reverted to revision 62, it works as expected:

$ juju upgrade-charm telegraf --switch cs:telegraf-62
Added charm-store charm "telegraf", revision 62 in channel stable, to the model
Leaving endpoints in "alpha": amqp, apache, dashboards, elasticsearch, exec, haproxy, influxdb-api, juju-info, memcached, mongodb, mysql, mysql-monitor, nrpe-external-master, postfix, postgresql, prometheus-client, prometheus-rules, redis, sentry

$ juju ssh prometheus2/0
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 6.2.0-76060200-generic x86_64)
...
Last login: Tue Mar 14 18:18:17 2023 from 10.149.216.1
ubuntu@juju-92a939-14:~$ sudo cat /var/snap/prometheus/current/prometheus.yml
# my global config
global:
  scrape_interval: 15s # default scrape_interval
  evaluation_interval: 15s # default evaluation_interval
  scrape_timeout: 15s # default scrape_timeout, must be <= scrape_interval

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
      monitor: prometheus2-monitor

rule_files:
    - /var/snap/prometheus/current/generic.rules

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    metrics_path: '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['10.149.216.236:9090']

# services related to the 'scrape' endpoint

# static-targets

# related services (eg collectd)
  - job_name: 'telegraf'

    static_configs:
      - targets: ['10.149.216.49:9103']
        labels:
          group: 'promoagents-juju'
          dns_name: 'juju-92a939-15.lxd'

# related manual jobs

# config manual jobs

# federate jobs
ubuntu@juju-92a939-14:~$
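
To compare the two revisions quickly, it can help to list only the scrape targets from each config. This is a rough sketch, assuming targets are written as single-quoted inline lists like the ones above; list_scrape_targets is a hypothetical helper name, and grep -o is a common extension rather than strict POSIX.

```shell
#!/bin/sh
# list_scrape_targets FILE
# Hypothetical helper: prints one host:port per line for every
# "targets: ['host:port', ...]" entry found in FILE.
list_scrape_targets() {
    # Keep only the "targets:" lines, pull out each single-quoted
    # address, then strip the quotes.
    grep -E "^[[:space:]]*-?[[:space:]]*targets:" "$1" \
        | grep -Eo "'[^']+'" \
        | tr -d "'"
}

# Example (on the prometheus2 unit, as root):
#   list_scrape_targets /var/snap/prometheus/current/prometheus.yml
```

Against the revision 62 config shown above, this should list both 10.149.216.236:9090 and 10.149.216.49:9103; against the revision 65 config, only the first.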

Tags: bseng-1007
Andrea Ieri (aieri)
Changed in charm-telegraf:
status: New → Triaged
importance: Undecided → High
tags: added: bseng-1007