Telegraf Charm

prometheus_client relation stopped working in revision 65

Bug #2011618 reported by Guillermo Gonzalez on 2023-03-14

This bug report is a duplicate of: Bug #2008436: The charm stopped working without 'certificates' relation.

This bug affects 1 person

Affects         Status   Importance  Assigned to  Milestone
Telegraf Charm  Triaged  High        Unassigned

Bug Description

After upgrading from revision 62 to 65 (stable channel), the prometheus_client relation stopped working and new units related to telegraf were not included in the prometheus config.

Steps to reproduce:

$ juju deploy prometheus2
Located charm "prometheus2" in charm-hub, revision 48
Deploying "prometheus2" from charm-hub charm "prometheus2", revision 48 in channel stable on focal

$ juju deploy telegraf
Located charm "telegraf" in charm-hub, revision 65
Deploying "telegraf" from charm-hub charm "telegraf", revision 65 in channel stable on focal

$ juju add-relation telegraf:prometheus-client prometheus2:target

$ juju deploy ubuntu
Located charm "ubuntu" in charm-hub, revision 21
Deploying "ubuntu" from charm-hub charm "ubuntu", revision 21 in channel stable on focal

$ juju add-relation telegraf ubuntu

Wait until everything is deployed, then check the status:

$ juju status prometheus2 ubuntu telegraf
Model Controller Cloud/Region Version SLA Timestamp
default local-lxd localhost/localhost 2.9.35 unsupported 15:19:59-03:00

App Version Status Scale Charm Channel Rev Exposed Message
prometheus2 active 1 prometheus2 stable 48 no Ready
telegraf active 1 telegraf stable 65 no Monitoring ubuntu/0 (source version/commit 23.01)
ubuntu 20.04 active 1 ubuntu stable 21 no

Unit Workload Agent Machine Public address Ports Message
prometheus2/0* active idle 14 10.149.216.236 9090/tcp,12321/tcp Ready
ubuntu/0* active idle 15 10.149.216.49
telegraf/0* active idle 10.149.216.49 9103/tcp Monitoring ubuntu/0 (source version/commit 23.01)

Machine State Address Inst id Series AZ Message
14 started 10.149.216.236 juju-92a939-14 focal Running
15 started 10.149.216.49 juju-92a939-15 focal Running

$ juju ssh prometheus2/0
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 6.2.0-76060200-generic x86_64)
...
ubuntu@juju-92a939-14:~$ sudo cat /var/snap/prometheus/current/prometheus.yml
# my global config
global:
  scrape_interval: 15s # default scrape_interval
  evaluation_interval: 15s # default evaluation_interval
  scrape_timeout: 15s # default scrape_timeout, must be <= scrape_interval

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
      monitor: prometheus2-monitor

rule_files:
- /var/snap/prometheus/current/generic.rules

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    metrics_path: '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['10.149.216.236:9090']

# services related to the 'scrape' endpoint

# static-targets

# related services (eg collectd)

# related manual jobs

# config manual jobs

# federate jobs
ubuntu@juju-92a939-14:~$

Result: a bare Prometheus config file with no scrape targets other than Prometheus itself.

Expected: a scrape entry for the telegraf unit that was just related to prometheus2.

If telegraf is reverted to revision 62, the relation works as expected:

$ juju upgrade-charm telegraf --switch cs:telegraf-62
Added charm-store charm "telegraf", revision 62 in channel stable, to the model
Leaving endpoints in "alpha": amqp, apache, dashboards, elasticsearch, exec, haproxy, influxdb-api, juju-info, memcached, mongodb, mysql, mysql-monitor, nrpe-external-master, postfix, postgresql, prometheus-client, prometheus-rules, redis, sentry

$ juju ssh prometheus2/0
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 6.2.0-76060200-generic x86_64)
...
Last login: Tue Mar 14 18:18:17 2023 from 10.149.216.1
ubuntu@juju-92a939-14:~$ sudo cat /var/snap/prometheus/current/prometheus.yml
# my global config
global:
  scrape_interval: 15s # default scrape_interval
  evaluation_interval: 15s # default evaluation_interval
  scrape_timeout: 15s # default scrape_timeout, must be <= scrape_interval

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
      monitor: prometheus2-monitor

rule_files:
- /var/snap/prometheus/current/generic.rules

# services related to the 'scrape' endpoint

# static-targets

# related services (eg collectd)
- job_name: 'telegraf'

    static_configs:
      - targets: ['10.149.216.49:9103']
        labels:
          group: 'promoagents-juju'
          dns_name: 'juju-92a939-15.lxd'

# related manual jobs

# config manual jobs

# federate jobs
ubuntu@juju-92a939-14:~$
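
To make the comparison between the two outputs above repeatable, here is a minimal sketch of a check that looks for a given host:port in the `targets:` lines of a Prometheus config. The helper name `has_scrape_target` is hypothetical and not part of either charm; it assumes targets are written in single quotes, as in the files shown in this report.

```shell
#!/bin/sh
# has_scrape_target TARGET
# Hypothetical helper: reads a Prometheus config on stdin and succeeds only
# if TARGET (host:port) appears, single-quoted, on a "targets:" line.
has_scrape_target() {
    grep -F -- "targets:" | grep -qF -- "'$1'"
}

# Possible usage against the deployment in this report (assumes the same
# unit name, config path and telegraf address as shown above):
#   juju ssh prometheus2/0 sudo cat /var/snap/prometheus/current/prometheus.yml \
#       | has_scrape_target 10.149.216.49:9103 && echo present || echo missing
```

With the revision 65 config shown earlier this check would be expected to report the telegraf target as missing, and with the revision 62 config as present.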


Andrea Ieri (aieri) on 2023-03-14

Changed in charm-telegraf:
status:	New → Triaged
importance:	Undecided → High
tags:	added: bseng-1007
