The exporter service is constantly restarted every 5 mins

Bug #2029445 reported by Hua Zhang
This bug affects 1 person
Affects: Prometheus Openstack Exporter Charm
Status: Fix Released
Importance: Undecided
Assigned to: Hua Zhang
Milestone: 23.10

Bug Description

A customer received a Nagios alert with the following content:

service: fcb-sbibits-cc1-prometheus-openstack-exporter-0-prometheus_openstack_exporter_http
Status Information: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.

We traced this to the exporter service constantly restarting every 5 minutes:

2023-06-22 02:16:25 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Restarting snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service, config file changed...
2023-06-22 02:21:27 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Restarting snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service, config file changed...
2023-06-22 02:27:22 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Restarting snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service, config file changed...
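
To check the restart cadence, the gaps between the entries above can be computed with a short script. This is only a sketch for triage: the helper name `restart_intervals` is hypothetical, and it assumes every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt.

```python
from datetime import datetime

# The three restart entries copied verbatim from the unit log excerpt.
LOG_LINES = [
    "2023-06-22 02:16:25 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 "
    "Restarting snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service, config file changed...",
    "2023-06-22 02:21:27 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 "
    "Restarting snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service, config file changed...",
    "2023-06-22 02:27:22 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 "
    "Restarting snap.prometheus-openstack-exporter.prometheus-openstack-exporter.service, config file changed...",
]


def restart_intervals(lines):
    """Return the gaps, in seconds, between consecutive restart entries."""
    stamps = [
        # Assumption: the first 19 characters are the timestamp.
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if "Restarting snap." in line and "config file changed" in line
    ]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(restart_intervals(LOG_LINES))  # [302, 355]
```

Both gaps are a little over 300 seconds, which is consistent with a restart being triggered on roughly every 5-minute cycle.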

I could not reproduce the problem in a test environment deployed with:

./generate-bundle.sh -s bionic --name bionic --num-compute 1 --vault --use-stable-charms --revision-info ~/xxx_juju_status --grafana --nagios --run

I was testing with latest/stable 0.1.7 (revision 28), whereas the customer runs latest/candidate 1.1.10 (revision 23), so I switched to 1.1.10 with:

juju refresh prometheus-openstack-exporter --channel=candidate
juju config prometheus-openstack-exporter snap_channel=latest/candidate

However, I still could not reproduce the problem after switching to 1.1.10.

The customer is nevertheless hitting this issue; the following log confirms that the do_restart handler is invoked by the update-status hook every 5 minutes.

$ sudo grep -r -E 'do_restart|Reactive main running for hook update-status' var/log/juju/unit-prometheus-openstack-exporter-5.log |tail -n6
2023-06-22 02:41:50 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Reactive main running for hook update-status
2023-06-22 02:41:51 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Invoking reactive handler: reactive/openstack_exporter.py:216:do_restart
2023-06-22 02:47:12 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Reactive main running for hook update-status
2023-06-22 02:47:13 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Invoking reactive handler: reactive/openstack_exporter.py:216:do_restart
2023-06-22 02:51:54 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Reactive main running for hook update-status
2023-06-22 02:51:55 INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 Invoking reactive handler: reactive/openstack_exporter.py:216:do_restart
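
The pairing of hook runs and handler invocations in the grep output above can also be checked programmatically. The following is a rough sketch, not part of the charm: `handlers_by_hook` and the two regular expressions are hypothetical names, and the message formats are assumed to match the excerpt exactly.

```python
import re

PREFIX = " INFO unit.prometheus-openstack-exporter/5.juju-log server.go:316 "
HOOK_MSG = "Reactive main running for hook update-status"
HANDLER_MSG = "Invoking reactive handler: reactive/openstack_exporter.py:216:do_restart"

# The six lines from the grep output, rebuilt from their common parts.
LOG_LINES = [
    "2023-06-22 02:41:50" + PREFIX + HOOK_MSG,
    "2023-06-22 02:41:51" + PREFIX + HANDLER_MSG,
    "2023-06-22 02:47:12" + PREFIX + HOOK_MSG,
    "2023-06-22 02:47:13" + PREFIX + HANDLER_MSG,
    "2023-06-22 02:51:54" + PREFIX + HOOK_MSG,
    "2023-06-22 02:51:55" + PREFIX + HANDLER_MSG,
]

HOOK_RE = re.compile(r"Reactive main running for hook (\S+)")
# Captures the handler function name after the last colon of file:line:name.
HANDLER_RE = re.compile(r"Invoking reactive handler: \S+:(\w+)\s*$")


def handlers_by_hook(lines):
    """Group handler invocations under the hook run that precedes them.

    Returns a list of (timestamp, hook_name, [handler_names]) tuples.
    """
    runs = []
    for line in lines:
        hook = HOOK_RE.search(line)
        if hook:
            runs.append((line[:19], hook.group(1), []))
            continue
        handler = HANDLER_RE.search(line)
        if handler and runs:
            runs[-1][2].append(handler.group(1))
    return runs


if __name__ == "__main__":
    for stamp, hook, handlers in handlers_by_hook(LOG_LINES):
        print(stamp, hook, handlers)
```

On the excerpt, each of the three update-status runs is followed by exactly one do_restart invocation, which matches the conclusion drawn from the log.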

Tags: sts

Hua Zhang (zhhuabj)
tags: added: sts
Eric Chen (eric-chen)
Changed in charm-prometheus-openstack-exporter:
assignee: nobody → Hua Zhang (zhhuabj)
status: New → Fix Committed
milestone: none → 23.10
Tianqi Xiao (txiao)
Changed in charm-prometheus-openstack-exporter:
status: Fix Committed → Fix Released