Handle telegraf down gracefully; Add pushing metrics for last successful test run
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
APT Stress Test Charm | Fix Released | Medium | Haw Loeung |
Bug Description
Hi,
In an environment where telegraf is down or misbehaving, apt-stresstest fails, but not gracefully:
| apt-stresstest@
| apt_transaction
| DEBUG: Sending data to influxdb: 127.0.0.1:8094
| Traceback (most recent call last):
| File "/usr/local/
| sys.exit(main())
| File "/usr/local/
| run_tests(
| File "/usr/local/
| output_results(
| File "/usr/local/
| send_to_
| File "/usr/local/
| s.connect((host, port))
| ConnectionRefus
I think the charm should push a metric for the last successful run. We can then add Prometheus alerting rules to catch units that have not run within, say, the last hour.
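As a rough illustration of both requests, here is a minimal sketch. It is not the code from the charm: the function names `send_lines` and `last_success_line` and the metric name `apt_stresstest_last_success` are hypothetical, and it assumes telegraf is listening on a TCP socket that accepts InfluxDB line protocol (the 127.0.0.1:8094 endpoint in the debug output above). A refused connection is logged and reported to the caller instead of raising out of `main()`.

```python
import socket
import time


def send_lines(host, port, lines, timeout=5.0):
    """Send line-protocol lines over TCP; return True on success, False otherwise.

    Hypothetical helper, not the charm's actual function.
    """
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(payload)
    except OSError as err:
        # ConnectionRefusedError and socket timeouts are subclasses of OSError,
        # so a stopped telegraf ends up here rather than in a traceback.
        print("WARNING: could not send metrics to {}:{}: {}".format(host, port, err))
        return False
    return True


def last_success_line(now=None):
    """Build a line recording when the last successful run finished.

    The measurement name is an assumption made for this sketch.
    """
    ts = int(time.time() if now is None else now)
    return "apt_stresstest_last_success timestamp={}i".format(ts)
```

A caller could append `last_success_line()` to its results only when the test run itself succeeded, then call `send_lines("127.0.0.1", 8094, lines)` and decide from the return value whether to exit non-zero. An alerting rule could then compare the current time against the exported timestamp.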
Related branches
- Colin Misare: Approve
- Canonical IS Reviewers: Pending requested
Diff: 23 lines (+8/-4), 1 file modified: files/test_apt_mirrors.py (+8/-4)
summary:
- Add pushing metrics for last successful test run
+ Handle telegraf down gracefully; Add pushing metrics for last successful test run
Changed in apt-stresstest-charm:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Haw Loeung (hloeung)
Changed in apt-stresstest-charm:
status: Fix Committed → Fix Released
The charm already pushes metrics for successful runs: apt_transaction_total_duration_seconds and apt_mirror_units_count.