[2.7.1] collect-metrics hook failures are not visible in `juju status`

Bug #1863365 reported by Dmitrii Shcherbakov
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  Low         Unassigned

Bug Description

Scenario:

1) collect-metrics hook runs and fails;
2) juju status doesn't reflect a non-zero exit code.

I understand that metric hooks currently have different execution semantics, but they still fail silently.

juju status
Model    Controller           Cloud/Region         Version  SLA          Timestamp
default  localhost-localhost  localhost/localhost  2.7.1    unsupported  20:38:33+03:00

App          Version  Status  Scale  Charm        Store  Rev  OS      Notes
cockroachdb           active      3  cockroachdb  local    2  ubuntu

Unit            Workload  Agent  Machine  Public address  Ports  Message
cockroachdb/0*  active    idle   0        10.209.240.121
cockroachdb/1   active    idle   1        10.209.240.55
cockroachdb/2   active    idle   2        10.209.240.19

Machine  State    DNS             Inst id        Series  AZ  Message
0        started  10.209.240.121  juju-5285e0-0  bionic      Running
1        started  10.209.240.55   juju-5285e0-1  bionic      Running
2        started  10.209.240.19   juju-5285e0-2  bionic      Running

unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics Traceback (most recent call last):
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "/var/lib/juju/agents/unit-cockroachdb-1/charm/hooks/collect-metrics", line 217, in <module>
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics main(CockroachDBCharm)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "lib/ops/main.py", line 178, in main
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics _emit_charm_event(charm, juju_event_name)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "lib/ops/main.py", line 107, in _emit_charm_event
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics event_to_emit.emit(*args, **kwargs)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "lib/ops/framework.py", line 178, in emit
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics framework._emit(event)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "lib/ops/framework.py", line 588, in _emit
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics self._reemit(event_path)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "lib/ops/framework.py", line 623, in _reemit
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics custom_handler(event)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "/var/lib/juju/agents/unit-cockroachdb-1/charm/hooks/collect-metrics", line 78, in on_collect_metrics
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics event.add_metrics({'=dead=beef=': 'cafe'})
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "lib/ops/charm.py", line 79, in add_metrics
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics self.framework.model._backend.add_metric(metrics, labels)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "lib/ops/model.py", line 780, in add_metric
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics self._run(*cmd)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics File "lib/ops/model.py", line 638, in _run
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics raise ModelError(e.stderr)
unit-cockroachdb-1: 20:36:43 DEBUG unit.unit-cockroachdb-1.collect-metrics ops.model.ModelError: b'ERROR invalid metrics: expected "key=value", got "=dead=beef==cafe"\n'
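
Until failures like this surface in status, a charm could check metric keys itself before handing them to add_metrics. The following is only a rough sketch: the helper name format_metric_args is hypothetical, and the rule it applies (a key must be non-empty and must not contain "=") is an assumption inferred from the error message above, not the validation Juju actually performs, which may be stricter.

```python
def format_metric_args(metrics):
    """Build "key=value" strings, rejecting keys that would be ambiguous.

    Assumption: a key is unsafe if it is empty or contains "=", because
    each pair is joined with "=" (as the error message in the log suggests).
    The real rules enforced by Juju may differ.
    """
    args = []
    for key, value in metrics.items():
        key = str(key)
        if not key or "=" in key:
            raise ValueError("invalid metric key: {!r}".format(key))
        args.append("{}={}".format(key, value))
    return args


if __name__ == "__main__":
    print(format_metric_args({"requests": 42}))
    try:
        format_metric_args({"=dead=beef=": "cafe"})
    except ValueError as err:
        # The charm author sees the problem here instead of in a DEBUG log line.
        print("rejected:", err)
```

Raising early only moves the error closer to the charm code; it does not by itself make the failure visible in `juju status`.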

Richard Harding (rharding) wrote :

This one will take some thought. In theory metrics are a bit invisible to the operator and having things pop up in status that they can't do much about isn't great. However, there's a gap when things fail silently and are not obvious.

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot