prometheus unit stuck in "maintenance", with no hook running
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Prometheus Charm | Fix Released | High | Unassigned |
Bug Description
Today, during a deploy, I discovered that my environment was failing to settle because the unit was reporting:
prometheus/0* maintenance idle 2 10.25.2.224 9090/tcp,12321/tcp Updating configuration
but no config-changed hook was actually running.
I ran a manual "juju run --unit prometheus/0 status-set active", which got things unstuck, but it's not clear how the unit ended up in that state in the first place.
One thing that did happen earlier was that scrape-jobs was set to a malformed (though possibly still valid YAML) value, but that happened well before the last status update:
application
current: maintenance
message: Updating configuration
since: 27 Feb 2017 02:04:17Z
But maybe I'm not interpreting "since" correctly. Or maybe the reactive framework thinks the unit should still be in maintenance and therefore keeps re-applying status-set, which updates "since", even though the unit is still in "active" and has not returned to "maintenance". Shruggity shrug.
Related branches
- Jacek Nykis (community): Approve
-
Diff: 12 lines (+1/-0), 1 file modified: reactive/prometheus.py (+1/-0)
summary: "prometheus unit stuck in "maintenance" with no hookl running" → "prometheus unit stuck in "maintenance", with no hook running"
Changed in prometheus-charm:
status: Triaged → Fix Committed
Changed in prometheus-charm:
status: Fix Committed → Fix Released
The handler that sets the status to active never ran, which indicates that other handlers that should have run did not run either. The unit is in an unknown state, even though the problem has been papered over.
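The reasoning in that comment can be illustrated with a toy model: in a flag-driven framework, the handler that sets "active" only runs once its preconditions are met, so if an earlier handler sets "maintenance" and then exits without setting the flag the "active" handler waits on, the unit stays in maintenance with no hook running and no hook error. This is a minimal sketch under that assumption only. It is not the charms.reactive API, it does not reproduce the actual charm code or its one-line fix, and the flag names (`config.changed`, `prometheus.configured`) and helpers (`scrape_jobs_ok`, `dispatch`, `run_hook`) are hypothetical.

```python
# Toy model of flag-gated handler dispatch, loosely inspired by the
# reactive framework. This is NOT the charms.reactive API: the decorator,
# flag names and helpers below are hypothetical, for illustration only.

flags = set()                                   # currently set flags
status = {"workload": "unknown", "message": ""}
config = {"scrape-jobs": ""}
handlers = []                                   # (required flags, function)


def when(*required):
    """Register a handler that runs once all `required` flags are set."""
    def register(fn):
        handlers.append((frozenset(required), fn))
        return fn
    return register


def status_set(workload, message=""):
    status["workload"] = workload
    status["message"] = message


def scrape_jobs_ok(value):
    # Hypothetical check: treat anything that is not a YAML list as malformed.
    return value.lstrip().startswith("-")


@when("config.changed")
def update_config():
    status_set("maintenance", "Updating configuration")
    flags.discard("prometheus.configured")
    flags.discard("config.changed")
    if not scrape_jobs_ok(config["scrape-jobs"]):
        # The hypothetical bug: returning here leaves the unit in
        # "maintenance" without raising, so the hook still succeeds.
        return
    flags.add("prometheus.configured")


@when("prometheus.configured")
def set_active():
    status_set("active", "Ready")


def dispatch():
    """Run each handler at most once per hook, looping until no further
    handler has its required flags satisfied."""
    ran = set()
    progress = True
    while progress:
        progress = False
        for required, fn in handlers:
            if fn not in ran and required <= flags:
                fn()
                ran.add(fn)
                progress = True


def run_hook(scrape_jobs):
    """Simulate one config-changed hook and return the resulting status."""
    config["scrape-jobs"] = scrape_jobs
    flags.add("config.changed")
    dispatch()
    return dict(status)
```

In this model, `run_hook("- job_name: node")` ends with the status `active` / `Ready`, while a later `run_hook("job_name: oops")` ends with `maintenance` / `Updating configuration` and no error, because `set_active` never has its flag set. The unit then stays there until a later hook sets the missing flag or someone changes the status by hand, which matches the shape of the symptom described in the report.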