NTP charm should have tunable alert threshold(s)

Bug #2012752 reported by Paul Goins
This bug affects 1 person
Affects: NTP Charm
Status: Confirmed
Importance: Wishlist
Assigned to: Gabriel Cocenza
Milestone: (none listed)

Bug Description

Sometimes, a customer's corporate NTP servers may be more or less in sync (and likely close enough for the purposes of applications requiring accurate time, e.g. OpenStack), but not close enough to avoid alerts re: clock skew.

It would be good if we had some method to tune this threshold at a minimum, and perhaps others if there are other things which would be appropriate to tune. Obviously, it'd be best for the time servers in question to be brought closer into sync, but if the customer is not able to address the issue, it would be nice if we had a way to reduce the occurrence of noise from alerts driven by expected levels of offset skew.

For the sake of this bug, I am interested primarily in the offset warning/critical messages like this:

  WARNING: offset is out of range (0.033600) - should be between -0.010000 and 0.010000
  CRITICAL: offset is out of range (0.083933) - must be between -0.050000 and 0.050000

This looks like it's controlled by the _metricdefs structure in alert.py (/opt/ntpmon-ntp-charm/alert.py as installed on systems), and the thresholds are hard-coded. It would be nice if these thresholds could be modified. Maybe we want a big warning in the config saying that modifying these settings is not recommended and shouldn't be done simply to avoid fixing the upstream servers; I'm not sure what others think.
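
To make the request concrete, here is a rough sketch of what tunable offset thresholds could look like. It is not the actual alert.py code: the option names offset_warning and offset_critical, the plain dict standing in for charm config, and both function names are hypothetical. The defaults are the values shown in the alert messages above, so behaviour would be unchanged unless someone overrides them.

```python
# Hypothetical sketch only: none of these names come from the charm or ntpmon.
DEFAULT_OFFSET_WARN = 0.010  # seconds, matches the current WARNING range
DEFAULT_OFFSET_CRIT = 0.050  # seconds, matches the current CRITICAL range


def offset_thresholds(config):
    """Return (warn, crit) offset limits in seconds.

    `config` is a plain dict standing in for charm config (an assumption).
    Missing options fall back to the current hard-coded values.
    """
    warn = float(config.get("offset_warning", DEFAULT_OFFSET_WARN))
    crit = float(config.get("offset_critical", DEFAULT_OFFSET_CRIT))
    if not 0 < warn <= crit:
        raise ValueError("expected 0 < offset_warning <= offset_critical")
    return warn, crit


def classify_offset(offset, warn, crit):
    """Classify a clock offset (seconds) as OK, WARNING, or CRITICAL."""
    magnitude = abs(offset)
    if magnitude > crit:
        return "CRITICAL"
    if magnitude > warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Defaults: the two offsets from the alerts above.
    warn, crit = offset_thresholds({})
    print(classify_offset(0.033600, warn, crit))
    print(classify_offset(0.083933, warn, crit))

    # Relaxed limits for a site with known, accepted skew.
    warn, crit = offset_thresholds({"offset_warning": 0.05, "offset_critical": 0.1})
    print(classify_offset(0.033600, warn, crit))
```

Rejecting a warning limit that is larger than the critical limit is one way to keep a misconfiguration from silently hiding the warning level.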

In any case, having this type of tunability would reduce the occurrence of borderline alerts for NTP offsets, and would also reduce the likelihood of an engineer ignoring real issues on the assumption that they're just part of the "typical noise".

Tags: bseng-1136
Tom Haddon (mthaddon) wrote :

If you're able to propose a merge for this we'd be happy to review. I agree we should be careful about how we expose this option, and should keep the current values as default.

Changed in ntp-charm:
status: New → Confirmed
importance: Undecided → Wishlist
Andrea Ieri (aieri) wrote :

As a side note, the grafana-agent machine charm provides some NTP-related metrics (notably sync status and offset), so on a cloud using COS it may be possible to switch away from the current ntpmon script altogether.

Andrea Ieri (aieri)
tags: added: bseng-1136
Changed in ntp-charm:
assignee: nobody → Gabriel Cocenza (gabrielcocenza)