No way to skip the check for some NVMe devices

Bug #2044398 reported by Facundo Ciccioli
This bug affects 1 person
Affects          Status     Importance  Assigned to  Milestone
hw-health-charm  Won't Fix  Undecided   Unassigned

Bug Description

One of the things the check_nvme.py check does is read the percentage_used metric and alert if it exceeds a configured threshold. The issue is that a single host often has more than one NVMe device. Once one of them reaches the threshold, the check stays CRITICAL for as long as it takes to resolve the issue with that particular drive. During this period the other NVMe devices are effectively shadowed: if any of them also reaches the threshold, it goes unnoticed, because the check is already CRITICAL.
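The shadowing effect can be illustrated with a minimal sketch. It assumes a check that collapses every device's percentage_used value into a single Nagios-style status. The function names and the device-to-value dictionary are hypothetical and are not taken from check_nvme.py. When a second drive crosses the threshold, the aggregated status does not change, so nothing new is reported. Listing the devices individually shows which drives are actually over the limit.

```python
from typing import Dict, List

# Standard Nagios plugin exit codes (0 = OK, 2 = CRITICAL)
OK, CRITICAL = 0, 2


def aggregate_status(percentage_used: Dict[str, int], threshold: int) -> int:
    """Hypothetical single-status check: CRITICAL if any device exceeds the threshold."""
    if any(used > threshold for used in percentage_used.values()):
        return CRITICAL
    return OK


def devices_over_threshold(percentage_used: Dict[str, int], threshold: int) -> List[str]:
    """Return every device whose percentage_used exceeds the threshold, sorted by name."""
    return sorted(dev for dev, used in percentage_used.items() if used > threshold)


# nvme0 is already worn out; later nvme1 crosses the threshold too.
before = {"nvme0": 95, "nvme1": 40, "nvme2": 40}
after = {"nvme0": 95, "nvme1": 92, "nvme2": 40}

print(aggregate_status(before, 90) == aggregate_status(after, 90))  # True: no state change
print(devices_over_threshold(after, 90))  # ['nvme0', 'nvme1']
```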

Evaluate adding an "ignore-nvme" config option to exclude specific NVMe devices from the check, or provide a workaround to remove an NVMe device from the system while it is not in use.
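The proposed "ignore-nvme" option could work roughly as follows. This is a sketch under assumptions: the value format (device names separated by commas or spaces), the function names, and the message text are all hypothetical, since the charm never implemented the option. Ignored devices are dropped before the threshold comparison. A drive awaiting replacement therefore no longer holds the check at CRITICAL, and the remaining drives can still raise an alert.

```python
from typing import Dict, Set, Tuple

# Standard Nagios plugin exit codes (0 = OK, 2 = CRITICAL)
OK, CRITICAL = 0, 2


def parse_ignore_list(raw: str) -> Set[str]:
    """Parse a hypothetical 'ignore-nvme' value such as "nvme0, nvme2" or "nvme0 nvme2"."""
    return set(raw.replace(",", " ").split())


def check_nvme_wear(
    percentage_used: Dict[str, int], threshold: int, ignore_raw: str = ""
) -> Tuple[int, str]:
    """Return (exit_code, message), skipping devices named in the ignore list."""
    ignored = parse_ignore_list(ignore_raw)
    over = sorted(
        dev
        for dev, used in percentage_used.items()
        if dev not in ignored and used > threshold
    )
    if over:
        return CRITICAL, "CRITICAL: percentage_used above threshold on " + ", ".join(over)
    return OK, "OK"


# nvme0 is worn out and awaiting replacement, so it is ignored.
print(check_nvme_wear({"nvme0": 95, "nvme1": 40}, 90, "nvme0"))
# (0, 'OK')
print(check_nvme_wear({"nvme0": 95, "nvme1": 92}, 90, "nvme0"))
# (2, 'CRITICAL: percentage_used above threshold on nvme1')
```

Matching on kernel device names keeps the sketch short, but names such as nvme0 may not be stable across reboots. A real option might be safer if it matched on a persistent identifier such as the drive's serial number.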

Facundo Ciccioli (fandanbango) wrote:

Since hw-health is deprecated, we've worked around this issue differently: we switched to Alertmanager and Prometheus alert rules, which allow us to silence alerts very selectively.

Changed in charm-hw-health:
status: New → Won't Fix