hw-health-charm

No way to skip the check for some NVME devices

Bug #2044398 reported by Facundo Ciccioli on 2023-11-23

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
hw-health-charm	Won't Fix	Undecided	Unassigned

Bug Description

One of the things the check_nvme.py check does is read the percentage_used metric and alert when it is greater than a configured threshold. The problem is that a single host often has more than one NVME, and once one of them reaches the threshold, the check stays CRITICAL for as long as it takes to resolve the issue with that particular drive. During this period the other NVMEs are effectively shadowed: if one of them reaches the threshold as well, it goes unnoticed.

Please evaluate adding an "ignore-nvme" config property to exclude selected NVMEs from the check, or provide a workaround for removing an NVME device from a system while it is not in use.
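
To illustrate the request, here is a minimal sketch of how an ignore list might interact with the threshold check. It is not the charm's actual code: the function name evaluate_nvme_wear, the device names, and the shape of the readings dict are all hypothetical, and the ignored list is assumed to come from the proposed "ignore-nvme" option. The return codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
from typing import Dict, Iterable, Tuple

# Conventional Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def evaluate_nvme_wear(
    percentage_used: Dict[str, int],
    threshold: int,
    ignored: Iterable[str] = (),
) -> Tuple[int, str]:
    """Return (exit_code, message) for a set of per-device readings.

    percentage_used maps a device name to its percentage_used value.
    Devices listed in `ignored` (hypothetical "ignore-nvme" option) are
    skipped, so a known-worn drive no longer hides the others.
    """
    ignored_set = set(ignored)
    # Collect every non-ignored device whose wear exceeds the threshold.
    worn = sorted(
        dev
        for dev, used in percentage_used.items()
        if dev not in ignored_set and used > threshold
    )
    if worn:
        details = ", ".join(f"{dev}={percentage_used[dev]}%" for dev in worn)
        return CRITICAL, f"CRITICAL: percentage_used above {threshold}%: {details}"

    # Mention skipped devices so the ignore list stays visible in the output.
    skipped = sorted(ignored_set.intersection(percentage_used))
    suffix = f" (ignored: {', '.join(skipped)})" if skipped else ""
    return OK, f"OK: all checked NVMe devices at or below {threshold}%{suffix}"
```

With readings of {"nvme0": 95, "nvme1": 10, "nvme2": 85} and a threshold of 80, ignoring nvme0 still yields CRITICAL because of nvme2, which is exactly the case that goes unnoticed today.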

Facundo Ciccioli (fandanbango) wrote on 2023-12-07:

Since hw-health is deprecated, we've worked around this issue in a different way: we went for alertmanager and prometheus alert rules, which allow us to silence very selectively.

Changed in charm-hw-health:
status:	New → Won't Fix
