mdadm --detail status with degraded mode is not reported

Bug #1832906 reported by Steven Parker
This bug affects 1 person
Affects: hw-health-charm
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

Arrays in the following state are not reported in Thruk.
The check needs to look for a degraded state as well as a clean one.

--

sudo mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Thu Oct 18 19:18:41 2018
     Raid Level : raid10
     Array Size : 15624470528 (14900.66 GiB 15999.46 GB)
  Used Dev Size : 3906117632 (3725.16 GiB 3999.86 GB)
   Raid Devices : 8
  Total Devices : 7
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Jun 2 00:57:01 2019
          State : clean, degraded
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

           Name : CMOOSCHSTUP7325:1 (local to host CMOOSCHSTUP7325)
           UUID : 02536f97:443d4acb:cd301628:038165d5
         Events : 30508

    Number   Major   Minor   RaidDevice   State
       0       8       34        0        active sync set-A   /dev/sdc2
       1       8       50        1        active sync set-B   /dev/sdd2
       2       8       66        2        active sync set-A   /dev/sde2
       3       8       82        3        active sync set-B   /dev/sdf2
       4       8       98        4        active sync set-A   /dev/sdg2
      10       0        0       10        removed
       6       8      146        6        active sync set-A   /dev/sdj2
       7       8      162        7        active sync set-B   /dev/sdk2

Drew Freiberger (afreiberger) wrote :

This is not a Thruk bug; it is a gap in the monitoring agent.

I have added hw-health-charm to this bug as that is likely the best place to monitor for software raid health.

Changed in thruk-master-charm:
status: New → Invalid
Drew Freiberger (afreiberger) wrote :

I recommend having this check be based on the "State :" field of mdadm --detail <md_dev>:

OK:
State : clean

Warning - mdX Degraded but recovering:
State : clean, degraded, recovering

Critical - mdX is degraded:
State : clean, degraded

UNKNOWN - mdX is in an unknown state *$state*
State : *anything missing clean*
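
The mapping proposed above could be sketched roughly as follows. This is a minimal illustration, not the code in hw-health-charm; the function name `classify_state` and the message strings are hypothetical, and the numeric codes follow the usual Nagios plugin convention. The comment does not say what to do with other combinations that contain "clean", so treating them as OK here is an assumption.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_state(state_field):
    """Map the text after 'State :' to a (code, message) pair.

    Hypothetical helper following the mapping proposed in this comment.
    """
    # Split "clean, degraded, recovering" into a set of individual flags.
    flags = {flag.strip() for flag in state_field.split(",") if flag.strip()}

    # Anything missing "clean" is reported as UNKNOWN, per the proposal.
    if "clean" not in flags:
        return UNKNOWN, "unknown state: " + state_field.strip()

    if "degraded" in flags:
        if "recovering" in flags:
            return WARNING, "degraded but recovering"
        return CRITICAL, "degraded"

    # Assumption: any other state containing "clean" counts as OK.
    return OK, "clean"


for sample in ("clean", "clean, degraded, recovering", "clean, degraded"):
    print(sample, "->", classify_state(sample))
```

Note that a state such as "active, degraded, recovering" does not contain "clean", so under this mapping it would be reported as UNKNOWN rather than Warning or Critical.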

Drew Freiberger (afreiberger) wrote :

It appears that this check is already coded, but it is not working properly:

root@CMOOSCHSTUP7305:/etc/cron.d# cat /var/lib/nagios/mdadm.out
OK: /dev/md1 ok

root@CMOOSCHSTUP7305:/etc/cron.d# mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Thu Oct 18 17:41:01 2018
     Raid Level : raid10
     Array Size : 7812235264 (7450.33 GiB 7999.73 GB)
  Used Dev Size : 3906117632 (3725.16 GiB 3999.86 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Oct 28 18:46:34 2019
          State : active, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : near=2
     Chunk Size : 512K

 Rebuild Status : 13% complete

           Name : CMOOSCHSTUP7305:1 (local to host CMOOSCHSTUP7305)
           UUID : fed3a645:1f742fd3:1685dda5:71794407
         Events : 750593

    Number   Major   Minor   RaidDevice   State
       0       8       98        0        active sync set-A   /dev/sdg2
       1       8      114        1        active sync set-B   /dev/sdh2
       2       8      146        2        active sync set-A   /dev/sdj2
       4       8      162        3        spare rebuilding    /dev/sdk2

This should show Critical/degraded.
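
For illustration, here is one way a check could pull the "State :" field out of `mdadm --detail` output and look for the "degraded" flag whether the array reports "clean" or "active". How the existing check parses this output is not shown in this thread, so this is only a sketch; `extract_state_flags` and `SAMPLE` are hypothetical names, and the sample text is an abbreviated copy of the output above.

```python
import re

# Match the "State : ..." summary line. The device table header also ends in
# "State", but it has no colon and does not start the line, so it is skipped.
STATE_RE = re.compile(r"^\s*State\s*:\s*(.+?)\s*$", re.MULTILINE)


def extract_state_flags(detail_text):
    """Return the list of flags from the 'State :' line, or None if absent."""
    match = STATE_RE.search(detail_text)
    if match is None:
        return None
    return [flag.strip() for flag in match.group(1).split(",")]


# Abbreviated copy of the output quoted above.
SAMPLE = """\
/dev/md1:
     Raid Level : raid10
   Raid Devices : 4
          State : active, degraded, recovering
 Active Devices : 3
    Number Major Minor RaidDevice State
       4 8 162 3 spare rebuilding /dev/sdk2
"""

flags = extract_state_flags(SAMPLE)
print(flags)
print("degraded" in flags)
```

Keying on the individual flags rather than on the whole string means a degraded array is not reported as OK just because the leading flag is "active" instead of "clean".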

Xav Paice (xavpaice)
Changed in hw-health-charm:
importance: Undecided → High
Changed in thruk-master-charm:
status: Invalid → Won't Fix
Xav Paice (xavpaice)
Changed in hw-health-charm:
status: New → In Progress
Andrea Ieri (aieri)
no longer affects: charm-thruk-master
Changed in charm-hw-health:
assignee: nobody → Andrea Ieri (aieri)
Andrea Ieri (aieri) wrote :

released as cs:~llama-charmers-next/hw-health-4

Changed in charm-hw-health:
status: In Progress → Fix Released
milestone: none → 20.08
Andrea Ieri (aieri)
Changed in charm-hw-health:
assignee: Andrea Ieri (aieri) → nobody