Comment 7 for bug 1943628

Vern Hart (vern) wrote :

I'm seeing this test fail in a new deployment with the following nagios alert:

    UNKNOWN: could not determine OSDs versions, error: Command '['ceph', 'versions']' returned non-zero exit status 1.

Line 98 of files/nagios/check_ceph_status.py tries to run "ceph versions", but this file is executed by nrpe as the nagios user. That user does not have access to the ceph keys, so it cannot run ceph commands.

This needs to be moved into a cron job so it can run as root, with the output saved to a text file that the nrpe check then slurps in. This is how all the other checks that rely on the output of ceph commands already work.
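
A rough sketch of what the nrpe side could look like, in case it helps. This is only an illustration, not the charm's actual code: the cache path, the staleness limit, and both function names are made up here, and it assumes the root cron job has written the JSON output of "ceph versions" to that file.

```python
import json
import os
import time

# Hypothetical cache location, assumed to be written by a root cron job
# that redirects the JSON output of `ceph versions` into this file.
VERSIONS_FILE = "/var/lib/nagios/ceph-versions.json"
# Assumed staleness limit: treat the cache as unusable after one hour.
MAX_AGE_SECONDS = 3600


def load_cached_versions(path=VERSIONS_FILE, max_age=MAX_AGE_SECONDS, now=None):
    """Return the parsed `ceph versions` output from the cache file.

    Raises RuntimeError if the file is missing, unreadable, not valid
    JSON, or older than max_age seconds.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError) as e:
        # json.JSONDecodeError is a subclass of ValueError.
        raise RuntimeError("could not read {}: {}".format(path, e))
    if age > max_age:
        raise RuntimeError("{} is stale ({:.0f}s old)".format(path, age))
    return data


def osd_versions(data):
    """Return the sorted list of distinct OSD version strings."""
    return sorted(data.get("osd", {}))
```

The check would then report UNKNOWN when load_cached_versions() raises, instead of trying to call ceph itself as the nagios user.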