NRPE checks fail: Stderr: 'usage: check_reboot.py [-h] known_reboot_time

Bug #1971156 reported by Bas de Bruijne
Affects: NRPE Charm
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

In testrun:
https://solutions.qa.canonical.com/testruns/testRun/2321f1c4-3dea-4af0-b692-5ca201d2bc35
with FCE console output:
https://oil-jenkins.canonical.com/job/fce_build/1623//console

A number of nrpe checks fail with:
```
Stderr: 'usage: check_reboot.py [-h] known_reboot_time
  check_reboot.py: error: argument known_reboot_time: time must be in format
  yyyy-mm-dd HH:MM:SS, same as output from `uptime --since`.
  '
```
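For context, a message of this shape is what an argparse type validator produces when it rejects a positional argument. The block below is a minimal sketch of such a validator, not the charm's actual check_reboot.py; the names `reboot_time`, `build_parser` and `TIME_FORMAT` are assumptions chosen for illustration.

```python
import argparse
from datetime import datetime

# Assumed format, taken from the error message: yyyy-mm-dd HH:MM:SS
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def reboot_time(value):
    """Convert a command-line string to a datetime, or reject it."""
    try:
        return datetime.strptime(value, TIME_FORMAT)
    except ValueError:
        # argparse turns this into "error: argument known_reboot_time: ..."
        raise argparse.ArgumentTypeError(
            "time must be in format yyyy-mm-dd HH:MM:SS"
        )


def build_parser():
    """Hypothetical parser with a single positional timestamp argument."""
    parser = argparse.ArgumentParser(prog="check_reboot.py")
    parser.add_argument("known_reboot_time", type=reboot_time)
    return parser
```

With a parser like this, the timestamp has to arrive as one argument containing a space. If it arrives as two arguments, or with literal quote characters attached, the validator rejects it and argparse exits with the usage text.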

I assume something in the check arguments or calling has changed.

Link to crashdumps (which also contains an exported bundle):
https://oil-jenkins.canonical.com/artifacts/2321f1c4-3dea-4af0-b692-5ca201d2bc35/index.html

Alexander Balderson (asbalderson) wrote :

A simple reproducer is to deploy NRPE as shown in the charm help, and then run the check with no arguments:

juju deploy ubuntu
juju deploy nrpe
juju deploy nagios
juju add-relation ubuntu nrpe
juju add-relation nrpe:monitors nagios:monitors
juju wait
juju run-action --wait nrpe/0 run-nrpe-check name=check-reboot

Currently SQA has a Nagios validator which
1) lists all the checks for an NRPE unit
2) runs each of those NRPE checks
3) reports the errors.
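The three steps above can be sketched roughly as follows. This is not the SQA validator itself; `validate_checks` and the `run_check` callable are hypothetical stand-ins for whatever lists the checks and runs the run-nrpe-check action.

```python
def validate_checks(check_names, run_check):
    """Run every listed check and collect the ones that fail.

    run_check is a hypothetical callable that takes a check name and
    returns a tuple (ok, output).
    """
    failures = {}
    for name in check_names:
        ok, output = run_check(name)
        if not ok:
            failures[name] = output
    return failures


def fake_run_check(name):
    """Stand-in runner for illustration: only check-reboot fails."""
    if name == "check-reboot":
        return False, "usage: check_reboot.py [-h] known_reboot_time"
    return True, "OK"


failures = validate_checks(["check-load", "check-reboot"], fake_run_check)
for name, output in failures.items():
    print("{} failed: {}".format(name, output))
```

Because the validator treats every listed check the same way, a check that only works when given its configured argument shows up as a failure.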

check-reboot comes up as a listed check, but when run with no argument it is reported as a failed check. I feel the check should be able to look up the known reboot time in the juju kv store: if there is no known reboot time it does nothing, otherwise it does the comparison.
ack-reboot can then behave the way it does currently.
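A rough sketch of the behaviour proposed here, under stated assumptions: the kv store is modelled as a plain dict, the key name `known_reboot_time` is hypothetical, and the CRITICAL message is invented for illustration. None of this is taken from the charm's code.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
KV_KEY = "known_reboot_time"  # hypothetical key name


def check_reboot(kv, boot_time):
    """Return (nagios_exit_code, message) for the given boot time.

    kv is any dict-like store; boot_time is a datetime for the current
    boot, e.g. parsed from the output of `uptime --since`.
    """
    known = kv.get(KV_KEY)
    if known is None:
        # Nothing recorded yet, so there is nothing to compare against.
        return 0, "OK: no known reboot time recorded"
    known_time = datetime.strptime(known, TIME_FORMAT)
    if boot_time > known_time:
        # The system has booted more recently than the recorded time.
        return 2, "CRITICAL: system rebooted at {}".format(
            boot_time.strftime(TIME_FORMAT)
        )
    return 0, "OK: system is up since {}".format(known)
```

With this shape the check needs no command-line argument, so a validator that runs every listed check without arguments would no longer trip over it.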

Changed in charm-nrpe:
status: New → Confirmed
Xav Paice (xavpaice) wrote :

In nrpe_helpers.py, the check is initially populated with a time from set_known_reboot_time(), which is then passed to get_check_reboot_context().

When I created a fresh cluster using the instructions in the bug description, the resulting NRPE check is this:

# System reboot time (sub)
command[check_reboot]=/usr/local/lib/nagios/plugins/check_reboot.py "2022-05-03 05:12:03"

Running that manually:

root@juju-da6b2f-nrpe-0:/etc/nagios/nrpe.d# /usr/local/lib/nagios/plugins/check_reboot.py "2022-05-03 05:12:03"
OK: system is up since 2022-05-03 05:12:03

The action, however, fails:

ubuntu@xavpaice-bastion:~$ juju run-action --wait nrpe/0 run-nrpe-check name=check-reboot
unit-nrpe-0:
  UnitId: nrpe/0
  id: "2"
  results:
    Stderr: |
      usage: check_reboot.py [-h] known_reboot_time
      check_reboot.py: error: argument known_reboot_time: time must be in format yyyy-mm-dd HH:MM:SS, same as output from `uptime --since`.
    check-output: ""
  status: completed
  timing:
    completed: 2022-05-03 06:11:51 +0000 UTC
    enqueued: 2022-05-03 06:11:48 +0000 UTC
    started: 2022-05-03 06:11:51 +0000 UTC

This bug is therefore about the action, not the NRPE check itself.

Xav Paice (xavpaice) wrote :

Looking at the action script (which is written in bash):

root@juju-da6b2f-nrpe-0:/home/ubuntu# check=check_reboot
root@juju-da6b2f-nrpe-0:/home/ubuntu# nrpedir="/etc/nagios/nrpe.d"
root@juju-da6b2f-nrpe-0:/home/ubuntu# checkfile="$nrpedir/${check}.cfg"
root@juju-da6b2f-nrpe-0:/home/ubuntu# command=$(awk -F "=" '{ print $2 }' $checkfile)
root@juju-da6b2f-nrpe-0:/home/ubuntu# echo $command
/usr/local/lib/nagios/plugins/check_reboot.py "2022-05-03 05:12:03"

root@juju-da6b2f-nrpe-0:/home/ubuntu# sudo -u nagios $command
usage: check_reboot.py [-h] known_reboot_time
check_reboot.py: error: argument known_reboot_time: time must be in format yyyy-mm-dd HH:MM:SS, same as output from `uptime --since`.

root@juju-da6b2f-nrpe-0:/home/ubuntu# echo $command |xargs sudo -u nagios
OK: system is up since 2022-05-03 05:12:03

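This is about the way bash passes args and quotes: the unquoted $command is split on whitespace, but the quote characters stored in the variable stay literal, so check_reboot.py receives "2022-05-03 and 05:12:03" as two separate arguments. xargs parses the quotes itself and passes the timestamp as a single argument.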
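The difference between the two invocations can be reproduced outside the shell. In the sketch below, `str.split()` approximates what bash does to an unquoted variable expansion (whitespace splitting, quotes left literal), while `shlex.split()` approximates quote-aware parsing of the kind xargs performs. The helper `argv_from_cfg_line` is a hypothetical illustration of one way to build the argument list from the cfg line; it is not the fix that was applied to the charm.

```python
import shlex

cfg_line = (
    "command[check_reboot]="
    '/usr/local/lib/nagios/plugins/check_reboot.py "2022-05-03 05:12:03"'
)

# Everything after the first "=" is the command string.
command = cfg_line.partition("=")[2]

# Whitespace splitting only: the quote characters stay in the arguments.
naive_argv = command.split()

# Quote-aware splitting: the timestamp stays one argument, quotes removed.
quoted_argv = shlex.split(command)


def argv_from_cfg_line(line):
    """Hypothetical helper: turn an NRPE cfg line into an argv list."""
    return shlex.split(line.partition("=")[2])


print(naive_argv)
print(quoted_argv)
```

The first list has three elements, with the timestamp broken in two and the quote characters still attached; the second has two, with the timestamp intact. A list like the second one could be handed to `subprocess.run()` without going through a shell at all.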

Xav Paice (xavpaice)
Changed in charm-nrpe:
status: Confirmed → Fix Released
milestone: none → 22.04