However, we have been reviewing the scripts' code and here is what we see:
1) cronjob script: writes "{schedule} root timeout ... {timeout} {command}" in /etc/cron.d
2) {command} is the collect_rabbitmq_stats script, which generates 2 output files
3) check_rabbitmq.py sends probe messages via rmq
4) check_rabbitmq_queues.py parses one of the files generated by the cronjob at #2
I think the fix involves 2 scenarios:
a) the one mentioned by Liam, where the output file parsed by #4 does not exist (the first 5 minutes until the first run of the cronjob). A possible fix can be found at https://review.opendev.org/661814
b) when the cronjob is removed, the nrpe check that calls check_rabbitmq_queues.py should also be removed. The reason is that the output file would no longer be updated (or would not even exist in the first place), so the check should not exist either. The only reasons cronjobs are used in charms are: permission issues (checks are run as "nagios", but "root" may be needed); and the time taken to run a check (rmq checks can take more than 10 seconds, so check_nrpe could return a "socket timeout" if we don't run them asynchronously [just checking the output file]).
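To illustrate scenario a), here is a minimal sketch of how a check could treat a missing or stale output file. This is not the patch in the review linked above; the function name, the example path and the choice of UNKNOWN for a missing file are my own assumptions.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def stats_file_status(path, max_age_seconds, now=None):
    """Return (exit_code, message) based only on the file's presence and age.

    Hypothetical helper: the real check also parses the file contents.
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Assumption: a missing file means the cronjob has not run yet,
        # so report UNKNOWN instead of failing with a traceback.
        return UNKNOWN, "stats file {} not found".format(path)
    age = now - mtime
    if age > max_age_seconds:
        return CRITICAL, "stats file {} is stale ({:.0f}s old)".format(path, age)
    return OK, "stats file {} is fresh ({:.0f}s old)".format(path, age)
```

For example, stats_file_status("/path/to/stats.dat", 600) returns UNKNOWN until the cronjob has written the file at least once (the path here is a placeholder).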
To fix b),
b.1) rsync of check_rabbitmq_queues.py from scripts/* to NAGIOS_PLUGINS should be done in the same place where the cronjob is copied
b.2) when the cronjob is removed, the script could also be removed
b.3) when the nrpe check is added, under the same condition as above, nrpe_compat.remove_check should be called instead of add_check.
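A rough sketch of b.3), assuming the nrpe_compat object exposes add_check and remove_check taking a shortname keyword. FakeNRPE is a stand-in so the snippet runs on its own, and the check shortname, description and function name are hypothetical.

```python
class FakeNRPE:
    """Stand-in for the charm's nrpe_compat object; records registered checks."""

    def __init__(self):
        self.checks = {}

    def add_check(self, shortname, description, check_cmd):
        self.checks[shortname] = (description, check_cmd)

    def remove_check(self, shortname):
        # Removing a check that was never added is a no-op.
        self.checks.pop(shortname, None)


def sync_queue_check(nrpe_compat, cron_schedule, check_cmd):
    """Keep the queue check registered only while the stats cronjob is configured.

    Returns True if the check was added, False if it was removed.
    """
    shortname = "rabbitmq_queue"  # hypothetical check name
    if cron_schedule:
        nrpe_compat.add_check(
            shortname=shortname,
            description="Check RabbitMQ queues",
            check_cmd=check_cmd,
        )
        return True
    # No cronjob means nothing refreshes the output file, so drop the check.
    nrpe_compat.remove_check(shortname=shortname)
    return False
```

The same condition could also guard the rsync in b.1 and the script removal in b.2, so that the cronjob, the script and the check are always added or removed together.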
I have been discussing this bug with a colleague and we've seen that the condition mentioned by Liam is now "config('stats_cron_schedule')":
https://github.com/openstack/charm-rabbitmq-server/blob/master/hooks/rabbitmq_server_relations.py#L651