NRPE checks fail due to malformed crontab entry

Bug #1939702 reported by Gábor Mészáros
Affects: OpenStack RabbitMQ Server Charm
Status: Fix Released
Importance: Undecided
Assigned to: Aurelien Lourot

Bug Description

In a deployment, the rabbitmq-server charm was upgraded to revision 110 (on bionic, queens).
After the upgrade completed successfully, the Nagios NRPE check started to fail.

Investigating the root cause shows that the check shipped with the new charm calls python3-croniter, which expects a cron expression of exactly 5 or 6 columns.

This is the output of the check_rabbitmq_queues check, with print statements added to show which crontab line fails (the collect_rabbitmq_stats.sh entry) and how many columns it has (18):

root@juju-4278df-77-lxd-6:~# sudo -u root /usr/local/lib/nagios/plugins/check_rabbitmq_queues.py -c \* \* 100 200 /var/lib/rabbitmq/data/juju-4278df-77-lxd-6_queue_stats.dat
['*/5', '*', '*', '*', '*', 'rabbitmq', 'timeout', '-k', '10s', '-s', 'SIGINT', '300', '/usr/local/bin/collect_rabbitmq_stats.sh', '2>&1', '|', 'logger', '-p', 'local0.notice']
18
Traceback (most recent call last):
  File "/usr/local/lib/nagios/plugins/check_rabbitmq_queues.py", line 183, in <module>
    for f in args.stats_file]
  File "/usr/local/lib/nagios/plugins/check_rabbitmq_queues.py", line 183, in <listcomp>
    for f in args.stats_file]
  File "/usr/local/lib/nagios/plugins/check_rabbitmq_queues.py", line 124, in check_stats_file_freshness
    interval = get_cron_interval(cronspec, asof)
  File "/usr/local/lib/nagios/plugins/check_rabbitmq_queues.py", line 101, in get_cron_interval
    it = croniter(cronspec, base)
  File "/usr/lib/python3/dist-packages/croniter/croniter.py", line 71, in __init__
    raise ValueError(self.bad_length)
ValueError: Exactly 5 or 6 columns has to be specified for iteratorexpression.

tags: added: upgrade-charm
Changed in charm-rabbitmq-server:
assignee: nobody → Aurelien Lourot (aurelien-lourot)
Billy Olsen (billy-olsen) wrote :

The problem here is that the get_stats_cron_schedule() function in check_rabbitmq_queues splits the cron entry on 'root' [0], which is the user normally used to run the command. In the provided traceback, the user is 'rabbitmq'. The charm has used the 'root' user since the 16.04 release of the charms, so this charm must have some local modifications that break this.
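
To illustrate the failure mode, here is a minimal sketch in plain Python. It is not the charm's code: schedule_by_user_split() is a hypothetical stand-in for "cut the entry at the user name", and the cron entry is reconstructed from the column list printed in the bug description.

```python
# Cron entry reconstructed from the 18 columns printed in the bug description.
ENTRY = ("*/5 * * * * rabbitmq timeout -k 10s -s SIGINT 300 "
         "/usr/local/bin/collect_rabbitmq_stats.sh 2>&1 | logger -p local0.notice")


def schedule_by_user_split(entry, user="root"):
    """Hypothetical stand-in for the brittle approach: keep what precedes the user name."""
    return entry.split(user)[0].strip()


# Same entry as the charm would normally write it, with 'root' as the user.
root_entry = ENTRY.replace(" rabbitmq ", " root ", 1)

# With 'root' present, only the five schedule columns remain.
print(len(schedule_by_user_split(root_entry).split()))

# With 'rabbitmq' as the user, 'root' never matches, so the whole line
# (18 columns) is handed on as the "schedule".
print(len(schedule_by_user_split(ENTRY).split()))
```

Passing the second result to croniter would trigger the "Exactly 5 or 6 columns" ValueError shown in the traceback.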

A better option here would be to extract the cron schedule by splitting the string on whitespace and keeping only the schedule definition pieces (the first 5 entries).

[0] https://github.com/openstack/charm-rabbitmq-server/blob/stable/21.04/files/check_rabbitmq_queues.py#L108
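
The column-based approach suggested in this comment could look roughly like the sketch below. It is an illustration only, not the merged fix: the function name cron_schedule() and its error message are assumptions, and the sample entry is abbreviated.

```python
def cron_schedule(entry, fields=5):
    """Return the first `fields` whitespace-separated columns of a cron entry.

    Hypothetical helper: it ignores which user runs the job and relies only
    on column position, so 'root' and 'rabbitmq' entries parse the same way.
    """
    columns = entry.split()
    if len(columns) < fields:
        raise ValueError(
            "cron entry has fewer than %d columns: %r" % (fields, entry))
    return " ".join(columns[:fields])


# Abbreviated sample entry; the user column is 'rabbitmq', as in the report.
sample = "*/5 * * * * rabbitmq /usr/local/bin/collect_rabbitmq_stats.sh"
print(cron_schedule(sample))
```

The returned string contains exactly five columns, which is a length croniter accepts according to the error message in the traceback.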

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (master)
Changed in charm-rabbitmq-server:
status: New → In Progress
Changed in charm-rabbitmq-server:
milestone: none → 21.10
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/804911
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/45ded8b0f90c184a5dca819822e5fdfbb92c702a
Submitter: "Zuul (22348)"
Branch: master

commit 45ded8b0f90c184a5dca819822e5fdfbb92c702a
Author: Billy Olsen <email address hidden>
Date: Tue Aug 17 11:27:04 2021 -0700

    Improve parsing of cron schedule

    Improving the parsing of the cron schedule for /etc/cron/rabbitmq-stats.
    The code makes assumptions that the user in the cron entry will be the
    root user, which is generally safe as that's what the charm applied.
    However, the parsing is brittle in that it depends on the 'root' string
    in the entry. This changes the code so that the cron timer spec is
    stripped out based on the column entries in the file.

    Change-Id: I2d573e8942e840e0e5376f1537a2a3373fea3db8
    Fixes-Bug: #1939702

Changed in charm-rabbitmq-server:
status: In Progress → Fix Committed
Gábor Mészáros (gabor.meszaros) wrote :

I have installed cs:~openstack-charmers-next/rabbitmq-server-437
and am still getting "NRPE: Unable to read output" errors:

Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: CONN_CHECK_PEER: checking if host is allowed: 111.222.5.67 port 23256
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: Connection from 111.222.5.67 port 23256
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: is_an_allowed_host (AF_INET): is host >111.222.5.67< an allowed host >111.222.5.67<
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: is_an_allowed_host (AF_INET): is host >111.222.5.67< an allowed host >111.222.5.67<
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: is_an_allowed_host (AF_INET): host is in allowed host list!
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: Host address is in allowed_hosts
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: Host 111.222.5.67 is asking for command 'check_rabbitmq_queue' to be run...
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: Running command: /usr/local/lib/nagios/plugins/check_rabbitmq_queues.py -c \* \* 100 200 /var/lib/rabbitmq/data/juju-4278df-77-lxd-6_queue_stats.dat
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94310]: WARNING: my_system() seteuid(0): Operation not permitted
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: Command completed with return code 1 and output:
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: Return Code: 3, Output: NRPE: Unable to read output
Aug 24 09:05:15 juju-4278df-77-lxd-6 nrpe[94309]: Connection from 111.222.5.67 closed.

However, the other check, check_rabbitmq, succeeds:

Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: CONN_CHECK_PEER: checking if host is allowed: 111.222.5.67 port 59097
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: Connection from 111.222.5.67 port 59097
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: is_an_allowed_host (AF_INET): is host >111.222.5.67< an allowed host >111.222.5.67<
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: is_an_allowed_host (AF_INET): is host >111.222.5.67< an allowed host >111.222.5.67<
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: is_an_allowed_host (AF_INET): host is in allowed host list!
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: Host address is in allowed_hosts
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: Host 111.222.5.67 is asking for command 'check_rabbitmq' to be run...
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: Running command: /usr/local/lib/nagios/plugins/check_rabbitmq.py --user nagios-rabbitmq-server-3 --password <redacted>> --vhost nagios-rabbitmq-server-3
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94326]: WARNING: my_system() seteuid(0): Operation not permitted
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: Command completed with return code 0 and output: Ok: sent and received 10 test messages
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: Return Code: 0, Output: Ok: sent and received 10 test messages
Aug 24 09:05:30 juju-4278df-77-lxd-6 nrpe[94325]: Connection from 111.222.5.67 closed.

Changed in charm-rabbitmq-server:
status: Fix Committed → Fix Released