ilorest simultaneous run contention
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| hw-health-charm | Won't Fix | Low | Unassigned | |
Bug Description
The `ilorest` tool expects only one instance to run at a time:
$ TMP=/var/
iLOrest : RESTful Interface Tool version 3.2.2
Copyright (c) 2014-2021 Hewlett Packard Enterprise Development LP
-------
ERROR : 2
Blob was overwritten by another user. Please ensure only one user is making changes at a time locally.
$ echo $?
76
As a result, whenever the previous run of the cron job that reads `ilorest` data takes too long, the next run hits this error. Multi-threaded attempts to collect `ilorest` data hit the same wall. The script halts on this error and fails to write out its data.
We are starting to see cases where this happens, causing the following alarm:
/var/lib/
A probable solution is to take a lock before executing the `ilorest` program, so that each run first checks whether another instance is still running.
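The locking idea above could be sketched as follows. This is a minimal illustration using an advisory `flock`, not the change from the linked branch; the lock path and the helper names `exclusive_lock` and `run_exclusively` are hypothetical.

```python
import fcntl
import os
import subprocess
import tempfile
from contextlib import contextmanager

# Hypothetical lock path; a real deployment may prefer /run or /var/lock.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "cron_ilorest.lock")


@contextmanager
def exclusive_lock(path=LOCK_PATH, blocking=False):
    """Hold an advisory flock on `path`; raise BlockingIOError if it is taken."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        yield
    finally:
        # Closing the descriptor releases the lock, even if the command failed.
        os.close(fd)


def run_exclusively(cmd, path=LOCK_PATH):
    """Run `cmd` only if nobody else holds the lock.

    Returns the command's exit code, or None when another run is in progress.
    """
    try:
        with exclusive_lock(path):
            return subprocess.run(cmd, check=False).returncode
    except BlockingIOError:
        return None  # skip this run instead of colliding with the previous one
```

With such a wrapper, the cron script could call, for example, `run_exclusively(["/usr/sbin/ilorest", "list", "-j", "--selector=Power", "--refresh"])` and treat a `None` result as "previous run still active". Because the kernel drops the lock when the process exits, a crashed run would not leave a stale lock behind.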
Related branches
- 🤖 prod-jenkaas-bootstack: Needs Fixing (continuous-integration)
- BootStack Reviewers: Pending requested
- BootStack Reviewers: Pending requested
- Diff: 79 lines (+40/-5), 1 file modified: src/files/ilorest/cron_ilorest.py (+40/-5)
Changed in charm-hw-health:
status: New → Triaged
importance: Undecided → Medium
tags: added: bseng-1094
Changed in charm-hw-health:
importance: Medium → Low
I was hit by the same bug, but it produced a different error message:
sudo /usr/sbin/ilorest list -j --selector=Power --refresh
Unable to locate instance for power
root@cz20430cj4:~# echo $?
23
The same happened with the Chassis selector instead of Power; the exit code remained the same.
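One way a collection script could avoid halting on these failures is to retry when `ilorest` exits with one of the codes seen in this report (76 and 23). The sketch below is only an illustration: the helper name `run_with_retry` and its parameters are hypothetical, and treating exit code 23 as retryable is an assumption, since this report does not establish that a retry alone clears it.

```python
import subprocess
import time

# Exit codes observed in this report; treating them as retryable is an assumption.
RETRYABLE_CODES = {76, 23}


def run_with_retry(cmd, attempts=3, delay=5.0, run=subprocess.run, sleep=time.sleep):
    """Run `cmd`, retrying while it exits with one of RETRYABLE_CODES.

    Returns the result object of the last attempt (attempts must be >= 1).
    `run` and `sleep` are injectable so the logic can be tested without ilorest.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True, check=False)
        if result.returncode not in RETRYABLE_CODES:
            break  # success, or a failure that a retry is unlikely to fix
        if attempt < attempts:
            sleep(delay)  # give the competing instance time to finish
    return result
```

The caller still has to inspect `returncode` on the returned object: if all attempts are exhausted, the last failing result is returned rather than an exception being raised.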
A manual workaround would be:
# in a root shell
ilorest logout; ilorest login
sudo chmod 644 /var/lib/nagios/ilorest.out
sudo chown nagios:nagios /var/lib/nagios/ilorest.out
# regenerate fresh data
sudo /usr/local/lib/nagios/plugins/cron_ilorest.py
# rerun the nagios health check
sudo -u nagios /usr/local/lib/nagios/plugins/check_hw_health_cron_output.py --filename /var/lib/nagios/ilorest.out
OK No errors found
All OK