healthchecks are mostly unreliable
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | High | Cédric Jeanneret |
Bug Description
Hello there,
After some checks and digging with the healthchecks, it appears most of them are unreliable due to the lack of strict error checking, such as "set -o pipefail" and other options.
We probably want to add the following options:
set -eo pipefail
in tripleo-
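As a minimal sketch of why pipefail matters here: the snippet below uses `false` as a hypothetical stand-in for a failing probe command and `cat` as a stand-in for whatever follows it in the pipeline. Neither comes from the actual healthcheck scripts.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins: "false" plays a failing probe, "cat" a later pipeline stage.

# Without pipefail, the pipeline reports the status of its last command (cat),
# so the failure of "false" is hidden.
status_without=$(bash -c 'false | cat; echo $?')

# With pipefail, the pipeline reports the last non-zero status in the pipeline.
status_with=$(bash -c 'set -o pipefail; false | cat; echo $?')

echo "without pipefail: $status_without"
echo "with pipefail:    $status_with"
```

A healthcheck without pipefail would report success in the first case even though the probe failed.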
------
While digging into that issue, I also found out that "grep -q -E ..." apparently doesn't return the correct exit code when it matches piped content, at least in some cases (see below).
For instance, in the nova_conductor container:
(ss -ntuap; sudo -u nova ss -ntuap) | sort -u | /usr/bin/grep -Eq ":(5672)
141
But if we do it without -q:
(ss -ntuap; sudo -u nova ss -ntuap) | sort -u | /usr/bin/grep -E ":(5672)
tcp ESTAB 0 0 192.168.24.1:54136 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:54138 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:54140 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:54142 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:54144 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:54146 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:54148 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:54150 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:57270 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:57310 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:57320 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:57324 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:57326 192.168.24.1:5672 users:(
tcp ESTAB 0 0 192.168.24.1:57364 192.168.24.1:5672 users:(
tcp ESTAB 8 0 192.168.24.1:57328 192.168.24.1:5672 users:(
tcp ESTAB 8 0 192.168.24.1:57360 192.168.24.1:5672 users:(
0
This unreliable behaviour was detected in a rhel-8 OSP-16 container, while on the rhel-8 host, it was working as expected. There's probably something fishy with the container env at some point, but to be honest, I didn't dig further.
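One possible explanation for the 141, which this report does not confirm: a shell reports 128 + N when a command is killed by signal N, and SIGPIPE is signal 13 on Linux. `grep -q` exits on the first match, so a producer that is still writing into the pipe receives SIGPIPE, and with pipefail enabled that status becomes the pipeline's status. The sketch below reproduces the effect with `seq` as a stand-in producer (an assumption; the report itself uses `ss` and `sort`).

```shell
#!/usr/bin/env bash
set -o pipefail

# "seq" is a hypothetical stand-in that writes far more than a pipe buffer holds.
# grep -q matches the very first line and exits, closing the read end of the pipe
# while seq is still writing.
seq 1 1000000 2>/dev/null | grep -q '^1$'
quiet_status=$?

# Usually 141 (128 + SIGPIPE). If SIGPIPE is ignored in the environment,
# seq fails with a write error instead and the status is still non-zero.
echo "grep -q with pipefail: $quiet_status"
```

Whether this triggers depends on how much output the producer still has to write when grep exits, so the same command can succeed in one environment and fail in another.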
A solution for that last issue is to drop the -q and redirect STDOUT to /dev/null:
(ss -ntuap; sudo -u nova ss -ntuap) | sort -u | /usr/bin/grep -E ":(5672)
0
since it will still return a non-zero code (1) if nothing is matched, as you can see here:
(ss -ntuap; sudo -u nova ss -ntuap) | sort -u | /usr/bin/grep -E ":(15672)
1
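The two exit codes above can be reproduced without a live container. This is only a sketch: the sample line is copied from the output earlier in this report (minus the truncated `users:(` part), `printf` stands in for the `ss | sort` pipeline, and the patterns are the truncated ones shown above, not necessarily the full expressions used in the healthcheck.

```shell
#!/usr/bin/env bash
# Sample connection line taken from the output quoted in this report.
sample='tcp ESTAB 0 0 192.168.24.1:54136 192.168.24.1:5672'

# No -q: stdout is redirected to /dev/null instead.
printf '%s\n' "$sample" | grep -E ':(5672)' > /dev/null
match_status=$?

printf '%s\n' "$sample" | grep -E ':(15672)' > /dev/null
nomatch_status=$?

echo "port 5672:  $match_status"    # 0, the line matches
echo "port 15672: $nomatch_status"  # 1, nothing matches
```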
tags: | added: train-backport-potential |
I have two jokes for this:
- here containers have proven to be the "sufficiently advanced technology" of Arthur C. Clarke
- let's rewrite it in awk. grep -q is, as of now, considered harmful