healthcheck_port is broken in master for certain containers
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Triaged | Medium | Unassigned |
Bug Description
Noticed this while debugging an introspection problem:
[root@undercloud-0 ironic]# systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● 96e0bdd1c924e0e
● a40de3a5e50467e
● aaf605c5faebcdf
● d0dc972a52e25bf
● dd5b72635f2bd03
● fd65cfe98fa03dc
All of these failed healthchecks correspond to the following containers:
[root@undercloud-0 ironic]# podman ps |grep -e 96e0bdd1 -e a40de3a5e504 -e aaf605c5fa -e d0dc972a52e2 -e dd5b72635f2b -e fd65cfe98fa
fd65cfe98fa0 undercloud-
aaf605c5faeb undercloud-
dd5b72635f2b undercloud-
d0dc972a52e2 undercloud-
96e0bdd1c924 undercloud-
a40de3a5e504 undercloud-
These healthchecks seem to fail because they run sudo -u <user> find ...
Let's take ironic_
process='dnsmasq'
if pgrep $process; then
listen_
....
port="67"
....
if ! healthcheck_port $process $port; then
Now, even though the dnsmasq process is clearly listening on port 67, as seen here:
[root@undercloud-0 ~]# ss -natulpe |grep :67
udp UNCONN 0 0 0.0.0.0:67 0.0.0.0:* users:(
[root@undercloud-0 ~]# ps auxwf |grep 189442
setroub+ 189442 0.1 0.0 57508 4152 ? S 08:33 0:04 \_ /sbin/dnsmasq --conf-
this healthcheck is still in a failed state. The reason is the following code in common.sh:
for pid in $(pgrep -f $process); do
# Here, we check if a socket is actually associated to the process PIDs
match=$(( $match+$(sudo -u $puser find /proc/$pid/fd/ -ilname "socket*" -printf "%l\n" 2>/dev/null | grep -c -E "(${sockets})") ))
test $match -gt 0 && exit 0 # exit as soon as we get a match
done
The above sudo -u $puser fails with:
[root@undercloud-0 /]# sudo -u dnsmasq find /proc/7/fd/ -ilname 'socket*' -printf '%l\n'
+ sudo -u dnsmasq find /proc/7/fd/ -ilname 'socket*' -printf '%l\n'
find: ‘/proc/7/fd/’: Permission denied
That is because only root can access /proc/<pid>/fd in those containers:
[root@undercloud-0 healthcheck]# podman exec -it ironic_
dr-x------. 2 root root 0 Jun 30 09:37 /proc/7/fd
Not all containers have this restriction, though. For example, ironic_inspector does not seem to have it:
[root@undercloud-0 healthcheck]# podman exec -it ironic_inspector sh -c 'ls -lda /proc/7/fd ; sleep 2'
dr-x------. 2 ironic-inspector ironic-inspector 0 Jun 30 09:36 /proc/7/fd
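The same ownership check can be scripted rather than read off ls output. The following is only a minimal sketch: fd_dir_owner is a hypothetical helper that is not part of common.sh, it assumes GNU stat, and it inspects the current shell's own PID purely for demonstration.

```shell
#!/bin/bash
# Hypothetical helper (not in common.sh): print the owner of /proc/<pid>/fd.
# Prints nothing if the directory does not exist or cannot be stat'ed.
fd_dir_owner() {
    stat -c '%U' "/proc/$1/fd" 2>/dev/null || true
}

# Demonstration against this shell's own PID and against a bogus PID.
owner=$(fd_dir_owner "$$")
missing=$(fd_dir_owner "no-such-pid")

echo "fd dir of PID $$ is owned by: ${owner}"
echo "fd dir of bogus PID is owned by: '${missing}'"
```

Run inside a container against the service PID (7 in the examples above), a result of root for a service that runs as another user would suggest that the sudo -u $puser find call cannot read the directory.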
So clearly the healthcheck_port is not working for those containers (like ironic_
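Part of what makes this hard to spot is that the loop sends find's errors to /dev/null, so a permission failure looks the same as "no matching socket". The sketch below reproduces only that find | grep step against a fake fd directory. It assumes GNU find; count_socket_links, the temporary directory and the inode numbers are all made up for illustration and do not come from common.sh.

```shell
#!/bin/bash
# Hypothetical helper: count symlinks in an fd-style directory whose targets
# match one of the given socket inodes (same find | grep step as the loop).
count_socket_links() {
    local fd_dir=$1 sockets=$2
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    find "$fd_dir" -ilname "socket*" -printf "%l\n" 2>/dev/null \
        | grep -c -E "(${sockets})" || true
}

# Fake fd directory with dangling symlinks, similar to /proc/<pid>/fd.
demo_dir=$(mktemp -d)
ln -s 'socket:[11111]' "$demo_dir/3"
ln -s 'socket:[22222]' "$demo_dir/4"
ln -s '/dev/null' "$demo_dir/0"

# One inode matches in the accessible directory.
match_found=$(count_socket_links "$demo_dir" '11111|33333')
# A directory that find cannot open yields 0, with the error discarded.
match_missing=$(count_socket_links "$demo_dir/does-not-exist" '11111|33333')

echo "accessible dir: $match_found, inaccessible dir: $match_missing"
rm -rf "$demo_dir"
```

In the second call find fails outright, yet the function still reports a count of 0. That is the same outcome the healthcheck sees when /proc/<pid>/fd is readable only by root.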
summary:
- healthcheck_port is broken in master
+ healthcheck_port is broken in master for certain containers
mac has been removed, please switch to ports