healthcheck_port is broken in master for certain containers

Bug #1934118 reported by Michele Baldessari
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Triaged | Medium | Unassigned |

Bug Description

Noticed this while debugging an introspection problem:
[root@undercloud-0 ironic]# systemctl list-units --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
● 96e0bdd1c924e0ebc872a05e03ef3c17c121c138fa1f217bea7f5047dbe4a225.service loaded failed failed /usr/bin/podman healthcheck run 96e0bdd1c924e0ebc872a05e03ef3c17c121c138fa1f217bea7f5047dbe4a225
● a40de3a5e50467e2a36d99c8c5c7b51ac24002fdffd1bb3a8fcfeeaa5ca35e97.service loaded failed failed /usr/bin/podman healthcheck run a40de3a5e50467e2a36d99c8c5c7b51ac24002fdffd1bb3a8fcfeeaa5ca35e97
● aaf605c5faebcdf1d4decff8662dc44287bdf8a4c7d7122e60b751bfbdfae9c6.service loaded failed failed /usr/bin/podman healthcheck run aaf605c5faebcdf1d4decff8662dc44287bdf8a4c7d7122e60b751bfbdfae9c6
● d0dc972a52e25bf16079d7fe3ed5bc5b387de9595d4c5e4c25738b16d959266d.service loaded failed failed /usr/bin/podman healthcheck run d0dc972a52e25bf16079d7fe3ed5bc5b387de9595d4c5e4c25738b16d959266d
● dd5b72635f2bd0384d73c555b8067cc875446327e37fda1677107bc7745bc051.service loaded failed failed /usr/bin/podman healthcheck run dd5b72635f2bd0384d73c555b8067cc875446327e37fda1677107bc7745bc051
● fd65cfe98fa03dcf39d121ee650227d2284388a7c01e6f8b315b446e67691a46.service loaded failed failed /usr/bin/podman healthcheck run fd65cfe98fa03dcf39d121ee650227d2284388a7c01e6f8b315b446e67691a46

All of these healthchecks correspond to the following containers:
[root@undercloud-0 ironic]# podman ps |grep -e 96e0bdd1 -e a40de3a5e504 -e aaf605c5fa -e d0dc972a52e2 -e dd5b72635f2b -e fd65cfe98fa
fd65cfe98fa0 undercloud-0.ctlplane.alejandro.ftw:8787/tripleomaster/openstack-neutron-openvswitch-agent:current-tripleo kolla_start 8 hours ago Up 8 hours ago neutron_ovs_agent
aaf605c5faeb undercloud-0.ctlplane.alejandro.ftw:8787/tripleomaster/openstack-neutron-dhcp-agent:current-tripleo kolla_start 8 hours ago Up 8 hours ago neutron_dhcp
dd5b72635f2b undercloud-0.ctlplane.alejandro.ftw:8787/tripleomaster/openstack-neutron-l3-agent:current-tripleo kolla_start 8 hours ago Up 8 hours ago neutron_l3_agent
d0dc972a52e2 undercloud-0.ctlplane.alejandro.ftw:8787/tripleomaster/openstack-ironic-conductor:current-tripleo kolla_start 8 hours ago Up 14 minutes ago ironic_conductor
96e0bdd1c924 undercloud-0.ctlplane.alejandro.ftw:8787/tripleomaster/openstack-ironic-neutron-agent:current-tripleo kolla_start 8 hours ago Up 14 minutes ago ironic_neutron_agent
a40de3a5e504 undercloud-0.ctlplane.alejandro.ftw:8787/tripleomaster/openstack-ironic-inspector:current-tripleo kolla_start 8 hours ago Up 14 minutes ago ironic_inspector_dnsmasq
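
The manual mapping above (failed unit names to container names) can be sketched as a small helper. This is only an illustration: failed_healthcheck_ids is a hypothetical name, and the sketch assumes the failed units are named <64-hex-container-id>.service, as in the listing above.

```shell
#!/bin/bash
# Hypothetical helper: read `systemctl list-units --failed` output on stdin
# and print the 12-character container IDs of failed healthcheck units.
failed_healthcheck_ids() {
    # Unit names look like <64 hex chars>.service; keep the first 12 chars,
    # which is the short ID that `podman ps` prints.
    { grep -oE '[0-9a-f]{64}\.service' || true; } | cut -c1-12 | sort -u
}

# Demo with one line copied from the report (bullet character omitted).
failed_healthcheck_ids <<'EOF'
96e0bdd1c924e0ebc872a05e03ef3c17c121c138fa1f217bea7f5047dbe4a225.service loaded failed failed /usr/bin/podman healthcheck run 96e0bdd1c924e0ebc872a05e03ef3c17c121c138fa1f217bea7f5047dbe4a225
EOF
# prints 96e0bdd1c924

# On a real host, something along these lines should give the same mapping:
#   systemctl list-units --failed --plain --no-legend | failed_healthcheck_ids \
#       | grep -F -f - <(podman ps --format '{{.ID}} {{.Names}}')
```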

These healthchecks appear to fail because healthcheck_port runs sudo -u <user> find ...

Let's take ironic_inspector_dnsmasq for example, which does this (in my case):
process='dnsmasq'
if pgrep $process; then
    listen_address=$(get_config_val /etc/ironic-inspector/inspector.conf DEFAULT listen_address 127.0.0.1)
....
    port="67"
....
if ! healthcheck_port $process $port; then

Now, even though the dnsmasq process is clearly listening on port 67, as seen here:
 [root@undercloud-0 ~]# ss -natulpe |grep :67
udp UNCONN 0 0 0.0.0.0:67 0.0.0.0:* users:(("dnsmasq",pid=189442,fd=4)) ino:7400742 sk:3c <->
 [root@undercloud-0 ~]# ps auxwf |grep 189442
setroub+ 189442 0.1 0.0 57508 4152 ? S 08:33 0:04 \_ /sbin/dnsmasq --conf-file=/etc/ironic-inspector/dnsmasq.conf -k --log-facility=/var/log/ironic-inspector/dnsmasq.log

this healthcheck is still in a failed state. The cause is the following code in common.sh:
    for pid in $(pgrep -f $process); do
        # Here, we check if a socket is actually associated to the process PIDs
        match=$(( $match+$(sudo -u $puser find /proc/$pid/fd/ -ilname "socket*" -printf "%l\n" 2>/dev/null | grep -c -E "(${sockets})") ))
        test $match -gt 0 && exit 0 # exit as soon as we get a match
    done
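
To show why this loop fails silently, here is a minimal sketch of the same find | grep -c pipeline, run against an ordinary directory instead of /proc/<pid>/fd. count_socket_links and the demo directory are hypothetical and not part of common.sh. The point is that an unreadable or missing directory yields a count of 0, which looks exactly like "the process has no matching socket".

```shell
#!/bin/bash
# Hypothetical helper mirroring the pipeline in the loop above:
# count symlinks under DIR whose target matches the extended regex PATTERN.
count_socket_links() {
    local dir=$1 pattern=$2
    # find errors (e.g. "Permission denied") are discarded, as in common.sh.
    # grep -c exits non-zero on zero matches, hence the `|| true`.
    find "$dir" -ilname "socket*" -printf "%l\n" 2>/dev/null \
        | grep -c -E "(${pattern})" || true
}

# Fake fd directory: fd 4 points at the socket inode reported by ss above.
demo_dir=$(mktemp -d)
ln -s 'socket:[7400742]' "$demo_dir/4"

count_socket_links "$demo_dir" 'socket:\[7400742\]'           # prints 1
# A directory that find cannot read is indistinguishable from "no sockets":
count_socket_links "$demo_dir/missing" 'socket:\[7400742\]'   # prints 0

rm -rf "$demo_dir"
```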

The above sudo -u $puser fails with:
[root@undercloud-0 /]# sudo -u dnsmasq find /proc/7/fd/ -ilname 'socket*' -printf '%l\n'
+ sudo -u dnsmasq find /proc/7/fd/ -ilname 'socket*' -printf '%l\n'
find: ‘/proc/7/fd/’: Permission denied

That is because only root can access /proc/<pid>/fd in those containers:
[root@undercloud-0 healthcheck]# podman exec -it ironic_inspector_dnsmasq sh -c 'ls -lda /proc/7/fd ; sleep 2'
dr-x------. 2 root root 0 Jun 30 09:37 /proc/7/fd

Not all containers have this restriction though. For example, ironic_inspector seems to not have it:
 [root@undercloud-0 healthcheck]# podman exec -it ironic_inspector sh -c 'ls -lda /proc/7/fd ; sleep 2'
dr-x------. 2 ironic-inspector ironic-inspector 0 Jun 30 09:36 /proc/7/fd

So healthcheck_port clearly does not work for containers such as ironic_inspector_dnsmasq, where only root is allowed to read /proc/<XYZ>/fd. I am not yet sure what the difference between these containers is.
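
A quick way to tell the two kinds of containers apart might be to compare the owner of /proc/<pid>/fd with the user the process runs as. The sketch below is an assumption-laden diagnostic, not a fix: the function names are hypothetical, and it relies on GNU stat and on the Uid: line in /proc/<pid>/status (real, effective, saved, filesystem UID).

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a process from inside its container.

# Numeric UID that owns /proc/<pid>/fd.
proc_fd_owner_uid() { stat -c %u "/proc/$1/fd"; }

# Effective UID of the process (third field of the "Uid:" line).
proc_euid() { awk '/^Uid:/ {print $3}' "/proc/$1/status"; }

# Succeeds when the fd directory is owned by the user the process runs as,
# i.e. when `sudo -u <that user> find /proc/<pid>/fd` could read it.
fd_owner_matches_euid() {
    [ "$(proc_fd_owner_uid "$1")" = "$(proc_euid "$1")" ]
}

pid=${1:-$$}   # default to the current shell when no PID is given
if fd_owner_matches_euid "$pid"; then
    echo "pid $pid: /proc/$pid/fd is owned by the process user"
else
    echo "pid $pid: /proc/$pid/fd owner uid $(proc_fd_owner_uid "$pid")" \
         "differs from process uid $(proc_euid "$pid")"
fi
```

Run inside ironic_inspector_dnsmasq against the dnsmasq PID, this would be expected to report a mismatch (fd owned by root, process running as dnsmasq), while ironic_inspector would be expected to report a match.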

summary: - healthcheck_port is broken in master
+ healthcheck_port is broken in master for certain containers
Alex Schultz (alex-schultz) wrote :

mac has been removed, please switch to ports

Alex Schultz (alex-schultz) wrote :

oops wrong bug
