Comment 11 for bug 1761096

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

As already reported dnsmasq hangs on the hooks:
             ├─2089 /bin/sh /etc/init.d/dnsmasq systemd-start-resolvconf
             ├─2097 run-parts --arg=-a --arg=lo.dnsmasq /etc/resolvconf/update.d
             ├─2116 run-parts /etc/resolvconf/update-libc.d
             ├─2129 /bin/sh /etc/resolvconf/update-libc.d/squid
             ├─2130 /bin/sh /usr/sbin/invoke-rc.d squid reload
             └─2148 systemctl reload squid.service

The hierarchy of these is:
run-parts /etc/resolvconf/update-libc.d
  -> /etc/resolvconf/update-libc.d/squid
    -> /usr/sbin/invoke-rc.d squid reload

The actual reload is done really fast:
Sep 02 06:32:19 x2 systemd[1]: Reloading LSB: Squid HTTP Proxy version 3.x.
Sep 02 06:32:19 x2 squid[2505]: * Reloading Squid HTTP Proxy configuration files
Sep 02 06:32:19 x2 squid[2505]: ...done.
Sep 02 06:32:19 x2 systemd[1]: Reloaded LSB: Squid HTTP Proxy version 3.x.

But in ps hierarchy the systemctl reload hangs still
0 0 2737 1 20 0 4500 1372 wait Ss ? 0:00 /bin/sh /etc/init.d/dnsmasq systemd-start-resolvconf
0 0 2745 2737 20 0 4360 644 wait S ? 0:00 \_ run-parts --arg=-a --arg=lo.dnsmasq /etc/resolvconf/update.d
0 0 2763 2745 20 0 4360 1184 wait S ? 0:00 \_ run-parts /etc/resolvconf/update-libc.d
0 0 2776 2763 20 0 4500 692 wait S ? 0:00 \_ /bin/sh /etc/resolvconf/update-libc.d/squid
0 0 2777 2776 20 0 4500 1300 wait S ? 0:00 \_ /bin/sh /usr/sbin/invoke-rc.d squid reload
0 0 2795 2777 20 0 24884 1068 poll_s S ? 0:00 \_ systemctl reload squid.service

The delay on that is actually 90 seconds.
And it waits until killed:
     0.000000 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=3, revents=POLLIN}]) <53.100476>
    53.101445 +++ killed by SIGKILL +++
Due to
  Sep 02 06:37:19 x2 systemd[1]: dnsmasq.service: Start-post operation timed out. Stopping.

The old init script reload in xenial was:
    reload|force-reload)
        res=`$DAEMON -k parse -f $CONFIG 2>&1 | grep -o "FATAL: .*"`
        if test -n "$res";
        then
                log_failure_msg "$res"
                exit 3
        else
                log_action_msg "Reloading $DESC configuration files"
                start-stop-daemon --stop --signal 1 \
                        --pidfile $PIDFILE --quiet --exec $DAEMON
                log_action_end_msg 0
        fi
        ;;

But all these actions directly executed are always fast:
- /usr/sbin/invoke-rc.d squid reload
- systemctl reload squid
- start-stop-daemon --stop --signal 1 --pidfile /var/run/squid.pid --quiet --exec /usr/sbin/squid

I can make "systemctl reload squid" hang thou - if I invoke it at the right time.
To do so I replaced /etc/resolvconf/update-libc.d/squid with a debug hook that waits on me:
echo "DEBUG DEBUG DEBUG"
echo "would call 'invoke-rc.d squid reload' now"
echo "instead wait until /tmp/foo was touched"
f="/tmp/foo"
start=$(date -r $f);
new=$(date -r $f);
while [ "$start" = "$new" ]; do
  sleep 2;
  new=$(date -r $f);
done;
echo "DEBUG DEBUG DEBUG"

Once that is active "systemctl reload squid.service" hangs as it is in the error case.
This only calls into "/etc/init.d/squid reload" and that is what actually hangs

And what did I find when watching this hang:
0 0 4135 2159 20 0 4500 1368 wait S+ ? 0:00 \_ /bin/sh /etc/init.d/squid reload
0 0 4143 4135 20 0 26588 1128 poll_s S+ ? 0:00 \_ /bin/systemctl --no-pager reload squid.service
0 0 4144 4143 20 0 15404 840 poll_s S+ ? 0:00 \_ /bin/systemd-tty-ask-password-agent --watch

I mean systemd-tty-ask-password-agent what ?!?

I must say that I love to hate the old complexity of the sysV<->systemd change.
- /etc/init.d/squid is generated into a service
- dnsmasq hook calls: systemctl reload squid
- that calls: /etc/init.d/squid reload
- which calls: /bin/systemctl --no-pager reload squid.service

This is back to step #2 one would think :-)

But if at the broken time I directly call "/bin/systemctl --no-pager reload squid.service" I get the same behavior with less redirections:
4 0 2159 0 20 0 21608 2800 wait Ss ? 0:00 bash
0 0 5312 2159 20 0 26588 2364 poll_s S+ ? 0:00 \_ /bin/systemctl --no-pager reload squid.service
0 0 5314 5312 20 0 15404 840 poll_s S+ ? 0:00 \_ /bin/systemd-tty-ask-password-agent --watch

But when this happens /run/systemd/ask-password is empty.

I think we are looking at something more similar to:
- https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1456789
- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774153
- https://bugs.freedesktop.org/show_bug.cgi?id=92430
- https://bugzilla.redhat.com/show_bug.cgi?id=1367981
But for me `systemctl daemon-reexec` didn't serve as a workaround, so maybe not the exact same?

Similar is https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1565617 in which case /bin/systemd-tty-ask-password-agent was a red herring - but the fix for that has landed and we still see it.

The next questions open to me now are:
- why does /bin/systemd-tty-ask-password-agent open here
- is /bin/systemd-tty-ask-password-agent the process holding this up?

For the latter I set up this debug trick:
lrwxrwxrwx 1 root root 9 Sep 2 07:27 /bin/systemd-tty-ask-password-agent -> /bin/true*
But that was probably too rude and hangs it in:
  [systemd-tty-ask] <defunct>

Sergio, maybe you can continue from here tracking these two open questions further down?