Comment 15 for bug 1073478

Tiago Stürmer Daitx (tdaitx) wrote : Re: [Bug 1073478] Re: [SRU] Update squid3 upstart script to kill it with SIGINT and wait longer

On the contrary, thanks for bringing that up. It was good to be made aware
of such problems. Just because it worked on Precise and Trusty didn't
mean I should skip that check, as I might well have run into an obsolete or
removed feature (not that upstart seems to have any). =)
On Oct 14, 2015 9:56 AM, "Robie Basak" <email address hidden> wrote:

> On Wed, Oct 14, 2015 at 12:34:33PM -0000, Tiago Stürmer Daitx wrote:
> > According to http://upstart.ubuntu.com/cookbook/#stanzas-by-category
> > both "kill timeout" (all upstart versions) and "kill signal" (since
> > upstart 1.3) should be supported by upstart 1.5 in Precise so this SRU
> > should be fine.
>
> You're right. Sorry for the noise.
>
> https://bugs.launchpad.net/bugs/1073478
>
> Title:
> [SRU] Update squid3 upstart script to kill it with SIGINT and wait
> longer
>
> Status in squid3 package in Ubuntu:
> Triaged
>
> Bug description:
> [Impact]
>
> Squid 3.1.19-1ubuntu3.12.04.3 on Precise and 3.3.8-1ubuntu6.3 on
> Trusty do not clean up their PID file on stop, causing logrotate to
> report an error each night when cron runs it.
>
> The underlying issue here is that the upstart script does not wait
> long enough for squid3 to finish - one might also argue that it is
> using the wrong signal if it expects squid to finish quickly.
>
> By default, when squid3 receives a SIGTERM it will close the socket
> for incoming connections and will wait for existing connections to
> complete. The shutdown_lifetime directive configures how long it waits
> before forcefully closing those open connections and it is set by
> default to 30 seconds.
>
> With the current setting, SIGKILL is sent after 5 seconds (the upstart
> default):
> =====
> # strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
> Process 20865 attached - interrupt to quit
> 0.000000 --- SIGTERM (Terminated) @ 0 (0) ---
> 0.000440 rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
> 0.000418 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
> 5.006483 +++ killed by SIGKILL +++
>
> Killing squid with SIGKILL might lead to cache index corruption and a
> very slow startup the next time it is started:
> http://lists.squid-cache.org/pipermail/squid-users/2015-July/004752.html
>
> The first way to avoid this is to increase upstart's "kill timeout"
> to a few seconds more than the "shutdown_lifetime" directive.
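
As a rough sketch of how such a timeout could be derived: read shutdown_lifetime from squid.conf and add a margin. The function name, the 10-second margin, and the config path in the example are assumptions for illustration. Only values given in seconds are handled (a setting such as "1 minute" would be misread), and squid's default of 30 is used when the directive is absent.

```shell
#!/bin/sh
# Hypothetical helper: suggest an upstart "kill timeout" for squid.
# Assumes shutdown_lifetime is written in seconds, e.g. "shutdown_lifetime 30 seconds".

suggest_kill_timeout() {
    conf="$1"            # path to squid.conf
    margin="${2:-10}"    # extra seconds on top of shutdown_lifetime (assumed value)

    # Take the last uncommented shutdown_lifetime line, if any.
    lifetime=$(awk '$1 == "shutdown_lifetime" { v = $2 } END { if (v != "") print v }' "$conf" 2>/dev/null)

    # Fall back to squid's default of 30 seconds when unset or non-numeric.
    case "$lifetime" in
        ''|*[!0-9]*) lifetime=30 ;;
    esac

    echo $((lifetime + margin))
}

# Example (path is an assumption): suggest_kill_timeout /etc/squid3/squid.conf
```

With the default shutdown_lifetime and the assumed margin this yields 40, which matches the value tried in the next trace.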
>
> With "kill timeout 40" in upstart script, notice the pid file being
> unlinked (the ~30 seconds wait is due to the default value of
> shutdown_lifetime directive):
> =====
> # strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
> Process 20805 attached - interrupt to quit
> 0.000000 --- SIGTERM (Terminated) @ 0 (0) ---
> 0.000284 rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
> 0.000402 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
> 31.611726 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
> 0.001792 --- SIGCHLD (Child exited) @ 0 (0) ---
> 0.000247 rt_sigreturn(0xffffffffffffffff) = 63
> 0.006027 unlink("/var/run/squid3.pid") = 0
> 0.000539 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 0
> 0.000387 open("/etc/group", O_RDONLY|O_CLOEXEC) = 0
> Process 20805 detached
>
> Still, this will unfortunately lead to a slow shutdown time as
> reported in
> http://askubuntu.com/questions/18127/squid3-starts-and-stops-slowly
>
> To solve this it is better to issue a SIGINT instead of SIGTERM
> (upstart default) by adding "kill signal SIGINT" to squid's upstart
> script.
>
> With "kill signal SIGINT", which does not wait on outstanding
> connections:
> =====
> # strace -r -e trace=signal,file -p $(cat /var/run/squid3.pid)
> Process 20891 attached - interrupt to quit
> 0.000000 --- SIGINT (Interrupt) @ 0 (0) ---
> 0.000312 rt_sigreturn(0x2) = -1 EINTR (Interrupted system call)
> 0.000237 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
> 1.123564 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
> 0.000754 --- SIGCHLD (Child exited) @ 0 (0) ---
> 0.000103 rt_sigreturn(0xffffffffffffffff) = 0
> 0.002031 unlink("/var/run/squid3.pid") = 0
> 0.000154 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 0
> 0.000142 open("/etc/group", O_RDONLY|O_CLOEXEC) = 0
> Process 20891 detached
>
> The whole stop process is very quick. The downside is that squid will
> forcefully close all open connections immediately, but that was
> already happening when SIGKILL was being issued before - so we are
> actually keeping the same behavior as before and there are no bugs
> complaining about it. And the PID file gets removed anyway.
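
The "send a signal, wait, then fall back to SIGKILL" sequence discussed above can be imitated by hand, roughly as follows. This is only a sketch of the behaviour, not upstart's implementation; the function name and defaults are made up for illustration, and it must run as a user allowed to signal the process (otherwise kill -0 fails and the process is wrongly treated as gone).

```shell
#!/bin/sh
# Hypothetical sketch of "send a signal, wait up to N seconds, then SIGKILL".
# It mimics the behaviour described in this report; it is not upstart's code.

stop_with_timeout() {
    pid="$1"              # process to stop
    sig="${2:-INT}"       # first signal to send (TERM is upstart's default)
    timeout="${3:-40}"    # seconds to wait before falling back to SIGKILL

    # If the signal cannot be delivered, treat the process as already gone.
    kill -s "$sig" "$pid" 2>/dev/null || return 0

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -s KILL "$pid" 2>/dev/null
            return 1      # had to force-kill
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0              # exited on its own within the timeout
}

# Example: stop_with_timeout "$(cat /var/run/squid3.pid)" INT 40
```

A return status of 1 means the timeout was too short for a clean exit, which is the situation the 5-second default produces with SIGTERM.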
>
> The proposal is to add both "kill signal SIGINT" and "kill timeout 40"
> to squid's upstart script, so that squid has enough time to close all
> open cache index files.
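
To confirm that an installed job file carries both stanzas, a check along these lines could be used. It is a minimal sketch: the function name is hypothetical, and the job file path in the example is an assumption about where the package installs it.

```shell
#!/bin/sh
# Hypothetical check that an upstart job file contains both proposed stanzas:
#   kill signal SIGINT
#   kill timeout 40

has_kill_stanzas() {
    job="$1"   # path to the upstart job file
    grep -Eq '^[[:space:]]*kill[[:space:]]+signal[[:space:]]+SIGINT[[:space:]]*$' "$job" &&
    grep -Eq '^[[:space:]]*kill[[:space:]]+timeout[[:space:]]+40[[:space:]]*$' "$job"
}

# Example (path is an assumption):
#   has_kill_stanzas /etc/init/squid3.conf && echo "stanzas present"
```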
>
> [Test Case]
>
> # start squid3
> squid3 start/running, process 10392
> # cat /var/run/squid3.pid
> 10392
> # stop squid3
> squid3 stop/waiting
> # cat /var/run/squid3.pid
> 10392
> # logrotate -f /etc/logrotate.d/squid3
> squid: ERROR: Could not send signal 10 to process 10392: (3) No such process
> error: error running shared postrotate script for '/var/log/squid3/*.log '
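
The stale PID file in the test case above can also be detected without waiting for logrotate. A small sketch, with a hypothetical function name; it needs to run as root or as the process owner, since kill -0 also fails on a permission error:

```shell
#!/bin/sh
# Hypothetical check for a stale PID file, i.e. one that names a process
# which no longer exists.

pidfile_is_stale() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1      # no file, so nothing is stale
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;       # empty or non-numeric content counts as stale
    esac
    ! kill -0 "$pid" 2>/dev/null       # stale when the process cannot be signalled
}

# Example: pidfile_is_stale /var/run/squid3.pid && echo "stale PID file"
```

Run right after "stop squid3", this should report a stale file on an affected system and nothing once the PID file is unlinked on shutdown.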
>
> [Regression Potential]
>
> * The main regression potential is related to the "kill timeout" value.
> If set too high, a user's system might *seem* too slow to shut down. Note
> that this should rarely happen: only when squid actually needs a long time
> to write out and close its cache index files (high system load, maybe?).
> Overall, whenever a system behaves this way the user probably has other
> problems to solve.
> * If the "kill timeout" value is set too low, we might run into the same
> problem originally reported and end up sending SIGKILL to squid too early,
> probably while it is closing its cache index files, thus corrupting them.
> Then again, this was probably happening already, as SIGKILL is always
> sent after 5 seconds with the current upstart script.
>
> Overall, there is very low regression potential for this fix.
>
>
> [Other info]
>
> # lsb_release -d
> Description: Ubuntu 12.04.5 LTS
> # dpkg -l squid3
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name   Version            Description
> +++-======-==================-==========================================
> ii  squid3 3.1.19-1ubuntu3.12 Full featured Web Proxy cache (HTTP proxy)
>
> # lsb_release -d
> Description: Ubuntu 14.04.3 LTS
> # dpkg -l squid3
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name   Version          Architecture Description
> +++-======-================-============-==========================================
> ii  squid3 3.3.8-1ubuntu6.3 amd64        Full featured Web Proxy cache (HTTP proxy)
>