reload does not shut down lighttpd gracefully

Bug #1707312 reported by Vasya Pupkin on 2017-07-28
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
lighttpd (Ubuntu)
Medium
Andreas Hasenack

Bug Description

I noticed that /etc/init.d/lighttpd reload kills lighttpd almost instantly. I can manually shut down lighttpd gracefully by sending it SIGINT, so this is an issue somewhere within init.d script.

Here's log message when I call init.d reload:

2017-07-28 22:54:10: (server.c.1558) server stopped by UID = 0 PID = 1
2017-07-28 22:54:10: (log.c.164) server started

Here's log message when I send SIGINT:

2017-07-28 23:30:29: (server.c.1442) [note] graceful shutdown started
2017-07-28 23:30:37: (server.c.1558) server stopped by UID = 0 PID = 16714

Note first line about graceful shutdown. Never happens when reload is done via init.d script.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: lighttpd 1.4.35-4ubuntu2
ProcVersionSignature: Ubuntu 4.10.0-26.30~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-26-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
Date: Sat Jul 29 01:43:39 2017
InstallationDate: Installed on 2011-04-14 (2297 days ago)
InstallationMedia: Ubuntu-Server 10.04.2 LTS "Lucid Lynx" - Release amd64 (20110211.1)
SourcePackage: lighttpd
UpgradeStatus: Upgraded to xenial on 2016-07-30 (363 days ago)
modified.conffile..etc.lighttpd.conf-available.05-auth.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-accesslog.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-cgi.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-dir-listing.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-evasive.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-evhost.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-expire.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-fastcgi.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-flv-streaming.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-no-www.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-proxy.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-rrdtool.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-simple-vhost.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-ssi.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-ssl.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-status.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-userdir.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-usertrack.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.11-extforward.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.15-fastcgi-php.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.90-debian-doc.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.README: [deleted]
modified.conffile..etc.lighttpd.lighttpd.conf: [inaccessible: [Errno 13] Permission denied: '/etc/lighttpd/lighttpd.conf']

Vasya Pupkin (shadowlmd) wrote :
Vasya Pupkin (shadowlmd) on 2017-07-28
description: updated
Vasya Pupkin (shadowlmd) wrote :

After reading man start-stop-daemon I found the problem.

"... schedule is a list of at least two items separated by slashes (/); each item may be -signal-number or [-]signal-name, which means to send that signal, or timeout, which means to wait that many seconds for processes to exit ..."

"If a schedule is specified, then any signal specified with --signal is ignored"

So a solution would be to remove "--signal INT" and change "--retry=TERM/60/KILL/5" to "--retry=INT/60/KILL/5".

On my system it works as expected after these changes.

The attachment "lighttpd.diff" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Vasya Pupkin (shadowlmd) wrote :

Can someone please review and accept patch? This is pretty-much critical issue for those using lighttpd as a primary web server.

Vasya Pupkin (shadowlmd) wrote :

Hello? Anyone here? If this package is not supported, please remove it from repository then.

Vasya Pupkin (shadowlmd) wrote :

Since Ubuntu Review Team is ignoring this ticket, I subscribed Robie Basak and Artur Rona. Hope you guys can help.

Robie Basak (racb) wrote :

Andreas, could you validate this and convert it into a suitable form for upload, please?

Changed in lighttpd (Ubuntu):
assignee: nobody → Andreas Hasenack (ahasenack)
Andreas Hasenack (ahasenack) wrote :

The analysis is correct and easy to reproduce. Just keep a connection open to the server and issue the reload. Without the fix, the connection is terminated immediately. With the fix, the timeouts are respected, and if the connection is closed before the timeout is reached, the reload action continues.

I'll prepare packages, check if debian has the same bug (I believe it has), etc. Thanks for this report, @shadowlmd

Changed in lighttpd (Ubuntu):
status: New → Triaged
tags: added: bitesize server-next
Andreas Hasenack (ahasenack) wrote :

xenial is a confusing case, at least it was to me. The package has both a sysv initscript and a systemd service file.

They don't have the same action parameters, though. The systemd service has just start/stop/restart (which is stop+start). The sysv initscript has a lot more.

Depending on what action is called, either the systemd definition or the sysv initscript will be used :/

Start, stop and status, for example, are done via systemd. But reload, since the systemd job has no such action, is taken from the sysv script. That's my understanding at least.

But, we are fixing the reload action, so the sysv script is the right place. That being said, after the fix, the "status" action is no longer accurate. It reports the service as dead even though it's running. Probably because of the pid change, so this is still messy.

Andreas Hasenack (ahasenack) wrote :

"reload" the way it is implemented should be called something else: "graceful-restart" probably, because that's what it is (even in the source code).

Vasya Pupkin (shadowlmd) wrote :

The only way to reload configuration is actually through a graceful restart, so that's why they called it "force-reload", I guess. I don't know when "force-reload" and "reload" became aliases, though. Renaming might be a good idea, but then /usr/sbin/lighty-{enable,disable}-mod scripts should be updated as well. And probably some docs too.

Vasya Pupkin (shadowlmd) wrote :

Ah, nice. So "force-reload" is the right name, but for some reason "reload" is aliased to it.

Andreas Hasenack (ahasenack) wrote :

Ok, so I just need to sort out the fact that "status" doesn't work after that "reload" (because there is a new process, and the reload was done via the sysv script while the status is done via systemd). One could argue this is a different bug, though.

Vasya Pupkin (shadowlmd) wrote :

I believe it should all be done via systemd instead. Less work in the future when sysv will be deprecated. But as a quick solution an obvious bug should be fixed right now.

Andreas Hasenack (ahasenack) wrote :

Yes, if all is done via systemd, then status after reload works. But that change is too much for an SRU I think.

That being said, status after reload is already broken, so we are not making things worse by fixing that reload action.

Andreas Hasenack (ahasenack) wrote :

Ok, sorry for spamming on this bug, but I'm not happy with that reload action, fixed or not. Everything breaks after it's issued, it's not just status: stop, start and restart also don't work anymore. I'm having a hard time overlooking that. I'll file a separate bug.

Vasya Pupkin (shadowlmd) wrote :

I see. Can you just fix systemctl's force-reload action to send SIGINT instead of SIGTERM? I am not familiar with systemd and couldn't find where it's defined.

Vasya Pupkin (shadowlmd) wrote :

Ok, I'm really sorry for spamming, but here's a proper fix for systemd's lighttpd.service. After these changes command `systemctl force-reload lighttpd.service` does a proper graceful reastart. It will also kill all children after 60 seconds in case something goes wrong (for example, if lighttpd fails to kill spawned fcgi processes). This patch should make everyone happy. :)

Andreas Hasenack (ahasenack) wrote :

Haha, I had almost the same thing :)

[Service]
ExecStartPre=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecStart=/usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf
KillSignal=SIGINT
TimeoutStopSec=10

I was using 10s for testing :)

That essentially makes any action that involves a "stop" to be a graceful one via SIGINT, which is good. We don't have a solution for a plain "reload" issued via "service", though (misnamed as it is). Maybe we should use ExecReload in systemd and send a HUP signal to the service, effectively making this just like the sysv reopen-logs action. We can also change the logrotate script to use reload then. I'll do some more testing.

Andreas Hasenack (ahasenack) wrote :

The moment we have a reload action in systemd (via ExecReload), then "service lighttpd reload" will call that instead of the misnamed and broken one from the sysv script. We are changing behavior of the reload action, but I think we have to: we are changing it from a broken restart to an actually working reload. That should be acceptable for an SRU as it gets us into a better situation I think.

Vasya Pupkin (shadowlmd) wrote :

Nice! "KillMode=mixed" is important here, though.

Regarding "reload", in /lib/lsb/init-functions.d/40-systemd included from /lib/lsb/init-functions included from /etc/init.d/lighttpd there is this code:

if [ "$(systemctl -p CanReload show $service 2>/dev/null)" = "CanReload=no" ] && [ "${1:-}" = "reload" ]; then
  _use_systemctl=0
fi

Since `sudo systemctl -p CanReload show lighttpd.service` returns "CanReload=no", "reload" action is not handled by systemd and the code from /etc/init.d/lighttpd is executed. I believe, this should not happen and the only way to properly fix it is to remove "reload" from /etc/init.d/lighttpd completely.

Andreas Hasenack (ahasenack) wrote :

Yeah, CanReload will only be true if the service has a ExecReload config. I was toying with this for a bit:
[Service]
...
ExecReload=/bin/kill -HUP $MAINPID

It works, the reload from the sysv script is ignored. But this isn't a proper reload, it's just about reopening the logfiles, so I guess we shouldn't use it.

Changed in lighttpd (Ubuntu):
status: Triaged → In Progress
importance: Undecided → Medium
Andreas Hasenack (ahasenack) wrote :

Can you comment on why KillMode needs to be set to "mixed"? The default is control-group. The sysv script also only seems to kill the main process (which could be a bug, granted). Is it because of CGI scripts?

Vasya Pupkin (shadowlmd) wrote :

From systemd documentation:

"If set to control-group, all remaining processes in the control group of this unit will be killed on unit stop (for services: after the stop command is executed, as configured with ExecStop=)."

We don't want that. We want lighttpd to care about spawned fcgi processes (see fastcgi-php mod, if enabled, lighttpd spawns constant number of fcgi backends and stops them on exit). And only if it fails to stop them, we want to kill them after a timeout. That's exactly what "mixed" mode will do according to the documentation:

"If set to mixed, the SIGTERM signal (see below) is sent to the main process while the subsequent SIGKILL signal (see below) is sent to all remaining processes of the unit's control group."

If spawned fcgi processes will be terminated instantly when stop (restart, whatever) command is called, it will not be a graceful stop/restart.

Andreas Hasenack (ahasenack) wrote :

Test packages are building here for xenial, zesty and artful: https://launchpad.net/~ahasenack/+archive/ubuntu/lighttpd-reload-1707312-1721635

What they do:
a) remove the "reload" action from the SysV initscript
b) apply the systemd service file changes suggested here

Expected outcome:
- slight change in behavior for all actions that involve a stop: the command will block until all existing connections terminate, or the timeout (60s) passes. During that time, new connections won't be accepted.
- no more reload action
- force-reload is an alias to restart in systemd, so the force-reload action from the SysV initscript isn't run, but its code was fixed to behave as a graceful restart anyway
- all actions continue to work after force-reload or restart as expected

Of all these, the one that I wonder about for an SRU is the first one. It's a change in behavior. Now, a restart (for example) can take up to 60s, whereas before it was "immediate". Unfortunately there is no way that I can see where we could add a new graceful-restart action. If we add it to just the SysV initscript, we run into bug #1721635 again.

I'm not too worried about removing the "reload" action because it was a) broken; b) incorrect, as it was restarting the service instead of reloading. There is no proper reload in this version of lighttpd: is going to be supported in the upcoming version 1.4.46 via a USR1 signal (https://git.lighttpd.net/lighttpd/lighttpd1.4.git/commit/?h=0ae6bab4a97f12a0c93200df36ac1741696eeed5)

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers