reload does not shut down lighttpd gracefully

Bug #1707312 reported by Vasya Pupkin
16
This bug affects 1 person
Affects Status Importance Assigned to Milestone
lighttpd (Ubuntu)
Fix Released
Medium
Andreas Hasenack

Bug Description

I noticed that /etc/init.d/lighttpd reload kills lighttpd almost instantly. I can manually shut down lighttpd gracefully by sending it SIGINT, so this is an issue somewhere within init.d script.

Here's log message when I call init.d reload:

2017-07-28 22:54:10: (server.c.1558) server stopped by UID = 0 PID = 1
2017-07-28 22:54:10: (log.c.164) server started

Here's log message when I send SIGINT:

2017-07-28 23:30:29: (server.c.1442) [note] graceful shutdown started
2017-07-28 23:30:37: (server.c.1558) server stopped by UID = 0 PID = 16714

Note first line about graceful shutdown. Never happens when reload is done via init.d script.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: lighttpd 1.4.35-4ubuntu2
ProcVersionSignature: Ubuntu 4.10.0-26.30~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-26-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
Date: Sat Jul 29 01:43:39 2017
InstallationDate: Installed on 2011-04-14 (2297 days ago)
InstallationMedia: Ubuntu-Server 10.04.2 LTS "Lucid Lynx" - Release amd64 (20110211.1)
SourcePackage: lighttpd
UpgradeStatus: Upgraded to xenial on 2016-07-30 (363 days ago)
modified.conffile..etc.lighttpd.conf-available.05-auth.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-accesslog.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-cgi.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-dir-listing.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-evasive.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-evhost.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-expire.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-fastcgi.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-flv-streaming.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-no-www.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-proxy.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-rrdtool.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-simple-vhost.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-ssi.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-ssl.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-status.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-userdir.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.10-usertrack.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.11-extforward.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.15-fastcgi-php.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.90-debian-doc.conf: [deleted]
modified.conffile..etc.lighttpd.conf-available.README: [deleted]
modified.conffile..etc.lighttpd.lighttpd.conf: [inaccessible: [Errno 13] Permission denied: '/etc/lighttpd/lighttpd.conf']

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :
Vasya Pupkin (shadowlmd)
description: updated
Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

After reading man start-stop-daemon I found the problem.

"... schedule is a list of at least two items separated by slashes (/); each item may be -signal-number or [-]signal-name, which means to send that signal, or timeout, which means to wait that many seconds for processes to exit ..."

"If a schedule is specified, then any signal specified with --signal is ignored"

So a solution would be to remove "--signal INT" and change "--retry=TERM/60/KILL/5" to "--retry=INT/60/KILL/5".

On my system it works as expected after these changes.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "lighttpd.diff" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Can someone please review and accept patch? This is pretty-much critical issue for those using lighttpd as a primary web server.

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Hello? Anyone here? If this package is not supported, please remove it from repository then.

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Since Ubuntu Review Team is ignoring this ticket, I subscribed Robie Basak and Artur Rona. Hope you guys can help.

Revision history for this message
Robie Basak (racb) wrote :

Andreas, could you validate this and convert it into a suitable form for upload, please?

Changed in lighttpd (Ubuntu):
assignee: nobody → Andreas Hasenack (ahasenack)
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

The analysis is correct and easy to reproduce. Just keep a connection open to the server and issue the reload. Without the fix, the connection is terminated immediately. With the fix, the timeouts are respected, and if the connection is closed before the timeout is reached, the reload action continues.

I'll prepare packages, check if debian has the same bug (I believe it has), etc. Thanks for this report, @shadowlmd

Changed in lighttpd (Ubuntu):
status: New → Triaged
tags: added: bitesize server-next
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

xenial is a confusing case, at least it was to me. The package has both a sysv initscript and a systemd service file.

They don't have the same action parameters, though. The systemd service has just start/stop/restart (which is stop+start). The sysv initscript has a lot more.

Depending on what action is called, either the systemd definition or the sysv initscript will be used :/

Start, stop and status, for example, are done via systemd. But reload, since the systemd job has no such action, is taken from the sysv script. That's my understanding at least.

But, we are fixing the reload action, so the sysv script is the right place. That being said, after the fix, the "status" action is no longer accurate. It reports the service as dead even though it's running. Probably because of the pid change, so this is still messy.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

"reload" the way it is implemented should be called something else: "graceful-restart" probably, because that's what it is (even in the source code).

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

The only way to reload configuration is actually through a graceful restart, so that's why they called it "force-reload", I guess. I don't know when "force-reload" and "reload" became aliases, though. Renaming might be a good idea, but then /usr/sbin/lighty-{enable,disable}-mod scripts should be updated as well. And probably some docs too.

Revision history for this message
Robie Basak (racb) wrote : Re: [Bug 1707312] Re: reload does not shut down lighttpd gracefully

"reload" and "force-reload" are defined in
https://www.debian.org/doc/debian-policy/#writing-the-scripts

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Ah, nice. So "force-reload" is the right name, but for some reason "reload" is aliased to it.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Ok, so I just need to sort out the fact that "status" doesn't work after that "reload" (because there is a new process, and the reload was done via the sysv script while the status is done via systemd). One could argue this is a different bug, though.

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

I believe it should all be done via systemd instead. Less work in the future when sysv will be deprecated. But as a quick solution an obvious bug should be fixed right now.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Yes, if all is done via systemd, then status after reload works. But that change is too much for an SRU I think.

That being said, status after reload is already broken, so we are not making things worse by fixing that reload action.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Ok, sorry for spamming on this bug, but I'm not happy with that reload action, fixed or not. Everything breaks after it's issued, it's not just status: stop, start and restart also don't work anymore. I'm having a hard time overlooking that. I'll file a separate bug.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

I see. Can you just fix systemctl's force-reload action to send SIGINT instead of SIGTERM? I am not familiar with systemd and couldn't find where it's defined.

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Ok, I'm really sorry for spamming, but here's a proper fix for systemd's lighttpd.service. After these changes command `systemctl force-reload lighttpd.service` does a proper graceful reastart. It will also kill all children after 60 seconds in case something goes wrong (for example, if lighttpd fails to kill spawned fcgi processes). This patch should make everyone happy. :)

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Haha, I had almost the same thing :)

[Service]
ExecStartPre=/usr/sbin/lighttpd -t -f /etc/lighttpd/lighttpd.conf
ExecStart=/usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf
KillSignal=SIGINT
TimeoutStopSec=10

I was using 10s for testing :)

That essentially makes any action that involves a "stop" to be a graceful one via SIGINT, which is good. We don't have a solution for a plain "reload" issued via "service", though (misnamed as it is). Maybe we should use ExecReload in systemd and send a HUP signal to the service, effectively making this just like the sysv reopen-logs action. We can also change the logrotate script to use reload then. I'll do some more testing.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

The moment we have a reload action in systemd (via ExecReload), then "service lighttpd reload" will call that instead of the misnamed and broken one from the sysv script. We are changing behavior of the reload action, but I think we have to: we are changing it from a broken restart to an actually working reload. That should be acceptable for an SRU as it gets us into a better situation I think.

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Nice! "KillMode=mixed" is important here, though.

Regarding "reload", in /lib/lsb/init-functions.d/40-systemd included from /lib/lsb/init-functions included from /etc/init.d/lighttpd there is this code:

if [ "$(systemctl -p CanReload show $service 2>/dev/null)" = "CanReload=no" ] && [ "${1:-}" = "reload" ]; then
  _use_systemctl=0
fi

Since `sudo systemctl -p CanReload show lighttpd.service` returns "CanReload=no", "reload" action is not handled by systemd and the code from /etc/init.d/lighttpd is executed. I believe, this should not happen and the only way to properly fix it is to remove "reload" from /etc/init.d/lighttpd completely.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Yeah, CanReload will only be true if the service has a ExecReload config. I was toying with this for a bit:
[Service]
...
ExecReload=/bin/kill -HUP $MAINPID

It works, the reload from the sysv script is ignored. But this isn't a proper reload, it's just about reopening the logfiles, so I guess we shouldn't use it.

Changed in lighttpd (Ubuntu):
status: Triaged → In Progress
importance: Undecided → Medium
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Can you comment on why KillMode needs to be set to "mixed"? The default is control-group. The sysv script also only seems to kill the main process (which could be a bug, granted). Is it because of CGI scripts?

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

From systemd documentation:

"If set to control-group, all remaining processes in the control group of this unit will be killed on unit stop (for services: after the stop command is executed, as configured with ExecStop=)."

We don't want that. We want lighttpd to care about spawned fcgi processes (see fastcgi-php mod, if enabled, lighttpd spawns constant number of fcgi backends and stops them on exit). And only if it fails to stop them, we want to kill them after a timeout. That's exactly what "mixed" mode will do according to the documentation:

"If set to mixed, the SIGTERM signal (see below) is sent to the main process while the subsequent SIGKILL signal (see below) is sent to all remaining processes of the unit's control group."

If spawned fcgi processes will be terminated instantly when stop (restart, whatever) command is called, it will not be a graceful stop/restart.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Test packages are building here for xenial, zesty and artful: https://launchpad.net/~ahasenack/+archive/ubuntu/lighttpd-reload-1707312-1721635

What they do:
a) remove the "reload" action from the SysV initscript
b) apply the systemd service file changes suggested here

Expected outcome:
- slight change in behavior for all actions that involve a stop: the command will block until all existing connections terminate, or the timeout (60s) passes. During that time, new connections won't be accepted.
- no more reload action
- force-reload is an alias to restart in systemd, so the force-reload action from the SysV initscript isn't run, but its code was fixed to behave as a graceful restart anyway
- all actions continue to work after force-reload or restart as expected

Of all these, the one that I wonder about for an SRU is the first one. It's a change in behavior. Now, a restart (for example) can take up to 60s, whereas before it was "immediate". Unfortunately there is no way that I can see where we could add a new graceful-restart action. If we add it to just the SysV initscript, we run into bug #1721635 again.

I'm not too worried about removing the "reload" action because it was a) broken; b) incorrect, as it was restarting the service instead of reloading. There is no proper reload in this version of lighttpd: is going to be supported in the upcoming version 1.4.46 via a USR1 signal (https://git.lighttpd.net/lighttpd/lighttpd1.4.git/commit/?h=0ae6bab4a97f12a0c93200df36ac1741696eeed5)

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

So, what is going on with this bug? Do more people need to test the package before it goes into repository?

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I would like the opinion of more people, yes. I fear we are changing the behavior of the restart action. Before it would be immediate (tearing down live connections), whereas with the fix it is transformed into a graceful-restart. Normally services have these two options separate so that the admin can choose. That was the case with the sysv script. But now with the systemd service file that choice is no longer available.

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Actually this wasn't the case with sysv script. If you remember, sysv script was the first thing I attached patch for, because it was obviously bugged: both --signal and change --retry options given, so a signal given with --signal was simply ignored. If you check sysv script in Debian, at least in wheezy, you will find it wasn't using --retry at all, so only --signal INT was used, which is exactly as we are trying to achieve now. There was never supposed to be an ungraceful restart option. It appeared only because someone made a mistake when modified sysv script.

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Hmm, I was wrong. Actually, sysv script in Debian Wheezy was using TERM for stop (and thus for restart as well), and INT for reload and force-reload. So yes, you were right, there was a way to kill it ungracefully. But documentation never suggested anything but force-reload to apply new settings, which is supposed to be graceful but is broken currently.

Robie Basak (racb)
tags: added: server-next-drop
Robie Basak (racb)
tags: removed: server-next server-next-drop
tags: removed: bitesize
Revision history for this message
gstrauss (gstrauss) wrote :

lighttpd 1.4.46 was released in Oct 2017, almost 3 years ago.
Is there any reason this is still open?

Revision history for this message
Vasya Pupkin (shadowlmd) wrote :

Well, the issue is still not resolved in Ubuntu 16.04 and probably 18.04 as well. And 16.04 is still supported. Let's wait until 16.04 support ends and close this instead of fixing or backporting new version.

Revision history for this message
gstrauss (gstrauss) wrote :
Changed in lighttpd (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
gstrauss (gstrauss) wrote :

lighttpd 1.4.59 is part of Ubuntu 21.04 (Hirsute Hippo)

no reason to keep this open for years if the Ubuntu lighttpd package maintainers are never going to get around to backporting the changes

Changed in lighttpd (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.