First of all, I beg your pardon for this bug lying dormant. We started to clear out this kind of bug recently, but obviously one can't do it all in one day :-/ Of this one in particular I was made aware by others being affected.

## CASE ##

For an SRU we need a reproducible case of some sort. On a first try with the mpm event config as it is installed by default I can't see this issue. Tried on Trusty and Xenial, but it just stays waiting for connections throughout. I'm through some iterations on this and, while not complete yet, have some lessons learned. We need:
1. long running requests
2. a graceful restart that puts all of those into "G" for a while
3. a lot of requests that fail due to most/all slots being blocked

After some iterations I got to this two-system setup:

# Server
# Prep a somewhat large, non-compressible file plus a small one
$ dd if=/dev/urandom of=/var/www/html/test1 bs=1M count=32
$ dd if=/dev/urandom of=/var/www/html/test2 bs=1k count=4

# Client
# slow down to somewhat like an internet connection
$ tc qdisc add dev eth0 root handle 1: htb default 12
$ tc class add dev eth0 parent 1:1 classid 1:12 htb rate 4000kbps ceil 12000kbps
$ tc qdisc add dev eth0 parent 1:12 netem delay 200ms

# Client - 150 slow requests
$ ab -q -S -c 150 -n 150 10.0.4.30/test1

# Server - reload to cause "G" state
$ apache2ctl status; apache2ctl graceful; apache2ctl status; sleep 5s; apache2ctl status

# Client - many fast requests, exceeding the few/no remaining workers
$ ab -q -S -c 150 -n 5000 10.0.4.30/test2

With this I can see the status being clogged up in "G" on most workers {1}, but things are still working fine :-/ There must be a way to reproduce this that isn't "be a webhoster for 4000 people". If one of those affected has something better, please let me know.

## FIX ##

The fix itself is also a bit messy, as there were multiple revisions, splits of PRs and such. What I found is that the initial proposal of the fix that eventually got into 2.4.25 is attached as [2], but it was broken up upstream. There, on the 2.4 branch, it actually is [3] plus some doc fixups [4],[5] to be correct after the fix.

## Testing ##

For now I have made a PPA available for testing at [6]. This is a backport of the referred fix for Xenial - yet untested.
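In case it helps testers, here is a minimal sketch of how I would pull the build onto a Xenial test box (assuming the usual add-apt-repository flow; the ppa: shorthand below is derived from the URL in [6] and is, like the PPA itself, untested):

# Sketch: enable the PPA from [6] and pull the patched apache2
$ sudo add-apt-repository ppa:ci-train-ppa-service/3034
$ sudo apt-get update
$ sudo apt-get install apache2
# confirm the patched package version got installed
$ apt-cache policy apache2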
Since I can't reproduce it yet, I'm depending on you to:
a) test from the PPA whether the fix is working (and shows no related regressions)
b) help me, with or without the PPA, to create some working steps to reproduce

[1]: https://bz.apache.org/bugzilla/show_bug.cgi?id=53555#c39
[2]: https://bz.apache.org/bugzilla/attachment.cgi?id=34202&action=diff&collapsed=&headers=1&format=raw
[3]: https://github.com/apache/httpd/commit/e7407f84ec2a1b7f2c04775a230f147c08860c7c
[4]: https://github.com/apache/httpd/commit/86db1247c70699df6acad75f2491b8baa0030ff6
[5]: https://github.com/apache/httpd/commit/1a7e2114393c9dd9f8d87e53dfd74ce9ede3c3c0
[6]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3034

{1} 18.8 requests/sec - 22.7 MB/second - 1.2 MB/request
1 requests currently being processed, 24 idle workers

PID   Connections       Threads     Async connections
      total  accepting  busy  idle  writing  keep-alive  closing
2661  15     no         0     0     0        0           5
2697  22     no         0     0     0        0           0
2725  5      no         0     0     0        0           1
2753  15     no         0     0     0        0           3
2589  8      no         0     0     0        0           3
2815  52     yes        1     24    0        0           34
Sum   117               1     24    0        0           46

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG___
_W____________________

Appendix: the default mpm event config in /etc/apache2/mods-available/mpm_event.conf is:
StartServers            2
MinSpareThreads         25
MaxSpareThreads         75
ThreadLimit             64
ThreadsPerChild         25
MaxRequestWorkers       150
MaxConnectionsPerChild  0
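For watching the scoreboard as in {1} while reproducing, a small sketch of what can be used (assuming mod_status is enabled and /server-status is reachable from localhost, as in the default Ubuntu config):

# Sketch: watch worker counts and the scoreboard once per second
$ watch -n1 "curl -s 'http://localhost/server-status?auto' | grep -E '^(BusyWorkers|IdleWorkers|Scoreboard)'"
# count the workers currently stuck in graceful finishing ("G")
$ curl -s 'http://localhost/server-status?auto' | grep '^Scoreboard' | tr -cd 'G' | wc -c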