apache stuck and child processes fail to start
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
apache2 (Ubuntu) | Fix Released | Undecided | Unassigned |
Jammy | Fix Released | Medium | Bryce Harrington |
Bug Description
[Impact]
Starting with Apache 2.4.51, Apache's child processes fail to start after a period of time or after a certain number of requests. The issue became much more frequent with 2.4.52 due to fixes in the accounting of active_daemons.
Apache stops accepting new connections until the parent Apache process is eventually restarted. Requests for new web pages just hang. In netstat there are many CLOSE_WAIT and ESTABLISHED entries.
This seems to affect mostly servers using the event MPM.
Reverting to focal's Apache 2.4.41 or moving to kinetic's 2.4.53 resolves it.
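The CLOSE_WAIT and ESTABLISHED build-up mentioned above can be summarised rather than read by eye. The following is a minimal sketch, not part of the original report: `count_tcp_states` is a hypothetical helper name, and it assumes input in the usual `netstat -tan` layout, where the connection state is the sixth column.

```sh
# Summarise TCP connection states from `netstat -tan` output on stdin.
# Assumes the standard net-tools layout: the state is the 6th column.
# Prints one "STATE COUNT" line per state, sorted by state name.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (state in count) print state, count[state] }' | sort
}
```

It could be run as `netstat -tan | count_tcp_states`. A CLOSE_WAIT count that keeps growing while requests hang would be consistent with the symptom described here.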
[Test Case]
$ lxc launch ubuntu:jammy apache2-
$ lxc shell apache2-
# apt update && apt dist-upgrade -y
# apt install apache2 lynx -y
# cat > /etc/apache2/
<IfModule mpm_event_module>
StartServers 1
MinSpareThreads 1
MaxSpareThreads 1
ThreadsPerChild 1
MaxRequestW
MaxConnecti
</IfModule>
__EOF__
# systemctl restart apache2
# while lynx -dump -read_timeout=10 localhost/
... This command should never return, but it will crash eventually ...
Example failed output:
root@apache2-
Looking up localhost
Making HTTP connection to localhost
Sending HTTP request.
HTTP request sent; waiting for response.
Alert!: Socket read failed (too many tries).
Connection interrupted.
lynx: Can't access startfile http://
real 0m10.082s
user 0m0.044s
sys 0m0.011s
Example of successful output:
Srv PID Acc M CPU SS Req Dur Conn Child Slot Client Protocol VHost
Request
0-0 5891 1/0/36 W 0.00 0 0 25 0.0 0.00 0.14 127.0.0.1 http/1.1
apache2-
1-0 - 0/0/36 . 0.00 0 0 26 0.0 0.00 0.14 127.0.0.1 http/1.1
apache2-
... (continuous output) ...
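The test case relies on polling the server until a request times out. As a rough sketch of that idea only (it is not the truncated command from the test case), the hypothetical helper `poll_until_failure` below reruns any command until it exits non-zero and reports how many runs succeeded.

```sh
# Run the given command repeatedly until it fails, then print the number
# of successful runs. The command's own output is discarded.
# `poll_until_failure` is an illustrative name, not from the bug report.
poll_until_failure() {
    ok=0
    while "$@" >/dev/null 2>&1; do
        ok=$((ok + 1))
    done
    echo "$ok"
}
```

For example, `poll_until_failure curl --silent --fail --max-time 10 http://localhost/` would stop at the first request that errors out or takes longer than 10 seconds. Here curl is swapped in for lynx as an assumption, and the URL is a placeholder for whichever page the test case polls. On an unaffected server the loop is expected to run indefinitely.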
[Where Problems Could Occur]
The included patches involve changes to connection behavior, so it would be worth watching for reports of misbehaviors relating to client processes such as not loading pages from the webserver.
The patches change C code, so the usual sorts of regression risks apply. These don't change memory management or pointer behavior, so it's less likely that regressions would involve memory leaks or invalid pointers, and more likely that they would be loop misbehaviors (getting stuck, skipping an operation, doing it too many times, etc.).
The upstream codebase does not have further refinements to these patches in particular, but there are subsequent patches to the files in question for unrelated issues. It doesn't look like we require any of those for this problem, but it's conceivable that something was overlooked. If so, look for problem reports that match the description of one of those upstream commits.
[Original Report]
Since updating the LTS server version from 20.04 to 22.04, I've had problems with apache 2.4.52 (2.4.52-1ubuntu4.1) in mpm_event mode. The child processes for apache fail to start after a period of time. The webserver is unreachable. In netstat there are many CLOSE_WAIT and ESTABLISHED entries.
The error_log says:
[Tue Aug 30 12:59:38.451188 2022] [http2:warn] [pid 687247:tid 139925644072832] AH10291: h2_workers: cleanup, 1 idle workers did not exit after 5 seconds.
# ps xau |grep apache
root 899 0.0 0.4 86712 40116 ? Ss Aug25 0:33 /usr/sbin/apache2 -k start
www-data 901 0.0 0.0 3736 156 ? Ss Aug25 0:15 /usr/bin/
www-data 687242 0.0 0.3 87020 30104 ? S 02:00 0:00 /usr/sbin/apache2 -k start
Other processes are gone.
The problem is known and already fixed in 2.4.53, see: https:/
I haven't found any indication that this problem has already been fixed in the Ubuntu version of Apache. That's why I'm filing this bug report.
Many Thanks.
Related branches
- git-ubuntu bot: Approve
- Christian Ehrhardt (community): Approve
- Canonical Server Reporter: Pending requested
Diff: 294 lines (+240/-2)
5 files modified
debian/changelog (+12/-0)
debian/patches/fix-a-possible-listener-deadlock.patch (+114/-0)
debian/patches/handle-children-killed-pathologically.patch (+99/-0)
debian/patches/series (+2/-0)
debian/perl-framework/t/ssl/ocsp.t (+13/-2)
Changed in apache2 (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Changed in apache2 (Ubuntu Jammy):
assignee: Sergio Durigan Junior (sergiodj) → Bryce Harrington (bryce)
description: updated
description: updated
description: updated
Changed in apache2 (Ubuntu Jammy):
status: Triaged → In Progress
Thanks for the bug report.
I tried reproducing it here using the configuration file outlined in https://bz.apache.org/bugzilla/show_bug.cgi?id=65769#c1, but as far as I can tell things are still working. Do you have a reproducer that I can use to make sure that we're dealing with the aforementioned upstream issue?
Thanks.