apache stuck and child processes fail to start

Bug #1988224 reported by Daniel
This bug affects 1 person

Affects            Status         Importance   Assigned to        Milestone
apache2 (Ubuntu)   Fix Released   Undecided    Unassigned
Jammy              Triaged        Medium       Bryce Harrington

Bug Description

[Impact]
Starting with Apache 2.4.51, Apache child processes fail to start after a period of time or after a certain number of requests. The issue became much more frequent in 2.4.52 due to fixes in the accounting of active_daemons.

Apache stops accepting new connections until the parent Apache process is restarted. Requests for new web pages simply hang. netstat shows many CLOSE_WAIT and ESTABLISHED entries.
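
To see the socket pile-up described above, one option is to tally TCP states on the listening port. This is only a sketch: it assumes ss from iproute2 is available (ss prints ESTAB and CLOSE-WAIT where netstat prints ESTABLISHED and CLOSE_WAIT), and the helper name count_states is made up for illustration.

```shell
# Hypothetical helper: summarise TCP socket states for one local port.
# Reads "ss -tan" style output on stdin, so it can be tried on saved output too.
# Column 1 is the state, column 4 is the local address:port.
count_states() {
    port="${1:-80}"
    awk -v p=":$port" '
        NR > 1 && $4 ~ (p "$") { n[$1]++ }      # skip header, match local port
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Usage on an affected host:
#   ss -tan | count_states 80
```

A growing CLOSE-WAIT count while requests hang would be consistent with the symptom reported here, but it is not proof of this particular bug.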

This seems to mostly affect servers using the event MPM.
Reverting to focal's Apache 2.4.41 or moving to kinetic's 2.4.53 resolves it.
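
As a rough triage aid, the upstream version can be compared against the range named above (2.4.51 up to, but not including, 2.4.53). This is a sketch under assumptions: it relies on GNU sort -V, the function name in_affected_range is hypothetical, and a distro package can carry the fix as a backported patch without changing the upstream version number, so a match is only a hint.

```shell
# Hypothetical helper: succeed if upstream version $1 is >= 2.4.51 and < 2.4.53.
# Uses GNU sort -V for version ordering (assumption: GNU coreutils is installed).
in_affected_range() {
    v="$1"
    lowest=$(printf '%s\n' "$v" 2.4.51 | sort -V | head -n 1)
    highest=$(printf '%s\n' "$v" 2.4.53 | sort -V | tail -n 1)
    [ "$lowest" = 2.4.51 ] && [ "$highest" = 2.4.53 ] && [ "$v" != 2.4.53 ]
}

# Usage (apache2ctl -v prints a line like "Server version: Apache/2.4.52 (Ubuntu)"):
#   ver=$(apache2ctl -v | sed -n 's|^Server version: Apache/\([0-9.]*\).*|\1|p')
#   in_affected_range "$ver" && echo "upstream version is in the reported range"
```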

[Test Case]
$ lxc launch ubuntu:jammy test-apache2 --vm
$ lxc shell test-apache2
# apt update && apt dist-upgrade -y
# apt install apache2 lynx -y
# cat > /etc/apache2/mods-enabled/mpm_event.conf << __EOF__
<IfModule mpm_event_module>
    StartServers 1
    MinSpareThreads 1
    MaxSpareThreads 1
    ThreadsPerChild 1
    MaxRequestWorkers 1
    MaxConnectionsPerChild 1
</IfModule>
__EOF__
# systemctl restart apache2
# while lynx -dump -read_timeout=10 localhost/server-status; do continue; done
... This command should never return, but it will crash eventually ...
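
For comparing packages, it may help to record how many requests succeed before the first failure instead of watching the loop by hand. The following is a minimal sketch, not part of the official test case: count_until_failure is a made-up name, and the upper bound on tries is an arbitrary choice so the loop also ends on a fixed system.

```shell
# Hypothetical helper: run a probe command repeatedly and print how many times
# it succeeded before the first failure (or before max_tries is reached).
count_until_failure() {
    max_tries="$1"; shift
    ok=0
    while [ "$ok" -lt "$max_tries" ]; do
        "$@" >/dev/null 2>&1 || break       # stop at the first failing probe
        ok=$((ok + 1))
    done
    echo "$ok"
}

# Usage inside the test VM, with the same probe as the test case:
#   count_until_failure 1000 lynx -dump -read_timeout=10 localhost/server-status
```

If the printed count equals the bound, the probe never failed within that many tries.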

[Where Problems Could Occur]
TBD

[Original Report]
Since updating the LTS server version from 20.04 to 22.04, I've had problems with apache 2.4.52 (2.4.52-1ubuntu4.1) in mpm_event mode. The child processes for apache fail to start after a period of time. The webserver is unreachable. In netstat there are many CLOSE_WAIT and ESTABLISHED entries.

The error_log says:

[Tue Aug 30 12:59:38.451188 2022] [http2:warn] [pid 687247:tid 139925644072832] AH10291: h2_workers: cleanup, 1 idle workers did not exit after 5 seconds.

# ps xau |grep apache
root 899 0.0 0.4 86712 40116 ? Ss Aug25 0:33 /usr/sbin/apache2 -k start
www-data 901 0.0 0.0 3736 156 ? Ss Aug25 0:15 /usr/bin/htcacheclean -d 120 -p /var/cache/apache2/mod_cache_disk -l 300M -n
www-data 687242 0.0 0.3 87020 30104 ? S 02:00 0:00 /usr/sbin/apache2 -k start

Other processes are gone.

The problem is known and already fixed in 2.4.53, see: https://bz.apache.org/bugzilla/show_bug.cgi?id=65769

I haven't found any indication that this problem has already been fixed in the Ubuntu version of Apache. That's why I'm filing this bug report.

Many Thanks.

Sergio Durigan Junior (sergiodj) wrote :

Thanks for the bug report.

I tried reproducing it here using the configuration file outlined in https://bz.apache.org/bugzilla/show_bug.cgi?id=65769#c1, but as far as I can tell things are still working. Would you have a reproducer that I can use to make sure that we're dealing with the aforementioned upstream issue?

Thanks.

Changed in apache2 (Ubuntu):
status: New → Incomplete
Daniel (danuntu) wrote :

I think I can reproduce this on my server. I changed the Apache config as follows and ran lynx a few times. After a few successful lynx runs, the server no longer responds. What can I do to debug further?

apache config:

<IfModule mpm_event_module>
    StartServers 1
    MinSpareThreads 1
    MaxSpareThreads 1
    ThreadsPerChild 1
    MaxRequestWorkers 1
    MaxConnectionsPerChild 1
</IfModule>

# systemctl restart apache2.service
# tail -f /var/log/apache2/error.log

[Wed Aug 31 22:45:54.390886 2022] [mpm_event:notice] [pid 998699:tid 139894640904064] AH00492: caught SIGWINCH, shutting down gracefully
[Wed Aug 31 22:45:54.649967 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity for Apache/2.9.5 (http://www.modsecurity.org/) configured.
[Wed Aug 31 22:45:54.650003 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: APR compiled version="1.7.0"; loaded version="1.7.0"
[Wed Aug 31 22:45:54.650016 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: PCRE compiled version="8.39 "; loaded version="8.39 2016-06-14"
[Wed Aug 31 22:45:54.650023 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: LUA compiled version="Lua 5.1"
[Wed Aug 31 22:45:54.650029 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: YAJL compiled version="2.1.0"
[Wed Aug 31 22:45:54.650035 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: LIBXML compiled version="2.9.12"
[Wed Aug 31 22:45:54.650094 2022] [:notice] [pid 1000585:tid 140120599648128] ModSecurity: StatusEngine call: "2.9.5,Apache/2.4.52 (Ubuntu),1.7.0/1.7.0,8.39/8.39 2016-06-14,Lua 5.1,2.9.12,e0c86efba80afb51d5e1caae78c38635f2f8b5df"
[Wed Aug 31 22:46:02.727769 2022] [suexec:notice] [pid 1000585:tid 140120599648128] AH01232: suEXEC mechanism enabled (wrapper: /usr/lib/apache2/suexec)
[Wed Aug 31 22:46:02.982447 2022] [ssl:warn] [pid 1000591:tid 140120599648128] AH01909: broeltal.de:443:0 server certificate does NOT include an ID which matches the server name
[Wed Aug 31 22:46:03.025496 2022] [mpm_event:notice] [pid 1000591:tid 140120599648128] AH00489: Apache/2.4.52 (Ubuntu) OpenSSL/3.0.2 mod_fcgid/2.3.9 configured -- resuming normal operations
[Wed Aug 31 22:46:03.025617 2022] [core:notice] [pid 1000591:tid 140120599648128] AH00094: Command line: '/usr/sbin/apache2'

After 3 successful lynx commands:
[Wed Aug 31 22:46:18.046440 2022] [mpm_event:error] [pid 1000591:tid 140120599648128] AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
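
The two message IDs quoted in this report (AH00484 here and AH10291 in the original report) can be counted in an error log to see how often each occurs. This is a sketch only: the helper name count_log_ids is hypothetical, and the log path in the usage line is the one shown earlier in this comment.

```shell
# Hypothetical helper: count selected Apache message IDs in error-log text on stdin.
count_log_ids() {
    awk '
        /AH00484/ { workers++ }   # "server reached MaxRequestWorkers setting"
        /AH10291/ { h2++ }        # "h2_workers: cleanup, ... idle workers did not exit"
        END { printf "AH00484 %d\nAH10291 %d\n", workers, h2 }
    '
}

# Usage:
#   count_log_ids < /var/log/apache2/error.log
```

Both messages can also appear on a healthy but busy server, so the counts are context for the hang, not a diagnosis.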

# ps xaufe
root 1000591 0.5 0.2 69388 18392 ? Ss 22:46 0:00 /usr/sbin/apache2 -k start
root 1000593 0.0 0.0 3088 1164 ? S 22:46 0:00 \_ /usr/bin/rotatelogs -l /www/foo1_log 3600
root 1000594 0.0 0.0 3088 1136 ? S 22:46 0:00 \_ /usr/bin/rotatelogs -l /www/foo2_log 1800
www-data 1000597 0.0 0.1 69332 13424 ? S 22:46 0:00 \_ /usr/sbin/apache2 -k start
www-data 1000644 0.0 0.1 233620 14836 ? Sl 22:46 0:00 \_ /usr/sbin/apache2 -k start

$ lynx -dump -read_timeout=10 localhost/server-status
               Apache Server Status for localhost (...

Sergio Durigan Junior (sergiodj) wrote :

Hmm, OK, for some reason the bug doesn't reproduce for me if I use a Jammy LXD container. I've just tried reproducing it using a VM and now I can trigger the error.

Thanks for the feedback. I'm marking this bug as Triaged and adding the server-todo tag; someone from the team should work on it soon (unless you would like to drive the SRU yourself, of course!).

Thanks.

tags: added: server-todo
Sergio Durigan Junior (sergiodj) wrote :

Instructions on how to reproduce:

$ lxc launch ubuntu:jammy test-apache2 --vm
$ lxc shell test-apache2
# apt update && apt dist-upgrade -y
# apt install apache2 lynx -y
# cat > /etc/apache2/mods-enabled/mpm_event.conf << __EOF__
<IfModule mpm_event_module>
    StartServers 1
    MinSpareThreads 1
    MaxSpareThreads 1
    ThreadsPerChild 1
    MaxRequestWorkers 1
    MaxConnectionsPerChild 1
</IfModule>
__EOF__
# systemctl restart apache2
# while lynx -dump -read_timeout=10 localhost/server-status; do continue; done
... This command should never return, but it will crash eventually ...

Changed in apache2 (Ubuntu Jammy):
status: New → Triaged
importance: Undecided → Medium
Changed in apache2 (Ubuntu):
status: Incomplete → Fix Released
Changed in apache2 (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Bryce Harrington (bryce)
Changed in apache2 (Ubuntu Jammy):
assignee: Sergio Durigan Junior (sergiodj) → Bryce Harrington (bryce)
Bryce Harrington (bryce) wrote :

From the upstream bug report, do I understand correctly that the solution for this is these two changesets?

    https://svn.apache.org/viewvc?view=revision&revision=1897149
    https://svn.apache.org/viewvc?view=revision&revision=1901234

There are also some intervening merge changesets (r1901199 and r1898467) that each backport a bunch more revisions, but it looks like those are not required for this?

Also, it sounds like this doesn't affect focal or earlier, and is already fixed in kinetic's apache2, so we only need to SRU jammy, correct?

Bryce Harrington (bryce)
description: updated