[worker/event] Some child processes aren't killed on reload (graceful), being stuck in state "Sending Reply", leaving old logfiles open, disk fills up, server stops responding after some days, DoS

Bug #2054301 reported by thermoman
This bug affects 2 people
Affects           Status   Importance  Assigned to  Milestone
apache2 (Ubuntu)  Triaged  Undecided   Unassigned

Bug Description

We recently migrated http://www.nettolohn.de, which has moderate
traffic, from self-hosted to cloud-hosted at AWS and observe a
potential Denial-of-Service bug in Apache httpd.

After some days with the daily "graceful restart" via logrotate, the
process list looks something like this:

% date -R
Mon, 19 Feb 2024 10:07:49 +0100

% ps f -o user,pid,nice,%cpu,%mem,cputime,etime,thcount,command ax
USER PID NI %CPU %MEM TIME ELAPSED THCNT COMMAND
root 765081 0 0.0 0.1 00:00:17 3-21:39:31 1 /usr/sbin/apache2 -k start
www-data 765084 0 0.0 0.2 00:00:01 3-21:39:31 4 \_ /usr/sbin/apache2 -k start <---
www-data 765088 0 0.0 0.2 00:00:01 3-21:39:31 3 \_ /usr/sbin/apache2 -k start <---
www-data 765381 0 0.0 0.2 00:04:31 3-21:38:43 3 \_ /usr/sbin/apache2 -k start <---
www-data 765383 0 0.0 0.2 00:04:52 3-21:38:43 3 \_ /usr/sbin/apache2 -k start <---
www-data 765385 0 0.2 0.2 00:12:24 3-21:38:43 4 \_ /usr/sbin/apache2 -k start <---
www-data 778949 0 0.2 0.2 00:10:48 3-10:07:39 3 \_ /usr/sbin/apache2 -k start <---
www-data 779006 0 0.0 0.2 00:04:15 3-10:07:38 3 \_ /usr/sbin/apache2 -k start <---
www-data 833382 0 0.6 0.2 00:12:31 1-10:07:23 3 \_ /usr/sbin/apache2 -k start <---
www-data 833383 0 0.4 0.2 00:09:17 1-10:07:23 3 \_ /usr/sbin/apache2 -k start <---
www-data 859951 0 0.0 0.0 00:00:00 10:07:26 1 \_ /usr/sbin/apache2 -k start
netto 860221 0 0.0 0.7 00:00:13 10:07:12 1 | \_ /usr/lib/cgi-bin/php7.4
netto 868707 0 1.2 1.1 00:02:04 02:41:29 1 | \_ /usr/lib/cgi-bin/php7.4
netto 868709 0 1.1 1.1 00:01:47 02:41:28 1 | \_ /usr/lib/cgi-bin/php7.4
netto 868711 0 1.4 1.1 00:02:25 02:41:27 1 | \_ /usr/lib/cgi-bin/php7.4
netto 868713 0 1.2 1.1 00:01:59 02:41:27 1 | \_ /usr/lib/cgi-bin/php7.4
netto 868715 0 1.1 1.1 00:01:54 02:41:26 1 | \_ /usr/lib/cgi-bin/php7.4
netto 868721 0 1.1 1.1 00:01:46 02:41:24 1 | \_ /usr/lib/cgi-bin/php7.4
netto 868725 0 1.1 1.1 00:01:52 02:41:23 1 | \_ /usr/lib/cgi-bin/php7.4
netto 870902 0 0.9 1.1 00:00:29 53:50 1 | \_ /usr/lib/cgi-bin/php7.4
netto 870906 0 1.7 1.1 00:00:57 53:50 1 | \_ /usr/lib/cgi-bin/php7.4
netto 870911 0 1.0 1.1 00:00:33 53:50 1 | \_ /usr/lib/cgi-bin/php7.4
netto 870912 0 1.1 0.7 00:00:37 53:50 1 | \_ /usr/lib/cgi-bin/php7.4
www-data 859952 0 0.2 0.3 00:01:29 10:07:26 52 \_ /usr/sbin/apache2 -k start
www-data 860007 0 0.2 0.3 00:01:34 10:07:25 52 \_ /usr/sbin/apache2 -k start
www-data 860008 0 0.6 0.5 00:04:01 10:07:25 52 \_ /usr/sbin/apache2 -k start
www-data 860009 0 0.4 0.4 00:02:59 10:07:25 52 \_ /usr/sbin/apache2 -k start
www-data 860010 0 0.3 0.3 00:01:51 10:07:25 52 \_ /usr/sbin/apache2 -k start

The apache2 processes marked with "<---" are leftover child processes
that should have been killed long ago. The first 5 processes are
almost as old as the parent process 765081 despite "apache2ctl graceful"
being called once a day via logrotate.

These processes are stuck in state "Sending Reply" or "Stopping"
(see below), even though netstat shows no connections for these
processes.

Under normal conditions, the output of the above command

  ps f -o user,pid,nice,%cpu,%mem,cputime,etime,thcount,command ax

should show only one long-running apache2 process (the parent
process), whereas all apache2 child processes should exist for at
most 24 hours, given that logrotate issues a graceful restart every
24 hours.

Instead, in the above process list child processes can be seen that
are 3 days and 21 hours old - almost as old as the parent process.
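
One way to spot such overaged children is to compare each child's elapsed time with the 24-hour reload interval. This is only a minimal sketch and not part of the original setup: the helper name filter_older_than is made up for illustration, and the pid file path in the usage comment is assumed to be the Ubuntu default.

```shell
#!/bin/sh
# Read "PID ELAPSED_SECONDS" lines on stdin and print the PIDs whose
# elapsed time exceeds the threshold given as $1 (in seconds).
# filter_older_than is a hypothetical helper name.
filter_older_than() {
    awk -v max="$1" '$2 + 0 > max + 0 { print $1 }'
}

# Usage on a live system (assumes procps ps and the default pid file path):
#   ps -o pid=,etimes= --ppid "$(cat /var/run/apache2/apache2.pid)" \
#     | filter_older_than 86400
```

Note that the listing of the parent's children would also include the mod_fcgid process manager, so the result still needs a manual look before acting on it.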

After issuing a fresh

  service apache2 reload

some of the child processes that should be killed have inbound TCP
connections in state CLOSE_WAIT.
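
To check which of the lingering children hold CLOSE_WAIT sockets, the netstat output can be filtered by state and owning PID. The following is a rough sketch under assumptions: it expects the column layout of net-tools "netstat -antpe" (state in column 6, PID/program in column 9), and close_wait_for_pids is an illustrative name.

```shell
#!/bin/sh
# Read "netstat -antpe" lines on stdin and print those that are in
# state CLOSE_WAIT and owned by one of the PIDs listed in $1
# (space-separated). close_wait_for_pids is a hypothetical helper name.
close_wait_for_pids() {
    awk -v pids="$1" '
        BEGIN { n = split(pids, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
        # column 9 looks like "765381/apache2"; keep the part before the slash
        $6 == "CLOSE_WAIT" { split($9, a, "/"); if (a[1] in want) print }
    '
}

# Usage on a live system (needs root to see the PID column):
#   netstat -antpe | close_wait_for_pids "765084 765088 765381"
```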

Issuing a

  service apache2 restart

results in

  AH00045: child process ..... still did not exit, sending a SIGTERM

lines in the error_log for every lingering apache2 process that didn't
terminate on its own earlier.

Some of these lingering apache2 processes still hold open file handles
on log files that logrotate has already deleted:

% lsof -n | grep -P 'apache2/.*deleted'
apache2 765084 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765084 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765084 765293 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765084 765293 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765084 765321 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765084 765321 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765084 765342 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765084 765342 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765088 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765088 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765088 765228 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765088 765228 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765088 765347 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765088 765347 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765381 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765381 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765381 765622 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765381 765622 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765381 765630 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765381 765630 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765383 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765383 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765383 765589 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765383 765589 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765383 765638 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765383 765638 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765385 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765385 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765385 765488 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765385 765488 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765385 765536 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765385 765536 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 765385 765645 apache2 www-data 2w REG 259,3 69142 263343 /var/log/apache2/error.log-20240216 (deleted)
apache2 765385 765645 apache2 www-data 11w REG 259,3 391415370 263308 /var/log/apache2/access.log-20240216 (deleted)
apache2 778949 www-data 2w REG 259,3 3121 263679 /var/log/apache2/error.log-20240217 (deleted)
apache2 778949 www-data 11w REG 259,3 351229757 263330 /var/log/apache2/access.log-20240217 (deleted)
apache2 778949 778998 apache2 www-data 2w REG 259,3 3121 263679 /var/log/apache2/error.log-20240217 (deleted)
apache2 778949 778998 apache2 www-data 11w REG 259,3 351229757 263330 /var/log/apache2/access.log-20240217 (deleted)
apache2 778949 779001 apache2 www-data 2w REG 259,3 3121 263679 /var/log/apache2/error.log-20240217 (deleted)
apache2 778949 779001 apache2 www-data 11w REG 259,3 351229757 263330 /var/log/apache2/access.log-20240217 (deleted)
apache2 779006 www-data 2w REG 259,3 3121 263679 /var/log/apache2/error.log-20240217 (deleted)
apache2 779006 www-data 11w REG 259,3 351229757 263330 /var/log/apache2/access.log-20240217 (deleted)
apache2 779006 779070 apache2 www-data 2w REG 259,3 3121 263679 /var/log/apache2/error.log-20240217 (deleted)
apache2 779006 779070 apache2 www-data 11w REG 259,3 351229757 263330 /var/log/apache2/access.log-20240217 (deleted)
apache2 779006 779114 apache2 www-data 2w REG 259,3 3121 263679 /var/log/apache2/error.log-20240217 (deleted)
apache2 779006 779114 apache2 www-data 11w REG 259,3 351229757 263330 /var/log/apache2/access.log-20240217 (deleted)

Over time, this issue will

a) fill up the /var/log volume, since deleted log files are still
   held open via file handles and therefore keep consuming disk space
   until they are released. "df" (disk free) will show gigabytes in
   use on the /var/log volume, while "du" (disk usage) will show only
   some megabytes used by files in /var/log.

b) consume all process slots leaving no apache2 process accepting
   connections, eventually causing a Denial-of-Service.
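
The disk space in point a) can be estimated from the lsof listing by summing the size of each deleted file once, since every thread of a process repeats the same entry. This is a sketch only: deleted_bytes is a made-up name, the fields are counted from the end of the line because thread rows carry extra columns, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Read lsof lines on stdin and print the total size in bytes of all
# distinct files marked "(deleted)". Fields counted from the end:
#   $NF = "(deleted)", $(NF-1) = path, $(NF-2) = inode,
#   $(NF-3) = size, $(NF-4) = device
# deleted_bytes is a hypothetical helper name.
deleted_bytes() {
    awk '
        $NF == "(deleted)" && !seen[$(NF-4) ":" $(NF-2)]++ { total += $(NF-3) }
        END { printf "%.0f\n", total }
    '
}

# Usage on a live system:
#   lsof -n | grep 'apache2/.*deleted' | deleted_bytes
```

The result should roughly match the gap between what df and du report for the /var/log volume.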

The only viable workaround right now is restarting apache from time
to time, e.g. via cron.

We used MPM worker in the past and switched to event, but this
didn't change anything.

#
## Reproduce
#

We can reproduce the issue on this server with medium traffic
by restarting apache2 and issuing a reload some time later.
Each time a reload is issued, some child processes aren't killed
but keep running.

On some other hosts we observed the issue with MPM worker,
but after switching to event the issue disappeared (for now).

We didn't test prefork.

#
## The /server-status page looks like this:
#

Server Version: Apache/2.4.52 (Ubuntu) mod_fcgid/2.3.9 OpenSSL/3.0.2
Server MPM: event
Server Built: 2023-10-26T13:44:44

Current Time: Monday, 19-Feb-2024 10:07:49 CET
Restart Time: Thursday, 15-Feb-2024 12:28:19 CET
Parent Server Config. Generation: 6
Parent Server MPM Generation: 5
Server uptime: 3 days 21 hours 39 minutes 30 seconds
Server load: 0.24 0.48 0.44
Total accesses: 3656062 - Total Traffic: 65.5 GB - Total Duration: 89167195
CPU Usage: u3441.31 s812.66 cu27370.4 cs5745.51 - 11.1% CPU load
10.8 requests/sec - 203.7 kB/second - 18.8 kB/request - 24.3889 ms/request
43 requests currently being processed, 207 idle workers

Slot PID Stopping Connections Threads Async connections
    total accep busy idle writing keep-alive closing
0 765381 yes (old gen) 1 no 0 0 0 0 0
1 765084 yes (old gen) 2 no 0 0 0 0 0
3 778949 yes (old gen) 1 no 0 0 0 0 0
4 765088 yes (old gen) 1 no 0 0 0 0 0
5 765383 yes (old gen) 1 no 0 0 0 0 0
6 860007 no 12 yes 6 44 0 4 2
7 765385 yes (old gen) 2 no 0 0 0 0 0
8 779006 yes (old gen) 1 no 0 0 0 0 0
9 860008 no 32 yes 13 37 0 15 4
10 859952 no 9 yes 4 46 0 5 1
11 860009 no 19 yes 9 41 0 10 1
12 860010 no 14 yes 11 39 0 3 1
13 833382 yes (old gen) 1 no 0 0 0 0 0
14 833383 yes (old gen) 1 no 0 0 0 0 0
Sum 14 9 97 43 207 0 37 9

..........................................W........W............
...............W................................................
................................................................
.....W......................................W...................
..........................W.................___R______R_________
____R_____R___R_____________R_.............W....................
...W..................W.........................................
.._R_R__________R__RRR___R_R________R_R______R___R_R______R_____
_________________RRR____________________R________________R______
__R___________RRRR____RRR_R__WR___RR__R____R_______________R____
_R______R_...............................W......................
........................................W.......................
................................................................
................................................................
................................................................
........................................

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process

Srv PID Acc M CPU SS Req Dur Conn Child Slot Client Protocol VHost Request
[..]
0-1 765381 1/1791/1791 W 271.04 295656 0 41826 0.0 34.26 34.26 46.94.XX.XXX http/1.1 www.nettolohn.de:443 POST / HTTP/1.1
1-0 765084 1/9/9 W 1.20 337112 0 2012 0.0 0.16 0.16 185.96.XXX.XX http/1.1 www.nettolohn.de:443 POST / HTTP/1.1
1-0 765084 1/9/9 W 1.24 337111 0 30 0.0 0.13 0.13 84.182.XXX.XXX http/1.1 www.nettolohn.de:443 GET /nettolohnoptimierung/3_gehaltsextras.html HTTP/1.1
3-2 778949 1/4547/6692 W 648.03 209229 0 149596 0.0 81.17 119.50 2.207.XX.XXX http/1.1 www.nettolohn.de:443 GET / HTTP/1.1
4-0 765088 1/5/5 W 0.73 337120 0 38 0.0 0.11 0.11 91.48.XX.XX http/1.1 www.nettolohn.de:443 POST /rechner/stundenlohn.html HTTP/1.1
5-1 765383 1/1806/1806 W 292.54 295653 0 45170 0.0 32.79 32.79 158.181.XX.XXX http/1.1 www.nettolohn.de:443 POST /rechner/arbeitslosengeld.html HTTP/1.1
7-1 765385 1/4800/4800 W 744.05 295645 0 103573 0.0 87.87 87.87 89.245.XX.XX http/1.1 www.nettolohn.de:443 POST /rechner/wunschnetto.html HTTP/1.1
7-1 765385 1/4682/4682 W 744.00 295656 0 109918 0.0 86.79 86.79 2.243.XXX.XXX http/1.1 www.nettolohn.de:443 GET /rechner/teilzeitarbeit-nettolohn.html HTTP/1.1
8-2 779006 1/1797/1797 W 254.93 209236 0 102323 0.0 31.17 31.17 31.16.XXX.X http/1.1 www.nettolohn.de:443 POST / HTTP/1.1
13-4 833382 1/5308/6290 W 750.80 36442 0 105750 0.0 96.14 113.63 77.25.XX.X http/1.1 www.nettolohn.de:443 POST /rechner/stundenlohn.html HTTP/1.1
14-4 833383 1/3992/3992 W 557.50 36443 0 73643 0.0 75.22 75.22 93.242.XX.XXX http/1.1 www.nettolohn.de:443 POST /rechner/stundenlohn.html HTTP/1.1
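
To tally how many workers the scoreboard reports in a given state (for example "W", Sending Reply), the scoreboard string can be counted character by character. A small sketch, with count_state as an illustrative name; the usage comment relies on the machine-readable "?auto" variant of the status page and assumes it is reachable on localhost.

```shell
#!/bin/sh
# Read scoreboard text on stdin and print how often the state
# character given as $1 occurs. count_state is a hypothetical helper name.
count_state() {
    # tr -cd deletes every character except the requested one
    tr -cd "$1" | wc -c | tr -d '[:space:]'
    echo
}

# Usage on a live system:
#   wget -qO- 'http://localhost/server-status?auto' \
#     | sed -n 's/^Scoreboard: //p' | count_state W
```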

% netstat -antpe | grep -P '765084|765088|765381|765383|765385|778949|779006|833382|833383'
(no output)

#
## Additional info
#

% cat /etc/logrotate.d/apache2
/var/log/apache2/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 640 root adm
    sharedscripts
    prerotate
 if [ -d /etc/logrotate.d/httpd-prerotate ]; then
     run-parts /etc/logrotate.d/httpd-prerotate
 fi
    endscript
    postrotate
 if pgrep -f ^/usr/sbin/apache2 > /dev/null; then
     invoke-rc.d apache2 reload 2>&1 | logger -t apache2.logrotate
 fi
    endscript
}

% systemctl status logrotate.service
○ logrotate.service - Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static)
     Active: inactive (dead) since Mon 2024-02-19 00:00:20 CET; 11h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
   Main PID: 859918 (code=exited, status=0/SUCCESS)
        CPU: 3.672s

Feb 19 00:00:16 nettolohn systemd[1]: Starting Rotate log files...
Feb 19 00:00:20 nettolohn systemd[1]: logrotate.service: Deactivated successfully.
Feb 19 00:00:20 nettolohn systemd[1]: Finished Rotate log files.
Feb 19 00:00:20 nettolohn systemd[1]: logrotate.service: Consumed 3.672s CPU time.

% ls -l /var/log/apache2/
total 574120
drwxr-x--- 2 root adm 4096 Feb 19 00:00 ./
drwxrwxr-x 16 root syslog 4096 Feb 18 00:00 ../
-rw-r----- 1 root adm 119997317 Feb 19 11:28 access.log
-rw-r--r-- 1 root root 18482414 Feb 9 00:00 access.log-20240209.gz
-rw-r----- 1 root adm 22905461 Feb 10 00:00 access.log-20240210.gz
-rw-r----- 1 root adm 14505536 Feb 11 00:00 access.log-20240211.gz
-rw-r----- 1 root adm 17583621 Feb 12 00:00 access.log-20240212.gz
-rw-r----- 1 root adm 24211889 Feb 13 00:00 access.log-20240213.gz
-rw-r----- 1 root adm 24927132 Feb 14 00:00 access.log-20240214.gz
-rw-r----- 1 root adm 26833450 Feb 15 00:00 access.log-20240215.gz
-rw-r----- 1 root adm 25034516 Feb 16 00:00 access.log-20240216.gz
-rw-r----- 1 root adm 22664411 Feb 17 00:00 access.log-20240217.gz
-rw-r----- 1 root adm 14522595 Feb 18 00:00 access.log-20240218.gz
-rw-r----- 1 root adm 256082118 Feb 19 00:00 access.log-20240219
-rw-r----- 1 root adm 1849 Feb 19 10:55 error.log
-rw-r----- 1 root adm 539 Feb 9 00:00 error.log-20240209.gz
-rw-r----- 1 root adm 1349 Feb 10 00:00 error.log-20240210.gz
-rw-r----- 1 root adm 493 Feb 11 00:00 error.log-20240211.gz
-rw-r----- 1 root adm 498 Feb 12 00:00 error.log-20240212.gz
-rw-r----- 1 root adm 360 Feb 13 00:00 error.log-20240213.gz
-rw-r----- 1 root adm 360 Feb 14 00:00 error.log-20240214.gz
-rw-r----- 1 root adm 361 Feb 15 00:00 error.log-20240215.gz
-rw-r----- 1 root adm 5540 Feb 16 00:00 error.log-20240216.gz
-rw-r----- 1 root adm 857 Feb 17 00:00 error.log-20240217.gz
-rw-r----- 1 root adm 529 Feb 18 00:00 error.log-20240218.gz
-rw-r----- 1 root adm 1193 Feb 19 00:00 error.log-20240219
-rw-r----- 1 root adm 0 Jan 17 00:00 other_vhosts_access.log
-rw-r----- 1 root adm 1700 Feb 19 09:14 suexec.log
-rw-r----- 1 root adm 191 Feb 8 15:43 suexec.log-20240209.gz
-rw-r----- 1 root adm 120 Feb 9 07:24 suexec.log-20240210.gz
-rw-r----- 1 root adm 136 Feb 10 22:00 suexec.log-20240211.gz
-rw-r----- 1 root adm 115 Feb 11 17:40 suexec.log-20240212.gz
-rw-r----- 1 root adm 122 Feb 12 19:10 suexec.log-20240213.gz
-rw-r----- 1 root adm 143 Feb 13 14:28 suexec.log-20240214.gz
-rw-r----- 1 root adm 154 Feb 14 19:14 suexec.log-20240215.gz
-rw-r----- 1 root adm 570 Feb 15 18:55 suexec.log-20240216.gz
-rw-r----- 1 root adm 207 Feb 16 13:33 suexec.log-20240217.gz
-rw-r----- 1 root adm 121 Feb 17 17:02 suexec.log-20240218.gz
-rw-r----- 1 root adm 2380 Feb 18 17:29 suexec.log-20240219

% uname -a
Linux nettolohn 6.2.0-1018-aws #18~22.04.1-Ubuntu SMP Wed Jan 10 22:54:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

% lsb_release -rd
Description: Ubuntu 22.04.3 LTS
Release: 22.04

% dpkg -l | grep apache
ii apache2 2.4.52-1ubuntu4.7 amd64 Apache HTTP Server
ii apache2-bin 2.4.52-1ubuntu4.7 amd64 Apache HTTP Server (modules and other binary files)
ii apache2-data 2.4.52-1ubuntu4.7 all Apache HTTP Server (common files)
ii apache2-suexec-custom 2.4.52-1ubuntu4.7 amd64 Apache HTTP Server configurable suexec program for mod_suexec
ii apache2-utils 2.4.52-1ubuntu4.7 amd64 Apache HTTP Server (utility programs for web servers)
ii libapache2-mod-fcgid 1:2.3.9-4 amd64 FastCGI interface module for Apache 2

% apt-cache policy apache2 apache2-bin apache2-data apache2-suexec-custom apache2-utils libapache2-mod-fcgid
apache2:
  Installed: 2.4.52-1ubuntu4.7
  Candidate: 2.4.52-1ubuntu4.7
  Version table:
 *** 2.4.52-1ubuntu4.7 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.4.52-1ubuntu4 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
apache2-bin:
  Installed: 2.4.52-1ubuntu4.7
  Candidate: 2.4.52-1ubuntu4.7
  Version table:
 *** 2.4.52-1ubuntu4.7 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.4.52-1ubuntu4 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
apache2-data:
  Installed: 2.4.52-1ubuntu4.7
  Candidate: 2.4.52-1ubuntu4.7
  Version table:
 *** 2.4.52-1ubuntu4.7 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.4.52-1ubuntu4 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
apache2-suexec-custom:
  Installed: 2.4.52-1ubuntu4.7
  Candidate: 2.4.52-1ubuntu4.7
  Version table:
 *** 2.4.52-1ubuntu4.7 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages
        100 /var/lib/dpkg/status
     2.4.52-1ubuntu4 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
apache2-utils:
  Installed: 2.4.52-1ubuntu4.7
  Candidate: 2.4.52-1ubuntu4.7
  Version table:
 *** 2.4.52-1ubuntu4.7 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2.4.52-1ubuntu4 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
libapache2-mod-fcgid:
  Installed: 1:2.3.9-4
  Candidate: 1:2.3.9-4
  Version table:
 *** 1:2.3.9-4 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
        100 /var/lib/dpkg/status

#
## Apache-Config
#

% a2query -m
authz_host (enabled by maintainer script)
filter (enabled by maintainer script)
deflate (enabled by maintainer script)
authn_file (enabled by maintainer script)
setenvif (enabled by maintainer script)
alias (enabled by maintainer script)
suexec (enabled by site administrator)
auth_basic (enabled by maintainer script)
rewrite (enabled by site administrator)
expires (enabled by site administrator)
headers (enabled by site administrator)
reqtimeout (enabled by site administrator)
dir (enabled by maintainer script)
status (enabled by site administrator)
socache_shmcb (enabled by site administrator)
authn_core (enabled by maintainer script)
authz_core (enabled by maintainer script)
mime (enabled by maintainer script)
ssl (enabled by site administrator)
mpm_event (enabled by site administrator)
fcgid (enabled by maintainer script)
authz_groupfile (enabled by site administrator)
authz_user (enabled by maintainer script)
env (enabled by maintainer script)

% a2query -s
40__nettolohn_backend (enabled by site administrator)
30__nettolohn (enabled by site administrator)
00__redirect_to_www (enabled by site administrator)

% a2query -c
other-vhosts-access-log (enabled by maintainer script)
serve-cgi-bin (enabled by maintainer script)
charset (enabled by maintainer script)
security (enabled by maintainer script)
localized-error-pages (enabled by maintainer script)
custom (enabled by site administrator)

% a2query -a
20120211

% a2query -v
2.4.52

% a2query -M
event

% a2query -d
/usr/lib/apache2/modules/

% grep -Pv '^\s*$|^\s*#' /etc/apache2/mods-enabled/mpm_event.conf
<IfModule mpm_event_module>
 ServerLimit 20
 ThreadsPerChild 50
 MaxRequestWorkers 1000
 StartServers 5
 MinSpareThreads 50
 MaxSpareThreads 150
 ThreadLimit 100
 MaxConnectionsPerChild 500000
</IfModule>
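
With these settings, the effect of lingering children on capacity can be estimated with simple arithmetic. The sketch below assumes that each stuck old-generation process keeps occupying one of the ServerLimit process slots, which matches the slot table on the status page; usable_workers is an illustrative name.

```shell
#!/bin/sh
# Worker threads still available when some process slots are held by
# old-generation children.
# Arguments: ServerLimit ThreadsPerChild stuck_processes
# usable_workers is a hypothetical helper name.
usable_workers() {
    echo $(( ($1 - $3) * $2 ))
}

usable_workers 20 50 0   # prints 1000: all slots free, equals MaxRequestWorkers
usable_workers 20 50 9   # prints 550: nine slots held by old-generation children
```

Under this assumption, twenty stuck processes would leave no slot for a process that accepts connections.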

% grep -Pv '^\s*$|^#' /etc/apache2/apache2.conf
DefaultRuntimeDir ${APACHE_RUN_DIR}
PidFile ${APACHE_PID_FILE}
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
User ${APACHE_RUN_USER}
Group ${APACHE_RUN_GROUP}
HostnameLookups Off
ErrorLog ${APACHE_LOG_DIR}/error.log
LogLevel warn
IncludeOptional mods-enabled/*.load
IncludeOptional mods-enabled/*.conf
Include ports.conf
<Directory />
 Options FollowSymLinks
 AllowOverride None
 Require all denied
</Directory>
AccessFileName .htaccess
<FilesMatch "^\.ht">
 Require all denied
</FilesMatch>
LogFormat "%V:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%v %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vcombined
LogFormat "%V:%{local}p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" -- %{ms}T L:\"%{Location}o\"" vcombinedplus
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
IncludeOptional conf-enabled/*.conf
IncludeOptional sites-enabled/*.conf

% grep -Pv '^\s*$|^\s*#' /etc/apache2/conf-enabled/custom.conf
FileETag None
<Files ~ "(^\.)|((\.swp|\.inc|\~)$)">
    Require all denied
</Files>
<Directory ~ "/\.">
    Require all denied
</Directory>
Alias /.well-known /var/letsencrypt/.well-known
<Directory ~ "/var/letsencrypt/\.well-known">
    Require all granted
</Directory>

% grep -Pv '^\s*$|^\s*#' /etc/apache2/conf-enabled/security.conf
ServerTokens Prod
ServerSignature Off
TraceEnable Off

% grep -Pv '^\s*$|^\s*#' /etc/apache2/conf-enabled/other-vhosts-access-log.conf
CustomLog ${APACHE_LOG_DIR}/access.log vcombinedplus

% grep -Pv '^\s*$|^\s*#' /etc/apache2/suexec/www-data
/etc/apache2/phpfcgi-scripts
public_html/cgi-bin

% grep -Pv '^\s*$|^\s*#' /etc/apache2/sites-enabled/30__nettolohn.conf
<VirtualHost *:443>
 SSLEngine on
        SSLCertificateFile /etc/ssl/custom-certs/www.nettolohn.de.crt
        SSLCertificateKeyFile /etc/ssl/custom-private/www.nettolohn.de.key
        SSLCertificateChainFile /etc/ssl/custom-certs/www.nettolohn.de.ca.crt
 ServerName www.nettolohn.de
 DocumentRoot /home/netto/www.nettolohn.de/html
 ErrorLog /home/netto/www.nettolohn.de/www_log/error_log
 SuexecUserGroup netto netto
 FCGIWrapper /etc/apache2/phpfcgi-scripts/nettolohn/php .php
 AddHandler fcgid-script .php
 AddDefaultCharset utf-8
 <Directory "/home/netto/www.nettolohn.de/html">
  Options None +FollowSymLinks +ExecCGI
  AllowOverride All
  Require all granted
 </Directory>
 <Directory "/home/netto/www.nettolohn.de/html/magazin/wp-admin">
  Options None +FollowSymLinks +ExecCGI
  AllowOverride All
  Require all denied
 </Directory>
</VirtualHost>

% ls -lA /etc/apache2/phpfcgi-scripts/nettolohn
total 76
-r-x------ 1 netto netto 105 Jul 9 2020 php*
-r-------- 1 netto netto 73149 Jan 24 08:37 php.ini

% cat /etc/apache2/phpfcgi-scripts/nettolohn/php
#!/bin/sh
export PHPRC="`dirname "$0"`"
export PHP_FCGI_MAX_REQUESTS=100000
exec /usr/lib/cgi-bin/php7.4

% grep -Pv '^\s*$|^\s*#' /etc/apache2/mods-enabled/fcgid.conf
<IfModule mod_fcgid.c>
  FcgidConnectTimeout 20
  FcgidIOTimeout 600
  FcgidMinProcessesPerClass 10
  FcgidMaxProcessesPerClass 30
  FcgidMaxProcesses 50
  FcgidIdleTimeout 600
  FcgidIdleScanInterval 60
  FcgidMaxRequestsPerProcess 100000
  FcgidProcessLifeTime 21600
  FcgidFixPathinfo 1
  FcgidMaxRequestLen 1073741824
  <IfModule mod_mime.c>
    AddHandler fcgid-script .fcgi
  </IfModule>
</IfModule>

thermoman (thermoman) wrote : Re: [Bug 2054301]

Just to try a newer version of Apache I upgraded the webserver in question from official 2.4.52-1ubuntu4.7 to unofficial 2.4.58-1+ubuntu22.04.1+deb.sury.org+1 from this PPA: https://launchpad.net/~ondrej/+archive/ubuntu/apache2

We observe the same issue with 2.4.58 and MPM event.

% ps f -o user,pid,nice,%cpu,%mem,cputime,etime,thcount,command ax
USER PID NI %CPU %MEM TIME ELAPSED THCNT COMMAND
root 906764 0 0.0 0.1 00:00:00 30:30 1 /usr/sbin/apache2 -k start
www-data 906768 0 1.1 0.2 00:00:20 30:30 3 \_ /usr/sbin/apache2 -k start <-----
www-data 906771 0 1.5 0.2 00:00:27 30:30 3 \_ /usr/sbin/apache2 -k start <-----
www-data 906772 0 0.7 0.2 00:00:14 30:30 5 \_ /usr/sbin/apache2 -k start <-----
www-data 907681 0 0.0 0.0 00:00:00 15:47 1 \_ /usr/sbin/apache2 -k start
netto 907948 0 0.9 0.7 00:00:09 15:46 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907956 0 1.2 0.7 00:00:11 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907957 0 0.5 0.7 00:00:05 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907958 0 1.6 0.7 00:00:15 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907959 0 0.9 0.7 00:00:08 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907960 0 0.7 0.6 00:00:06 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907961 0 1.1 0.7 00:00:11 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907964 0 1.2 0.7 00:00:11 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907965 0 2.0 0.7 00:00:19 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907967 0 1.6 0.7 00:00:15 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907968 0 0.7 0.7 00:00:07 15:45 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907976 0 2.0 0.7 00:00:19 15:44 1 | \_ /usr/lib/cgi-bin/php7.4
netto 907979 0 0.1 0.6 00:00:01 15:42 1 | \_ /usr/lib/cgi-bin/php7.4
netto 908228 0 0.1 0.6 00:00:00 08:33 1 | \_ /usr/lib/cgi-bin/php7.4
netto 908230 0 0.0 0.6 00:00:00 08:33 1 | \_ /usr/lib/cgi-bin/php7.4
www-data 907683 0 0.8 0.3 00:00:07 15:46 52 \_ /usr/sbin/apache2 -k start
www-data 907684 0 1.0 0.3 00:00:10 15:46 52 \_ /usr/sbin/apache2 -k start
www-data 907685 0 0.9 0.3 00:00:08 15:46 52 \_ /usr/sbin/apache2 -k start
www-data 907687 0 2.3 0.4 00:00:21 15:46 52 \_ /usr/sbin/apache2 -k start
www-data 907688 0 1.7 0.4 00:00:16 15:46 52 \_ /usr/sbin/apache2 -k start

Apache was restarted 30 minutes and 30 seconds ago and reloaded
15 minutes and 46 seconds ago. The php child processes and the
apache2 child processes at the bottom of the process list have
been restarted after the graceful command - but the '<-----' marked
apache2 processes are still running.

#
## /server-status
#

Server Version: Apache/2.4.58 (Ubuntu) mod_fcgid/2.3.9 OpenSSL/3.0.2
Server MPM: event
Server Built: 2023-10-25T05:39:09

Current Time: Tuesday, 20-Feb-2024 10:44:55 CET
Restart Time: Tuesday, 20-F...

thermoman (thermoman) wrote : Re: [worker/event] Some child processes aren't killed on reload (graceful), being stuck in state "Sending Reply", leaves old logfiles open, disk filles up, server stops responding after some days

Workaround so far:

% cat kill_old_apache_processes.sh
#!/bin/sh

# see https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/2054301

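# Fetch the status page and extract the PID column of every row
# whose "Stopping" column reads "yes (old gen)".
wget -qO- http://localhost/server-status \
 | awk -F '</td><td>' '$3 ~ /^yes \(old gen\)$/ {print $2}' \
 | while read -r pid
 do
  echo "killing $pid"
  # SIGKILL, because these processes did not exit after the graceful restart
  kill -9 "$pid"
  sleep 1
 done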

This script is started via cron every day at 2:00am, 2 hours after logrotate has run.
It parses the /server-status output and looks for old processes to kill.
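
A slightly more defensive variant of the same idea is sketched here; it is not what runs on the server. It only accepts purely numeric PIDs from the status page, and the usage comment tries SIGTERM before falling back to SIGKILL. The name old_gen_pids is made up, and the HTML row layout is assumed to be the one the awk field separator of the workaround script relies on.

```shell
#!/bin/sh
# Read the HTML of /server-status on stdin and print the PID of every
# process row whose "Stopping" column reads "yes (old gen)".
# Non-numeric PID cells are skipped as a safety net before killing.
# old_gen_pids is a hypothetical helper name.
old_gen_pids() {
    awk -F '</td><td>' '$3 ~ /^yes \(old gen\)$/ && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Usage on a live system: ask politely first, force only if still alive.
#   wget -qO- http://localhost/server-status | old_gen_pids | while read -r pid
#   do
#       kill -TERM "$pid"
#       sleep 5
#       kill -0 "$pid" 2>/dev/null && kill -KILL "$pid"
#   done
```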

Sergio Durigan Junior (sergiodj) wrote :

Thank you for the great bug report, thermoman.

My gut feeling here is that this problem is happening because "apache2ctl graceful" is restarting apache2 behind systemd's back. This is probably related to bug #1832182, although I haven't investigated it further.

Out of curiosity, what happens if you replace the "apache2ctl graceful" call with "systemctl restart apache2"?

I'm setting the bug as Triaged and tagging it as server-todo, which will put it in our queue where someone from the team will pick it up and work on it.

We will keep you posted.

Thanks.

Changed in apache2 (Ubuntu):
status: New → Triaged
tags: added: server-todo
thermoman (thermoman) wrote :

Hi Sergio,

do you really mean "systemctl restart apache2"? Because this will disrupt service and stop the whole webserver and then start it again.

I guess you meant "systemctl reload apache2".

The issue is the same, since they all do the same:

logrotate calls "invoke-rc.d apache2 reload" which calls "/etc/init.d/apache2 reload" which calls "apache2ctl graceful":

% grep -A 6 -F 'do_reload()' /etc/init.d/apache2
do_reload() {
 if apache_conftest; then
         if ! pidofproc -p $PIDFILE "$DAEMON" > /dev/null 2>&1 ; then
                 APACHE2_INIT_MESSAGE="Apache2 is not running"
                 return 2
         fi
         $APACHE2CTL graceful > /dev/null 2>&1
-----------------------------------------------------------------

systemd does the same:

% cat /lib/systemd/system/apache2.service
...
[Service]
---
ExecStart=/usr/sbin/apachectl start
ExecStop=/usr/sbin/apachectl graceful-stop
ExecReload=/usr/sbin/apachectl graceful
-----------------------------------------------------------------

So it doesn't matter. You can even send SIGUSR1 to the apache2 parent process, which does the same:

AH00493: SIGUSR1 received. Doing graceful restart
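
For reference, signalling the parent directly could look like the sketch below. The default PID file path is the one normally used by the Ubuntu packaging and is an assumption here, as is the function name; nothing in it is specific to this bug.

```shell
#!/bin/sh
# Trigger a graceful restart by sending SIGUSR1 to the Apache parent process.
# The default path is an assumption (usual Ubuntu location of the PID file).
graceful_via_signal() {
    pidfile=${1:-/var/run/apache2/apache2.pid}
    if [ ! -r "$pidfile" ]; then
        echo "cannot read PID file: $pidfile" >&2
        return 2
    fi
    pid=$(cat "$pidfile")
    # Refuse anything that is not a plain decimal PID.
    case $pid in
        ''|*[!0-9]*)
            echo "unexpected content in $pidfile" >&2
            return 2
            ;;
    esac
    kill -USR1 "$pid"
}
```

Called without arguments it uses the default path; a different PID file can be passed as the first argument.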

summary: [worker/event] Some child processes aren't killed on reload (graceful),
- being stuck in state "Sending Reply", leaves old logfiles open, disk
- filles up, server stops responding after some days
+ being stuck in state "Sending Reply", leaving old logfiles open, disk
+ fills up, server stops responding after some days, DoS
Paride Legovini (paride) wrote :

Thanks for the additional info thermoman. I think this will be difficult to debug by looking at apache2 alone; one reason for that: we would be getting many more bug reports about this if this were a bug affecting simple/common apache2 configurations.

I have some questions that may give us some clues:

- If you disable logrotate, do you ever get apache2 processes stuck in "Sending Reply" state? What I suspect is that processes get stuck in that state independently of the graceful reload, but the reload somehow leads to the accumulation of more processes.

- Given that you are using php7.4 I assume you are running Focal. Is it possible for you to try to reproduce the issue on a Jammy system? This may help us figure out where the problem is by bisecting, and then deliver a Focal fix.

- My gut feeling is that PHP may be involved in this. Are you able to try to reproduce the issue with a different PHP version from https://launchpad.net/~ondrej/+archive/ubuntu/php ? (This is not the Ubuntu supported way, but again: we're looking for clues.)

Thanks!

thermoman (thermoman) wrote :

Hey Paride,

sorry for the confusion and not making this clear:

The system is the latest LTS, 22.04 (jammy), with PHP 7.4 packages from https://launchpad.net/~ondrej/+archive/ubuntu/php

I suppose the issue is with PHP, yes, because:

We have several other jammy machines with other projects running PHP 5.6 from ondrej with no issues.

We had one other jammy machine with another project running PHP 7.4 from ondrej and we had these issues with MPM worker, too. We switched that server to event and the issue disappeared. But the traffic on that other server is much much less.

So I guess this might be related to ondrej's PHP 7.4 packages.

> - If you disable logrotate, do you ever get apache2 processes stuck in "Sending Reply" state?

Nope. Without "graceful" there are never stuck apache2 processes. Only when a reload is triggered do some processes survive, become "old gen", and get stuck.

> - My gut feeling is that PHP may be involved in this. Are you able to try to reproduce the issue with a different PHP version from https://launchpad.net/~ondrej/+archive/ubuntu/php ?

We have jammy with ondrej PHP 7.4 and jammy with ondrej PHP 5.6, but only the 7.4 machines receive more than a little traffic; nettolohn.de (where we see this issue) has a moderate to high amount of traffic.

% apt-cache policy
Package files:
 100 /var/lib/dpkg/status
     release a=now
 500 https://dl.yarnpkg.com/debian stable/main all Packages
     release o=yarn,a=stable,n=stable,l=yarn-stable,c=main,b=all
     origin dl.yarnpkg.com
 500 https://dl.yarnpkg.com/debian stable/main amd64 Packages
     release o=yarn,a=stable,n=stable,l=yarn-stable,c=main,b=amd64
     origin dl.yarnpkg.com
 500 https://ppa.launchpadcontent.net/ondrej/php/ubuntu jammy/main amd64 Packages
     release v=22.04,o=LP-PPA-ondrej-php,a=jammy,n=jammy,l=***** The main PPA for supported PHP versions with many PECL extensions *****,c=main,b=amd64
     origin ppa.launchpadcontent.net
 500 https://deb.nodesource.com/node_16.x jammy/main amd64 Packages
     release o=Node Source,n=jammy,l=Node Source,c=main,b=amd64
     origin deb.nodesource.com
 500 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages
     release v=22.04,o=Ubuntu,a=jammy-security,n=jammy,l=Ubuntu,c=multiverse,b=amd64
     origin security.ubuntu.com
 500 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages
     release v=22.04,o=Ubuntu,a=jammy-security,n=jammy,l=Ubuntu,c=universe,b=amd64
     origin security.ubuntu.com
 500 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages
     release v=22.04,o=Ubuntu,a=jammy-security,n=jammy,l=Ubuntu,c=restricted,b=amd64
     origin security.ubuntu.com
 500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
     release v=22.04,o=Ubuntu,a=jammy-security,n=jammy,l=Ubuntu,c=main,b=amd64
     origin security.ubuntu.com
 100 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages
     release v=22.04,o=Ubuntu,a=jammy-backports,n=jammy,l=Ubuntu,c=universe,b=amd64
     origin eu-central-1.ec2.archive.ubuntu.com
 100 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-backports/main amd64 Pac...


thermoman (thermoman) wrote :

I opened an issue at https://github.com/oerdnj/deb.sury.org/issues/2083 but I'm not 100% sure it's really PHP that's causing this issue.

Mitchell Dzurick (mitchdz) wrote :

Thanks for the additional info!

Is there a reason you need to use ondrej's PHP package? I'd be curious if this could be reproduced using php 8.1.2-1ubuntu2.14 from the Ubuntu archives.

thermoman (thermoman) wrote :

Hi Mitchell,

the application - for now - needs PHP 7.4 but the dev says it should be possible to make it compatible with 8.1.

Will report back when it's running with 8.1 from the official Ubuntu jammy repositories.

thermoman (thermoman) wrote :

We upgraded the application to run with PHP 8.1, removed all packages from Ondrej's repository, and installed PHP 8.1 from the official jammy repository.

Same issue: after a reload of apache2, some apache2 processes remain in state "Sending Reply", with netstat showing inbound HTTPS connections in state CLOSE_WAIT.

Will report back if the processes vanish as soon as the TCP connections are dropped.
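
One way to track those CLOSE_WAIT connections over time, without depending on netstat being installed, is to count them from the kernel's socket tables. This is only a sketch: the function name is hypothetical, and it relies on the Linux /proc/net/tcp layout, where the fourth column is the socket state (08 means CLOSE_WAIT) and the local port is printed in hexadecimal.

```shell
#!/bin/sh
# Count sockets in CLOSE_WAIT on a given local port (default 443).
# Expects the contents of /proc/net/tcp and/or /proc/net/tcp6 on stdin.
count_close_wait() {
    port_hex=$(printf '%04X' "${1:-443}")
    awk -v port="$port_hex" '
        # Column 4 is the state; "08" is CLOSE_WAIT.
        $4 == "08" {
            n = split($2, a, ":")
            # Appending "" forces a string comparison, so hex values such as
            # 0E10 are never interpreted as numbers in exponent notation.
            if ((a[n] "") == (port "")) c++
        }
        END { print c + 0 }'
}
```

Typical use would be `cat /proc/net/tcp /proc/net/tcp6 | count_close_wait 443`, logged before and after the nightly reload.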

% apt-cache policy $(dpkg -l | grep php | awk '{print $2}')
php-common:
  Installed: 2:92ubuntu1
  Candidate: 2:92ubuntu1
  Version table:
 *** 2:92ubuntu1 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
        100 /var/lib/dpkg/status
php-db:
  Installed: 1.10.0-1build4
  Candidate: 1.10.0-1build4
  Version table:
 *** 1.10.0-1build4 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
        100 /var/lib/dpkg/status
php-pear:
  Installed: 1:1.10.12+submodules+notgz+20210212-1ubuntu3
  Candidate: 1:1.10.12+submodules+notgz+20210212-1ubuntu3
  Version table:
 *** 1:1.10.12+submodules+notgz+20210212-1ubuntu3 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
        100 /var/lib/dpkg/status
php8.1-apcu:
  Installed: 5.1.21+4.0.11-7ubuntu1
  Candidate: 5.1.21+4.0.11-7ubuntu1
  Version table:
 *** 5.1.21+4.0.11-7ubuntu1 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
        100 /var/lib/dpkg/status
php8.1-bcmath:
  Installed: 8.1.2-1ubuntu2.14
  Candidate: 8.1.2-1ubuntu2.14
  Version table:
 *** 8.1.2-1ubuntu2.14 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages
        100 /var/lib/dpkg/status
     8.1.2-1ubuntu2 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
php8.1-cgi:
  Installed: 8.1.2-1ubuntu2.14
  Candidate: 8.1.2-1ubuntu2.14
  Version table:
 *** 8.1.2-1ubuntu2.14 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     8.1.2-1ubuntu2 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
php8.1-cli:
  Installed: 8.1.2-1ubuntu2.14
  Candidate: 8.1.2-1ubuntu2.14
  Version table:
 *** 8.1.2-1ubuntu2.14 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     8.1.2-1ubuntu2 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
php8.1-common:
  Installed: 8.1.2-1ubuntu2.14
  Candidate: 8.1.2-1ubuntu2.14
  Version table:
 *** 8.1.2-1ubuntu2.14 500
        500 http://eu-central-1.ec2.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages
        100 /var/lib/dpkg/status
     8.1.2-1ubuntu2 500
        500 http://eu-ce...

Athos Ribeiro (athos-ribeiro) wrote :

Hi thermoman,

Would you mind trying to reproduce this in a fresh Ubuntu jammy installation without Ondrej's packages (instead of removing those)?

Could you provide a short reproducer (with minimal configs) for the case above so we can help assess this one?

tags: removed: server-todo
ekomc (ekomc) wrote :

Hi,

Did you manage to solve this?

We're currently in the process of upgrading all of our Debian 10 servers before the end of the month. The process is Debian 10 > Debian 11 > Debian 12.

Right now we're encountering much the same issue on one of our servers; the only difference is that this server is a VMware virtual machine.

At midnight logrotate reloads the Apache service and all of our websites go down, despite the fact that the Apache service is still running.

Manually calling systemctl reload apache2 has the same effect.

If I restart the Apache service, everything is back to normal.

We're using the ondrej/php repository because it is a shared server and we need multiple PHP versions available, but I don't think this is related, since the 10 other machines we've upgraded so far use these repositories as well and don't have this issue.

thermoman (thermoman) wrote :

Hi,

I just used my workaround script on the server that has this issue and put it out of my mind because I seemed to be the only one affected, until now.

Just checked the cron mails and it seems it doesn't occur every day:

at 0:00 each day the apache reload happens via cron.daily
at 2:00 each day my script looks for old processes and kills them (and prints the PIDs, which triggers cron mails)

I got

mail on 2024/05/22 through 2024/05/26
no mail on 2024/05/27 and /28
mail on 2024/05/29 through 2024/06/02
no mail on 2024/06/03 and /04
mail on 2024/06/05
no mail on 2024/06/06 through 2024/06/08

and so on.

So it doesn't happen each and every day and not on all machines running this setup.

I can't share code from the production server for developers to reproduce.

ekomc (ekomc) wrote :

Thanks a lot for the reply.

It seems a bit different then: the "crash" we experienced occurred immediately after reloads.
The leftover processes seem to prevent the reload from completing properly, and the websites aren't accessible even though the Apache status reports the service as running.

I didn't find anyone with this exact issue, but every similar case I found pointed toward the MPM and PHP modules.

I tried migrating from mpm_prefork to mpm_event, disabled mod_php and switched to php-fpm.
So far, the issue seems solved. I'll keep an eye on it and report back if it happens again.
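
For anyone wanting to try the same change, the switch could be sketched with the standard Debian/Ubuntu Apache helper tools as below. The exact commands used on this server are not given in the comment, so this is an assumption; the function name and the RUN variable are hypothetical, the PHP version has to be supplied because it is not stated, and the matching phpX.Y-fpm package is assumed to be installed already.

```shell
#!/bin/sh
# Switch Apache from mpm_prefork + mod_php to mpm_event + php-fpm.
# By default the commands are only printed; call with RUN= (empty) to run them.
switch_to_event_fpm() {
    php=${1:?usage: switch_to_event_fpm PHP_VERSION}
    run=${RUN-echo}
    # mod_php requires prefork, so both are disabled together.
    $run a2dismod "php${php}" mpm_prefork
    # proxy_fcgi and setenvif are what the packaged php-fpm config relies on.
    $run a2enmod mpm_event proxy_fcgi setenvif
    $run a2enconf "php${php}-fpm"
    # Changing the MPM needs a full restart, not a reload.
    $run systemctl restart apache2
}
```

`switch_to_event_fpm 8.2` prints the four commands for review; `RUN= switch_to_event_fpm 8.2` (as root) would execute them.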

Mitchell Dzurick (mitchdz) wrote :

Thanks for the discussion thermoman and ekomc.

thermoman, are your machines identically set up? Is it just a single machine that is experiencing this issue, or a few out of your fleet?
