apache2+ssl hangs on high load
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| apache2 (Ubuntu) | | Medium | Unassigned | |
Bug Description
Apache2 stops accepting connections when using mod_ssl with more than 1000 processes running. This only happens on Ubuntu 12.04, and only with mod_ssl enabled.
Steps to reproduce:
- take a clean install of Ubuntu 12.04 Server 64-bit (I used the English installer and all default settings)
- execute the following commands as root:
$ apt-get update
$ apt-get upgrade
$ apt-get install apache2-mpm-prefork
- change /etc/apache2/
<IfModule mpm_prefork_module>
ServerLimit 1500
StartServers 1500
MinSpareServers 1400
MaxSpareServers 1500
MaxClients 1500
MaxRequests
</IfModule>
- enable mod_ssl and restart apache:
$ a2enmod ssl
$ service apache2 restart
[no further configuration changes required;
I did not configure any SSL hosts,
only enabled the module]
- verify that Apache is running at least 1001 processes:
$ ps ax | grep apache | wc -l
1502
- verify you can connect to localhost:
$ curl http://
<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added, yet.</p>
</body></html>
- start high load:
$ ab -n 5000 -c 1000 http://
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://
Licensed to The Apache Software Foundation, http://
Benchmarking localhost (be patient)
Completed 500 requests
apr_poll: The timeout specified has expired (70007)
Total of 998 requests completed
- that's it, Apache is no longer working properly:
$ curl -v http://
* About to connect() to localhost port 80 (#0)
* Trying 127.0.0.1... connected
> GET / HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-
> Host: localhost
> Accept: */*
>
..... silence
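Regarding the process count in the steps above: `ps ax | grep apache | wc -l` also counts the `grep apache` process itself, so the 1502 presumably means 1500 children, the parent and the grep. A minimal sketch of a count that leaves the grep out, assuming the processes are named `apache2` as on Ubuntu; `count_apache2` is just a hypothetical helper name:

```shell
# count_apache2 (hypothetical helper): count the lines of `ps ax` output on
# stdin that mention apache2. The pattern '[a]pache2' matches the text
# "apache2" but not the literal string "[a]pache2", so the grep process
# running this pattern does not count itself.
count_apache2() {
  grep -c '[a]pache2'
}
```

With the configuration above, `ps ax | count_apache2` would be expected to print 1501 (1500 children plus the parent).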
There are no errors in the logs. After a restart Apache works for a while,
but it keeps hanging regularly as long as some traffic reaches the server.
In my tests it sometimes hung even with very few users connecting to the server.
For reliable reproduction, however, you need this high connection count for ab.
This is reproducible every time, and I also tested it on 3 different machines.
It is specific to 12.04: the same setup works properly on 11.10 and 12.10.
I'm aware that 1000 processes will consume a lot of RAM. The machine that is supposed
to run this config has 32GB, so that should not be the problem here.
Notice:
- Apache hangs only with mod_ssl enabled
- Apache hangs only with >1000 processes: 1000 processes run fine, 1001 produce the hang
Additional information:
1) The release of Ubuntu you are using
$ lsb_release -rd
Description: Ubuntu 12.04 LTS
Release: 12.04
2) The version of the package you are using
$ apt-cache policy apache2-mpm-prefork
apache2-
Installed: 2.2.22-1ubuntu1
Candidate: 2.2.22-1ubuntu1
Version table:
*** 2.2.22-1ubuntu1 0
500 http://
100 /var/lib/
3) What you expected to happen
I expect Apache to handle the 5000 requests as usual and to keep accepting connections afterwards
4) What happened instead
Apache handles only 1000 requests and then stops accepting new connections entirely, which is a disaster for any website running on the host
Stefan Fritsch (sf-sfritsch) wrote: #1
Evgeny Anisiforov (jeff-h670zbtsl) wrote: #2
As I wrote, this is a clean Ubuntu install; I did not change any config files beyond those mentioned.
/etc/security/
I can't figure out how to determine the PID of the Apache child process that curl connects to. If you have an idea, please let me know.
Here is the gdb output I get for a hanging Apache:
main Apache process (determined with pstree -p):
#0 0x00007f9b07745803 in select () from /lib/x86_
No symbol table info available.
#1 0x00007f9b07c630fd in apr_sleep () from /usr/lib/
No symbol table info available.
#2 0x00007f9b0853bc69 in ap_wait_or_timeout ()
No symbol table info available.
#3 0x00007f9b08548e79 in ap_mpm_run ()
No symbol table info available.
#4 0x00007f9b0851e4a4 in main ()
No symbol table info available.
some child process
(gdb) bt full
#0 0x00007f9b0774e5b7 in semop () from /lib/x86_
No symbol table info available.
#1 0x00007f9b07c4f68e in ?? () from /usr/lib/
No symbol table info available.
#2 0x00007f9b07c504a6 in apr_proc_mutex_lock () from /usr/lib/
No symbol table info available.
#3 0x00007f9b085480dd in ?? ()
No symbol table info available.
#4 0x00007f9b0854893a in ?? ()
No symbol table info available.
#5 0x00007f9b085489f7 in ?? ()
No symbol table info available.
#6 0x00007f9b08549374 in ap_mpm_run ()
No symbol table info available.
#7 0x00007f9b0851e4a4 in main ()
No symbol table info available.
This is basically the same as for a working instance before the hang. The output changes if I disable mod_ssl.
Any ideas?
Changed in apache2 (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Clint Byrum (clint-fewbar) wrote: #3
This appears to be legitimate, I was able to reproduce it on an HP cloud instance with the given parameters. The first 1000 actual requests always finish, but after that all fail.
I notice these kernel messages:
[ 1131.976324] TCP: Possible SYN flooding on port 80. Dropping request. Check SNMP counters.
But I don't think it is related.
I see this as well *sometimes*:
[Wed Jul 25 20:20:10 2012] [error] server reached MaxClients setting, consider raising the MaxClients setting
But MaxClients is set to 1500 so I'm not sure what that is.
The one difference mod_ssl would introduce would be the use of shared memory for statistical gathering. So maybe the stats are running into a shm limit.
I tried raising shmall to 4194304, but that just slowed things down a bit, it still fails right at 1000. I also tried raising shmmni to 8192, and that did nothing. Same for doubling shmmax.
Comparing straces with and without mod_ssl enabled, the problem most likely lies with shared memory or semaphore operations, which only seem to happen with mod_ssl. I also tried adjusting the numbers in /proc/sys/
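The shared memory and semaphore limits discussed here live under /proc/sys/kernel as `shmmax`, `shmall`, `shmmni` and `sem` (the last one holds four fields: SEMMSL, SEMMNS, SEMOPM and SEMMNI). A small sketch for recording the current values before and after tuning them; `show_ipc_limits` and its directory argument are hypothetical, and `ipcs -l` reports the same limits in a more verbose form:

```shell
# show_ipc_limits [DIR] (hypothetical helper): print the System V IPC limits.
# DIR defaults to /proc/sys/kernel; it is a parameter only so the function
# can also be pointed at a copy of those files.
show_ipc_limits() {
  local dir=${1:-/proc/sys/kernel}
  local name
  for name in shmmax shmall shmmni sem; do
    if [ -r "$dir/$name" ]; then
      # sem is tab-separated (SEMMSL SEMMNS SEMOPM SEMMNI); squeeze the
      # whitespace into single spaces so each limit prints on one line
      printf '%s = %s\n' "$name" "$(tr -s ' \t' ' ' < "$dir/$name")"
    else
      printf '%s = (unreadable)\n' "$name"
    fi
  done
}
```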
Also it's worth noting that 1000 processes is inefficient for more reasons than just memory. Context switching at the process level will be far more expensive than in a threaded model. For that reason alone I've set this to "Medium", as it's really just not a great way to configure Apache.
tags: added: precise
Evgeny Anisiforov (jeff-h670zbtsl) wrote: #4
I can confirm that I get the same log messages on my system.
However, this does not seem to be directly related to the problem: I see the same messages when testing on Ubuntu 11.10, but there Apache remains stable and responsive.
I have found some interesting behavior when using the -k (keep-alive) switch with ab, which ensures that really only the specified number of processes is used and no process is busy waiting for a connection to close:
$ ab -k -n 5000 -c 999 http://
.... all requests will complete without error
$ curl http://
<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added, yet.</p>
</body></html>
$ curl http://
... no answer
So here you can literally see how connection #1001 (the "magic number" that appeared before) breaks Apache. Maybe some kind of buffer is running full?
Evgeny Anisiforov (jeff-h670zbtsl) wrote: #5
I also agree that this bug has medium importance. I think our Apache setup is not very common. We run mod_php and therefore rely on the prefork model, because PHP is not thread-safe. Please also note that the config presented here was changed to maximize reproducibility. On our production system about 700 processes are running most of the time, but at peak traffic this number rises to 1000-1200, producing the hang I described.
So while systems with such load may be uncommon, the reported problem does exist in the real world. Thanks for paying attention to it!
Stefan Fritsch (sf-sfritsch) wrote: #6
Evgeny, you can use "netstat -tnp | grep curl" to get the other port number of the connection from curl to apache2. With that, you can look for the other end of the connection in the "netstat -tnp" output. The last column should be "123/apache2", where 123 is the PID of the apache2 process. You will have to run netstat -tnp as root to get this info.
The backtrace of the child process you posted looks more like a process that is waiting for a connection. But one would need the debug info installed to be absolutely sure.
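Stefan's two-step netstat lookup can be scripted roughly as follows. This is a sketch that assumes the usual `netstat -tnp` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); `client_port` and `peer_of_port` are hypothetical helper names:

```shell
# Hypothetical helpers; both read `netstat -tnp` output on stdin.
# Run netstat as root, otherwise the PID/Program column is mostly "-".

# client_port PROG: print the local port of the first TCP socket owned by PROG.
client_port() {
  awk -v prog="$1" '$1 ~ /^tcp/ && $7 ~ ("/" prog "$") {
    n = split($4, a, ":"); print a[n]; exit
  }'
}

# peer_of_port PORT: print the PID/Program column of every socket whose
# foreign address ends in :PORT, i.e. the server-side end of that connection.
peer_of_port() {
  awk -v p=":$1" '$1 ~ /^tcp/ && substr($5, length($5) - length(p) + 1) == p {
    print $7
  }'
}
```

For example, as root: `port=$(netstat -tnp | client_port curl)`, then `netstat -tnp | peer_of_port "$port"`. A result of `-` means no process owns the server-side socket.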
Clint Byrum (clint-fewbar) wrote: #7
One thing to consider as a workaround to this is to use php5-fpm for per-user PHP, running a daemon per user. This has the additional benefit of being able to limit each user's memory usage individually. You can then switch to apache worker, which I'm sure does not have this issue. This should also be quite a bit more memory efficient as static files will be served from the apache threads rather than all 1000+ processes.
Evgeny Anisiforov (jeff-h670zbtsl) wrote: #8
Unfortunately I do not get any PID with this method. The other end of the connection is simply "-", not associated with any apache2 process:
root@ubuntu:
tcp 0 161 127.0.0.1:33399 127.0.0.1:80 ESTABLISHED 347/curl
root@ubuntu:
tcp 0 0 127.0.0.1:80 127.0.0.1:33399 SYN_RECV -
tcp 0 161 127.0.0.1:33399 127.0.0.1:80 ESTABLISHED 347/curl
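In this output the server side of the connection is still in SYN_RECV with no owning process: from the server's point of view the handshake is not complete, so no apache2 child can have accepted the connection yet, while curl already considers it ESTABLISHED. To see whether such half-open connections pile up on port 80 during an ab run (which would fit the "Possible SYN flooding on port 80" kernel message Clint saw), they could be counted from `netstat -tn` output. A sketch assuming the usual `netstat -tn` column order; `count_syn_recv` is a hypothetical name:

```shell
# count_syn_recv PORT (hypothetical helper): read `netstat -tn` output on
# stdin and count TCP sockets in state SYN_RECV whose local address ends in
# :PORT. Prints 0 when there are none.
count_syn_recv() {
  awk -v p=":$1" '$1 ~ /^tcp/ && $6 == "SYN_RECV" &&
    substr($4, length($4) - length(p) + 1) == p { n++ }
    END { print n + 0 }'
}
```

Running `netstat -tn | count_syn_recv 80` repeatedly while ab is running would show whether the count keeps growing once the hang starts.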
I have tried capturing the HTTP traffic to get some insight: tcpdump -p -s0 -w dump.cap -i lo port 80
This is the dump of the last curl request. Wireshark shows multiple TCP retransmissions, but no reply from the server. So something may be going wrong at the TCP level. Could someone with a deeper understanding of TCP take a look at the dump?
If anyone is interested, I can provide a downloadable virtual appliance (.OVF) exported from VirtualBox, running Ubuntu 12.04 with the reported bug, for debugging purposes.
I have made some further experiments:
the bug still occurs with the latest updates.
I have traced the problem down to the core Apache functions. It is not a mod_ssl issue!
What actually happens is that after "a2enmod ssl" the server has to listen on two ports, 80 and 443. This activates the AcceptMutex that serializes accept() across the running processes, and this is where something goes wrong under Ubuntu 12.04.
So an alternative way to reproduce is:
$ a2dismod ssl
$ echo "Listen 81" > /etc/apache2/
$ service apache2 restart
$ ab -c 1000 -n 5000 http://
Changing the "AcceptMutex" config option gives different results. For example, with:
AcceptMutex posixsem
I get all 5000 requests done without error, but running ab a second time then causes Apache to stop accepting requests.
The issue is partly kernel-related:
this commit (introduced in 3.2.9) http://
This commit (3.2.17) http://
Apache shouldn't hang, but updating your kernel will solve the issue.
I've opened an apache ticket (https:/
I can confirm, that upgrading the kernel to the most current version solves the problem:
$ sudo apt-get dist-upgrade
Thank you!
I cannot reproduce this on Debian unstable with either 2.2.22-9 or 2.2.22-1.
Wild guess: do you have a per-user process limit configured in /etc/security/limits.conf?
If not, it would be helpful if you could provide a backtrace of the process that curl connects to and that hangs. There is some documentation about how to do that in doc/apache2.2-common/README.backtrace. But that doc is for Debian; on Ubuntu, installing the debugging symbols works differently (maybe someone else can provide a pointer).
/usr/share/
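Regarding the per-user process limit question: the limit actually in effect for a running process can be read from /proc/<pid>/limits, whose "Max processes" row lists the soft and hard RLIMIT_NPROC values. A sketch, assuming the caller passes the limits file of an apache2 child; `nproc_limits` is a hypothetical helper name:

```shell
# nproc_limits [FILE] (hypothetical helper): print the soft and hard
# "Max processes" limits from a /proc/<pid>/limits file. Without an argument
# it reads /proc/self/limits, i.e. the limits of the awk process itself,
# which it inherits from the calling shell.
nproc_limits() {
  # Row layout: "Max processes  <soft>  <hard>  processes", so the soft
  # limit is field 3 and the hard limit field 4.
  awk '/^Max processes/ { print $3, $4 }' "${1:-/proc/self/limits}"
}
```

For example, `nproc_limits /proc/1234/limits`, with 1234 replaced by the PID of a real apache2 child, prints something like `1024 63603`; a soft limit close to the number of running children would point at a per-user limit.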