Apache-2.4.27 massive slow down for files >8192 bytes long

Bug #1704168 reported by David Favor on 2017-07-13
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Apache2 Web Server
Fix Released
apache2 (Ubuntu)

Bug Description

Tracking this down took forever + I believe I now have a test case.

Top priority, is finding a temporary fix for this problem.

Here's the symptom...

lxd: net12-david-favor # ab -vvv -k -t 10 -n 100000 -c 5 https://DavidFavor.com/foo.txt 2>&1 | egrep -e ^Requests -e ^Failed -e ^Time -e "^Document Length:"
Document Length: 8192 bytes
Time taken for tests: 8.540 seconds
Failed requests: 0
Requests per second: 11710.24 [#/sec] (mean)
Time per request: 0.427 [ms] (mean)
Time per request: 0.085 [ms] (mean, across all concurrent requests)

ab -vvv -k -t 10 -n 100000 -c 5 https://DavidFavor.com/foo.txt 2>&1 | egrep -e ^Requests -e ^Failed -e ^Time -e "^Document Length:"
Document Length: 8193 bytes
Time taken for tests: 10.087 seconds
Failed requests: 5
Requests per second: 1.19 [#/sec] (mean)
Time per request: 4202.710 [ms] (mean)
Time per request: 840.542 [ms] (mean, across all concurrent requests)

First run 1-8192 byte files work great. They serve fast with 0 failed requests.

Second run of 8193-... byte files slow to a crawl + produce request failures + notice the "Time taken for tests", which seems to relate to socket housekeeping tests, becomes very slow.

This seems to be an actual apache problem, rather than an ab problem, as all sites I host seem to slow down, when Apache2-4.27 is installed.

I'll setup a testbed system running Zesty, for which I can provide a login, for debugging, if required.

lxd: net12-david-favor # uname -a
Linux net12-david-favor 4.8.0-30-generic #32-Ubuntu SMP Fri Dec 2 03:43:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

lxd: net12-david-favor # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.10
Release: 16.10
Codename: yakkety

lxd: net12-david-favor # apt-cache policy apache2
  Installed: 2.4.27-0.1+ubuntu16.10.1+deb.sury.org+1
  Candidate: 2.4.27-0.1+ubuntu16.10.1+deb.sury.org+1
  Version table:
 *** 2.4.27-0.1+ubuntu16.10.1+deb.sury.org+1 500
        500 http://ppa.launchpad.net/ondrej/apache2/ubuntu yakkety/main amd64 Packages
        100 /var/lib/dpkg/status
     2.4.18-2ubuntu4.2 500
        500 http://archive.ubuntu.com/ubuntu yakkety-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu yakkety-security/main amd64 Packages
     2.4.18-2ubuntu4 500
        500 http://archive.ubuntu.com/ubuntu yakkety/main amd64 Packages

David Favor (davidfavor) wrote :

After a bit more testing, problem seems to be with ab.

Just compiled up https://curl.haxx.se/libcurl/c/chkspeed.html which produces this...

lxd: net12-david-favor # ./chkspeed https://DavidFavor.com/foo-8192.txt
Data downloaded: 8192 bytes.
Total download time: 0.089 sec.
Average download speed: 90.153 kbyte/sec.

lxd: net12-david-favor # ./chkspeed https://DavidFavor.com/foo-8193.txt
Data downloaded: 8193 bytes.
Total download time: 0.089 sec.
Average download speed: 89.793 kbyte/sec.

So same time for either 8192 or 8193 byte files, so looks like Apache-2.4.27 is working... whew...

Just ab is having some sort of problem.

Hi David,
I can at least confirm the issue on ab.
And the stress can be reduced a lot for better debugg-ability:

$ ab -k -n 1 -c 1 https://DavidFavor.com/foo-8192.txt | grep '^Time per request.*mean)$'
Time per request: 397.659 [ms] (mean)
$ ab -k -n 1 -c 1 https://DavidFavor.com/foo-8193.txt | grep '^Time per request.*mean)$'
Time per request: 5400.170 [ms] (mean)

So one byte makes the request x13 longer.

But as one single request shows the same issue, we can compare much easier:
$ time wget https://DavidFavor.com/foo-8192.txt --quiet
real 0m0.416s
$ time wget https://DavidFavor.com/foo-8193.txt --quiet
real 0m0.416s

So yeah, something in ab most likely.

I checked that with the version in Artful to not face a known issue - but it still is occuring there.

Checking with "strace -rT" reveals that it is just about as fast with 8193 than 8192.
Except there is a last final call on epoll_wait with 5 seconds.

That together with many definitions in ab being on 8192 I'd assume it fails to understand that it is done by exceeding some buffer and then opens up a new epoll which is on a 5 second timeout.

That is clearly an upstream problem (or they consider >8192 not supported).
Would you mind reporting there and posting the bug reference here then?
That would be very kind.
In terms of an "ubuntu bug" this is "confirmed" but not really actionable.

Thanks btw for creating an online setup one can reuse to test this!

Changed in apache2 (Ubuntu):
status: New → Confirmed
David Favor (davidfavor) wrote :

This verifies what I was seeing.

It appears all's well with >8192 bytes until ab gets into the "Time taken for tests" which appears to be doing socket housekeeping, then this testing seems to add roughly 5 secs/request.

I'll try again + open an upstream bug. I was having problems last time I attempted this.

David Favor (davidfavor) wrote :
Changed in apache2:
importance: Unknown → Medium
status: Unknown → Confirmed
Changed in apache2:
status: Confirmed → Fix Released

Fixed upstream now and even backported to 2.4 series.
Fortunately assumptions were confirmed and this was only a fix in ab that was needed (not the server) which lowers severity a lot IMHO.

Fix for 2.4 is: http://svn.apache.org/viewvc?view=revision&revision=1814468
This will be in 2.4.30 and I'd think given that it is only the benchmark that should be enough - as it will be fixed whenever we merge it.

If you think this should be SRU'ed [1] into older releases please let me know and help me to make a good case for it - e.g. fill the SRU template you can find in the link and add it here to the bug description. In general the patch seems to apply with a lot of offsets but no fuzz - so it could be rather smooth. For SRU we would need the fix in X/Z/A/B releases.

[1]: https://wiki.ubuntu.com/StableReleaseUpdates

Changed in apache2 (Ubuntu):
status: Confirmed → Triaged

Also thank you a lot for driving this further on the upstream bug David!

Cosmic is on 2.4.33 so this is resolved.

Changed in apache2 (Ubuntu):
status: Triaged → Fix Released
David Favor (davidfavor) wrote :

I can verify this is fixed in Apache 2.4.33 as I just tested this version.


To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.