Slow memory leak, seen on two machines, appears to be dupe of 224945 even after -updates

Bug #422138 reported by Joe McDonagh on 2009-08-31
22
This bug affects 2 people
Affects Status Importance Assigned to Milestone
apache2 (Ubuntu)
Medium
Chuck Short

Bug Description

Binary package hint: apache2

1. Ubuntu Release Info:

Description: Ubuntu 8.04.3 LTS
Release: 8.04

2. Package Information:

apache2-mpm-prefork:
  Installed: 2.2.8-1ubuntu0.11
  Candidate: 2.2.8-1ubuntu0.11
  Version table:
 *** 2.2.8-1ubuntu0.11 0
        500 http://aptproxy hardy-security/main Packages
        500 http://aptproxy hardy-updates/main Packages
        100 /var/lib/dpkg/status
     2.2.8-1 0
        500 http://aptproxy hardy/main Packages

3. I expect apache to not chew up to a couple gigs per process when it runs for a long time.

4. Apache chews threw a couple gigs per process when it runs for a long time

Enabled modules:

alias.conf
alias.load
auth_basic.load
authn_file.load
authz_default.load
authz_groupfile.load
authz_host.load
authz_user.load
autoindex.conf
autoindex.load
cgi.load
deflate.conf
deflate.load
dir.conf
dir.load
env.load
mime.conf
mime.load
negotiation.conf
negotiation.load
php5.conf
php5.load
rewrite.load
setenvif.conf
setenvif.load
ssl.conf
ssl.load
status.conf
status.load

This DOES appear to be a dupe of 224945 (reproducible with ab), however as you can see I am on the hardy-updates version. Is there something obvious I am missing here?

Chuck Short (zulcss) wrote :

Interesting,

Can you specify the ab command that you used to reproduce this?

Thanks
chuck

Changed in apache2 (Ubuntu):
assignee: nobody → Chuck Short (zulcss)
status: New → Incomplete
Download full text (4.6 KiB)

Hey Chuck, thanks for the response. The command is:

ab -n 2000000 -c 100 -f TLS11 https://webserver/

At ten seconds:

30253 www-data 20 0 254m 75m 4432 S 31 0.5 0:03.50 apache2
30252 www-data 20 0 251m 72m 4464 S 30 0.4 0:03.34 apache2
30250 www-data 20 0 250m 70m 4420 S 29 0.4 0:03.26 apache2
30251 www-data 20 0 250m 70m 4448 S 29 0.4 0:03.28 apache2
30249 www-data 20 0 249m 70m 4444 S 28 0.4 0:03.20 apache2
30331 www-data 20 0 244m 65m 4404 R 30 0.4 0:02.96 apache2
30369 www-data 20 0 233m 53m 4432 R 24 0.3 0:02.36 apache2
30410 www-data 20 0 231m 51m 4348 S 22 0.3 0:02.18 apache2
30458 www-data 20 0 224m 45m 4372 S 19 0.3 0:01.86 apache2
30506 www-data 20 0 217m 38m 4324 R 15 0.2 0:01.50 apache2
30557 www-data 20 0 212m 32m 4356 R 12 0.2 0:01.20 apache2
30620 www-data 20 0 207m 27m 4356 S 9 0.2 0:00.92 apache2

This is a significant jump in memory allocation, and children get spawned. I expect the children per my configuration, but the memory use is obscene. At three minutes (stopped ab at this point)

30253 www-data 20 0 366m 187m 4636 S 3 1.2 0:09.52 apache2
30252 www-data 20 0 364m 185m 4656 S 3 1.2 0:09.46 apache2
30251 www-data 20 0 363m 184m 4668 S 3 1.1 0:09.40 apache2
30250 www-data 20 0 361m 182m 4632 S 3 1.1 0:09.30 apache2
30249 www-data 20 0 359m 179m 4676 S 3 1.1 0:09.14 apache2
30331 www-data 20 0 355m 176m 4668 S 3 1.1 0:08.94 apache2
30369 www-data 20 0 343m 164m 4636 S 3 1.0 0:08.34 apache2
30410 www-data 20 0 343m 163m 4628 S 3 1.0 0:08.26 apache2
30458 www-data 20 0 337m 158m 4684 S 3 1.0 0:07.96 apache2
30557 www-data 20 0 320m 141m 4608 R 3 0.9 0:07.08 apache2
30620 www-data 20 0 320m 140m 4676 S 3 0.9 0:07.06 apache2
30675 www-data 20 0 315m 135m 4620 S 3 0.8 0:06.74 apache2
30731 www-data 20 0 311m 131m 4624 S 3 0.8 0:06.54 apache2
30841 www-data 20 0 300m 121m 4656 S 3 0.8 0:05.96 apache2
30901 www-data 20 0 295m 116m 4608 S 3 0.7 0:05.70 apache2
30957 www-data 20 0 294m 114m 4624 S 3 0.7 0:05.62 apache2

Now ab is not running, but my stopwatch program is. Now at 5 minutes:

30458 www-data 20 0 341m 162m 4692 S 0 1.0 0:08.16 apache2
31126 www-data 20 0 290m 111m 4596 S 0 0.7 0:05.42 apache2
31112 www-data 20 0 288m 109m 4604 S 0 0.7 0:05.30 apache2
31210 www-data 20 0 287m 108m 4612 S 0 0.7 0:05.24 apache2
31257 www-data 20 0 287m 107m 4600 S 0 0.7 0:05.24 apache2
31222 www-data 20 0 286m 107m 4636 S 0 0.7 0:05.20 apache2
31410 www-data 20 0 285m 105m 4588 S 0 0.7 0:05....

Read more...

Sorry forgot to attach the output of AB after I killed it:

ab -n 2000000 -c 100 -f TLS11 https://scan-stag.osdc/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking scan-stag.osdc (be patient)
^C

Server Software: Apache
Server Hostname: scan-stag.osdc
Server Port: 443
SSL/TLS Protocol: TLSv1/SSLv3,DHE-RSA-AES256-SHA,4096,256

Document Path: /
Document Length: 0 bytes

Concurrency Level: 100
Time taken for tests: 185.122 seconds
Complete requests: 9751
Failed requests: 0
Write errors: 0
Non-2xx responses: 9751
Total transferred: 7906663 bytes
HTML transferred: 0 bytes
Requests per second: 52.67 [#/sec] (mean)
Time per request: 1898.496 [ms] (mean)
Time per request: 18.985 [ms] (mean, across all concurrent requests)
Transfer rate: 41.71 [Kbytes/sec] received

Connection Times (ms)
              min mean[+/-sd] median max
Connect: 88 319 487.6 156 3229
Processing: 20 1571 446.0 1700 2235
Waiting: 20 1570 445.9 1699 2235
Total: 140 1889 202.1 1874 5089

Percentage of the requests served within a certain time (ms)
  50% 1874
  66% 1916
  75% 1948
  80% 1972
  90% 2050
  95% 2144
  98% 2293
  99% 2408
 100% 5089 (longest request)

Chuck, is there anything else you need from me to promote this from incomplete?

This took down a box today, had to visit the DC for it... is there any more information-gathering I can do for you or myself here?

Stefan Fritsch (sf-sfritsch) wrote :

Is the parameter -f TLS1 necessary to reproduce the problem?

Is the URL / of your webserver a php page? If yes, why is the content length of the page 0, what does the php script do? If not php, what is it? A simple redirect?

A workaround may be to add MaxRequestsPerChild 1000 (or an even lower value) to your configuration.

Hi Stefan, thanks for the response. My responses are in-line.
> Is the parameter -f TLS1 necessary to reproduce the problem?
>
No, same behavior. Long-running children just never free up the memory
as long as I am hitting the SSL port, seemingly regardless if I pass an
algorithm to AB or not. I see radically different (as in this time,
normal) behavior if I hit the non-SSL port.
> Is the URL / of your webserver a php page? If yes, why is the content
> length of the page 0, what does the php script do? If not php, what is
> it? A simple redirect?
>
It is the index.php for the CodeIgniter framework, the redirect is
somewhat contrived, as it's handled by some internal CodeIgniter stuff
that I am not very familiar with. I don't think PHP is involved at all
in the leak, since the non-HTTPS port works fine.
> A workaround may be to add MaxRequestsPerChild 1000 (or an even lower
> value) to your configuration.
>
>
That wouldn't really help since long-running children just hold onto
memory no matter what it seems.

I feel like this issue would be huge and brought up by someone else by
now if it's an actual regression. Does the package information I put in
the beginning look right?

--
Joe McDonagh
Operations Engineer
www.colonfail.com

Stefan Fritsch (sf-sfritsch) wrote :

> > A workaround may be to add MaxRequestsPerChild 1000 (or an even lower
> > value) to your configuration.
> >
> >
> That wouldn't really help since long-running children just hold onto
> memory no matter what it seems.

A lower MaxRequestsPerChild should make the children terminate earlier.

> Does the package information I put in the beginning look right?

I think so.

Do you use SSLSessionCache dbm? If yes, try switching to shmcb. I seem to remeber a known memory leak in dbm.

You can also try if disabling mod_deflate helps.

Stefan, thanks for re-iterating the MaxRequestsPerChild tidbit, I had forgotten that option was for lifetime, not concurrency. That'll probably help workaround momentarily.

Also, I am not currently doing any tweaking wrt the SSLSessionCache, I recall messing with it when I first started seeing this problem, but had removed the configuration options since it didn't seem to help much.

I would just like to see some other pair of eyes confirm that a default install with SSL still leaks. This is apparent because when I run the same tests against the HTTP only port things look normal.

Chuck Short (zulcss) on 2009-10-06
Changed in apache2 (Ubuntu):
importance: Undecided → Low

Chuck Short wrote:
> ** Changed in: apache2 (Ubuntu)
> Importance: Undecided => Low
>
>

Chuck, was wondering, should this be marked Low from Undecided if it
hasn't been confirmed nor denied by anyone?

--
Joe McDonagh
Operations Engineer
www.colonfail.com

Chuck Short (zulcss) wrote :

@Joe:

Its marked Low so our qa scripts pick it up. Ill take a look at this again after karmic has been released.

Regards
chuck

Chuck Short wrote:
> @Joe:
>
> Its marked Low so our qa scripts pick it up. Ill take a look at this
> again after karmic has been released.
>
> Regards
> chuck
>
>
Ah, ok, cool. Thanks Chuck.

Chuck Short (zulcss) wrote :

@Joe,

have you tried the configuratoin change as suggested?

Thanks
chuck

Chuck Short wrote:
> @Joe,
>
> have you tried the configuratoin change as suggested?
>
> Thanks
> chuck
>
>
Hey Chuck, yea to the extreme I think it's at maybe 1 or 100 for
MaxRequestsPerChild, but it seems to resemble the old bug on the
surface. Either way it's quieted down for me, thanks for your attention
to the issue.

--
Joe McDonagh
Operations Engineer
Silent Penguin Services
AIM: YoosingYoonickz
IRC: joe-mac on freenode
www.colonfail.com

Chuck Short (zulcss) on 2010-01-25
Changed in apache2 (Ubuntu):
assignee: Chuck Short (zulcss) → nobody
assignee: nobody → Chuck Short (zulcss)
importance: Low → Medium
status: Incomplete → Triaged

Chuck, you're thinking this is that Debian bug that had an advisory last week right?

Chuck Short (zulcss) wrote :

probably

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers