Ubuntu
apache2 package

Slow memory leak, seen on two machines, appears to be dupe of 224945 even after -updates

Bug #422138 reported by Joe McDonagh on 2009-08-31

This bug affects 3 people

Affects		Status	Importance	Assigned to	Milestone
	apache2 (Ubuntu)	Triaged	Medium	Chuck Short

Bug Description

Binary package hint: apache2

1. Ubuntu Release Info:

Description: Ubuntu 8.04.3 LTS
Release: 8.04

2. Package Information:

apache2-mpm-prefork:
  Installed: 2.2.8-1ubuntu0.11
  Candidate: 2.2.8-1ubuntu0.11
  Version table:
*** 2.2.8-1ubuntu0.11 0
        500 http://aptproxy hardy-security/main Packages
        500 http://aptproxy hardy-updates/main Packages
        100 /var/lib/dpkg/status
     2.2.8-1 0
        500 http://aptproxy hardy/main Packages

3. I expect apache to not chew up to a couple gigs per process when it runs for a long time.

4. Apache chews threw a couple gigs per process when it runs for a long time

Enabled modules:

alias.conf
alias.load
auth_basic.load
authn_file.load
authz_default.load
authz_groupfile.load
authz_host.load
authz_user.load
autoindex.conf
autoindex.load
cgi.load
deflate.conf
deflate.load
dir.conf
dir.load
env.load
mime.conf
mime.load
negotiation.conf
negotiation.load
php5.conf
php5.load
rewrite.load
setenvif.conf
setenvif.load
ssl.conf
ssl.load
status.conf
status.load

This DOES appear to be a dupe of 224945 (reproducible with ab), however as you can see I am on the hardy-updates version. Is there something obvious I am missing here?

Revision history for this message

Chuck Short (zulcss) wrote on 2009-09-09:

Interesting,

Can you specify the ab command that you used to reproduce this?

Thanks
chuck

Changed in apache2 (Ubuntu):
assignee:	nobody → Chuck Short (zulcss)
status:	New → Incomplete

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-09-09:

Download full text (4.6 KiB)

Hey Chuck, thanks for the response. The command is:

ab -n 2000000 -c 100 -f TLS11 https://webserver/

At ten seconds:

30253 www-data 20 0 254m 75m 4432 S 31 0.5 0:03.50 apache2
30252 www-data 20 0 251m 72m 4464 S 30 0.4 0:03.34 apache2
30250 www-data 20 0 250m 70m 4420 S 29 0.4 0:03.26 apache2
30251 www-data 20 0 250m 70m 4448 S 29 0.4 0:03.28 apache2
30249 www-data 20 0 249m 70m 4444 S 28 0.4 0:03.20 apache2
30331 www-data 20 0 244m 65m 4404 R 30 0.4 0:02.96 apache2
30369 www-data 20 0 233m 53m 4432 R 24 0.3 0:02.36 apache2
30410 www-data 20 0 231m 51m 4348 S 22 0.3 0:02.18 apache2
30458 www-data 20 0 224m 45m 4372 S 19 0.3 0:01.86 apache2
30506 www-data 20 0 217m 38m 4324 R 15 0.2 0:01.50 apache2
30557 www-data 20 0 212m 32m 4356 R 12 0.2 0:01.20 apache2
30620 www-data 20 0 207m 27m 4356 S 9 0.2 0:00.92 apache2

This is a significant jump in memory allocation, and children get spawned. I expect the children per my configuration, but the memory use is obscene. At three minutes (stopped ab at this point)

30253 www-data 20 0 366m 187m 4636 S 3 1.2 0:09.52 apache2
30252 www-data 20 0 364m 185m 4656 S 3 1.2 0:09.46 apache2
30251 www-data 20 0 363m 184m 4668 S 3 1.1 0:09.40 apache2
30250 www-data 20 0 361m 182m 4632 S 3 1.1 0:09.30 apache2
30249 www-data 20 0 359m 179m 4676 S 3 1.1 0:09.14 apache2
30331 www-data 20 0 355m 176m 4668 S 3 1.1 0:08.94 apache2
30369 www-data 20 0 343m 164m 4636 S 3 1.0 0:08.34 apache2
30410 www-data 20 0 343m 163m 4628 S 3 1.0 0:08.26 apache2
30458 www-data 20 0 337m 158m 4684 S 3 1.0 0:07.96 apache2
30557 www-data 20 0 320m 141m 4608 R 3 0.9 0:07.08 apache2
30620 www-data 20 0 320m 140m 4676 S 3 0.9 0:07.06 apache2
30675 www-data 20 0 315m 135m 4620 S 3 0.8 0:06.74 apache2
30731 www-data 20 0 311m 131m 4624 S 3 0.8 0:06.54 apache2
30841 www-data 20 0 300m 121m 4656 S 3 0.8 0:05.96 apache2
30901 www-data 20 0 295m 116m 4608 S 3 0.7 0:05.70 apache2
30957 www-data 20 0 294m 114m 4624 S 3 0.7 0:05.62 apache2

Now ab is not running, but my stopwatch program is. Now at 5 minutes:

Hey Chuck, thanks for the response. The command is:

ab -n 2000000 -c 100 -f TLS11 https://webserver/

At ten seconds:

30253 www-data  20   0  254m  75m 4432 S   31  0.5   0:03.50 apache2            
30252 www-data  20   0  251m  72m 4464 S   30  0.4   0:03.34 apache2            
30250 www-data  20   0  250m  70m 4420 S   29  0.4   0:03.26 apache2            
30251 www-data  20   0  250m  70m 4448 S   29  0.4   0:03.28 apache2            
30249 www-data  20   0  249m  70m 4444 S   28  0.4   0:03.20 apache2            
30331 www-data  20   0  244m  65m 4404 R   30  0.4   0:02.96 apache2            
30369 www-data  20   0  233m  53m 4432 R   24  0.3   0:02.36 apache2            
30410 www-data  20   0  231m  51m 4348 S   22  0.3   0:02.18 apache2            
30458 www-data  20   0  224m  45m 4372 S   19  0.3   0:01.86 apache2            
30506 www-data  20   0  217m  38m 4324 R   15  0.2   0:01.50 apache2            
30557 www-data  20   0  212m  32m 4356 R   12  0.2   0:01.20 apache2            
30620 www-data  20   0  207m  27m 4356 S    9  0.2   0:00.92 apache2

This is a significant jump in memory allocation, and children get spawned. I expect the children per my configuration, but the memory use is obscene. At three minutes (stopped ab at this point)

30253 www-data  20   0  366m 187m 4636 S    3  1.2   0:09.52 apache2            
30252 www-data  20   0  364m 185m 4656 S    3  1.2   0:09.46 apache2            
30251 www-data  20   0  363m 184m 4668 S    3  1.1   0:09.40 apache2            
30250 www-data  20   0  361m 182m 4632 S    3  1.1   0:09.30 apache2            
30249 www-data  20   0  359m 179m 4676 S    3  1.1   0:09.14 apache2            
30331 www-data  20   0  355m 176m 4668 S    3  1.1   0:08.94 apache2            
30369 www-data  20   0  343m 164m 4636 S    3  1.0   0:08.34 apache2            
30410 www-data  20   0  343m 163m 4628 S    3  1.0   0:08.26 apache2            
30458 www-data  20   0  337m 158m 4684 S    3  1.0   0:07.96 apache2            
30557 www-data  20   0  320m 141m 4608 R    3  0.9   0:07.08 apache2            
30620 www-data  20   0  320m 140m 4676 S    3  0.9   0:07.06 apache2            
30675 www-data  20   0  315m 135m 4620 S    3  0.8   0:06.74 apache2            
30731 www-data  20   0  311m 131m 4624 S    3  0.8   0:06.54 apache2            
30841 www-data  20   0  300m 121m 4656 S    3  0.8   0:05.96 apache2            
30901 www-data  20   0  295m 116m 4608 S    3  0.7   0:05.70 apache2            
30957 www-data  20   0  294m 114m 4624 S    3  0.7   0:05.62 apache2

Now ab is not running, but my stopwatch program is. Now at 5 minutes:

30458 www-data  20   0  341m 162m 4692 S    0  1.0   0:08.16 apache2            
31126 www-data  20   0  290m 111m 4596 S    0  0.7   0:05.42 apache2            
31112 www-data  20   0  288m 109m 4604 S    0  0.7   0:05.30 apache2            
31210 www-data  20   0  287m 108m 4612 S    0  0.7   0:05.24 apache2            
31257 www-data  20   0  287m 107m 4600 S    0  0.7   0:05.24 apache2            
31222 www-data  20   0  286m 107m 4636 S    0  0.7   0:05.20 apache2            
31410 www-data  20   0  285m 105m 4588 S    0  0.7   0:05.10 apache2            
31412 www-data  20   0  284m 105m 4656 S    0  0.7   0:05.12 apache2            
31396 www-data  20   0  284m 105m 4632 S    0  0.7   0:05.12 apache2

Note that although some threads do disappear (normal, expect) the ones that stay never free up any RAM. If I were to let this just chill by itself for a few days, res would be a gig or so and virt would be up there.

And for good measure, ten minutes:

30458 www-data  20   0  341m 162m 4692 S    0  1.0   0:08.22 apache2            
31126 www-data  20   0  291m 111m 4604 S    0  0.7   0:05.48 apache2            
31112 www-data  20   0  290m 110m 4620 S    0  0.7   0:05.42 apache2            
31222 www-data  20   0  288m 109m 4652 S    0  0.7   0:05.34 apache2            
31210 www-data  20   0  288m 109m 4616 S    0  0.7   0:05.36 apache2            
31257 www-data  20   0  287m 108m 4604 S    0  0.7   0:05.30 apache2            
31410 www-data  20   0  286m 107m 4604 S    0  0.7   0:05.22 apache2            
31396 www-data  20   0  286m 106m 4636 S    0  0.7   0:05.24 apache2            
31384 www-data  20   0  286m 106m 4616 S    0  0.7   0:05.22 apache2            
31412 www-data  20   0  285m 106m 4660 S    0  0.7   0:05.18 apache2

The server isn't being used much right now, but as you can see, no memory has really been free'd. Right now I am working around this by reloading apache at 11:59 every night (the saddest cron job on earth), which appears to bring memory usage back down.

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-09-09:

Sorry forgot to attach the output of AB after I killed it:

ab -n 2000000 -c 100 -f TLS11 https://scan-stag.osdc/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking scan-stag.osdc (be patient)
^C

Server Software: Apache
Server Hostname: scan-stag.osdc
Server Port: 443
SSL/TLS Protocol: TLSv1/SSLv3,DHE-RSA-AES256-SHA,4096,256

Document Path: /
Document Length: 0 bytes

Concurrency Level: 100
Time taken for tests: 185.122 seconds
Complete requests: 9751
Failed requests: 0
Write errors: 0
Non-2xx responses: 9751
Total transferred: 7906663 bytes
HTML transferred: 0 bytes
Requests per second: 52.67 [#/sec] (mean)
Time per request: 1898.496 [ms] (mean)
Time per request: 18.985 [ms] (mean, across all concurrent requests)
Transfer rate: 41.71 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 88 319 487.6 156 3229
Processing: 20 1571 446.0 1700 2235
Waiting: 20 1570 445.9 1699 2235
Total: 140 1889 202.1 1874 5089

Percentage of the requests served within a certain time (ms)
  50% 1874
  66% 1916
  75% 1948
  80% 1972
  90% 2050
  95% 2144
  98% 2293
  99% 2408
100% 5089 (longest request)

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-09-20:

Chuck, is there anything else you need from me to promote this from incomplete?

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-09-23:

This took down a box today, had to visit the DC for it... is there any more information-gathering I can do for you or myself here?

Revision history for this message

Stefan Fritsch (sf-sfritsch) wrote on 2009-09-24:

Is the parameter -f TLS1 necessary to reproduce the problem?

Is the URL / of your webserver a php page? If yes, why is the content length of the page 0, what does the php script do? If not php, what is it? A simple redirect?

A workaround may be to add MaxRequestsPerChild 1000 (or an even lower value) to your configuration.

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-09-24: Re: [Bug 422138] Re: Slow memory leak, seen on two machines, appears to be dupe of 224945 even after -updates

Hi Stefan, thanks for the response. My responses are in-line.
> Is the parameter -f TLS1 necessary to reproduce the problem?
>
No, same behavior. Long-running children just never free up the memory
as long as I am hitting the SSL port, seemingly regardless if I pass an
algorithm to AB or not. I see radically different (as in this time,
normal) behavior if I hit the non-SSL port.
> Is the URL / of your webserver a php page? If yes, why is the content
> length of the page 0, what does the php script do? If not php, what is
> it? A simple redirect?
>
It is the index.php for the CodeIgniter framework, the redirect is
somewhat contrived, as it's handled by some internal CodeIgniter stuff
that I am not very familiar with. I don't think PHP is involved at all
in the leak, since the non-HTTPS port works fine.
> A workaround may be to add MaxRequestsPerChild 1000 (or an even lower
> value) to your configuration.
>
>
That wouldn't really help since long-running children just hold onto
memory no matter what it seems.

I feel like this issue would be huge and brought up by someone else by
now if it's an actual regression. Does the package information I put in
the beginning look right?

--
Joe McDonagh
Operations Engineer
www.colonfail.com

Revision history for this message

Stefan Fritsch (sf-sfritsch) wrote on 2009-09-26:

> > A workaround may be to add MaxRequestsPerChild 1000 (or an even lower
> > value) to your configuration.
> >
> >
> That wouldn't really help since long-running children just hold onto
> memory no matter what it seems.

A lower MaxRequestsPerChild should make the children terminate earlier.

> Does the package information I put in the beginning look right?

I think so.

Do you use SSLSessionCache dbm? If yes, try switching to shmcb. I seem to remeber a known memory leak in dbm.

You can also try if disabling mod_deflate helps.

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-09-28:

Stefan, thanks for re-iterating the MaxRequestsPerChild tidbit, I had forgotten that option was for lifetime, not concurrency. That'll probably help workaround momentarily.

Also, I am not currently doing any tweaking wrt the SSLSessionCache, I recall messing with it when I first started seeing this problem, but had removed the configuration options since it didn't seem to help much.

I would just like to see some other pair of eyes confirm that a default install with SSL still leaks. This is apparent because when I run the same tests against the HTTP only port things look normal.

Chuck Short (zulcss) on 2009-10-06

Changed in apache2 (Ubuntu):
importance:	Undecided → Low

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-10-07:

#10

Chuck Short wrote:
> ** Changed in: apache2 (Ubuntu)
> Importance: Undecided => Low
>
>

Chuck, was wondering, should this be marked Low from Undecided if it
hasn't been confirmed nor denied by anyone?

--
Joe McDonagh
Operations Engineer
www.colonfail.com

Revision history for this message

Chuck Short (zulcss) wrote on 2009-10-23:

#11

@Joe:

Its marked Low so our qa scripts pick it up. Ill take a look at this again after karmic has been released.

Regards
chuck

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-10-25:

#12

Chuck Short wrote:
> @Joe:
>
> Its marked Low so our qa scripts pick it up. Ill take a look at this
> again after karmic has been released.
>
> Regards
> chuck
>
>
Ah, ok, cool. Thanks Chuck.

Revision history for this message

Chuck Short (zulcss) wrote on 2009-11-25:

#13

@Joe,

have you tried the configuratoin change as suggested?

Thanks
chuck

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2009-11-25:

#14

Chuck Short wrote:
> @Joe,
>
> have you tried the configuratoin change as suggested?
>
> Thanks
> chuck
>
>
Hey Chuck, yea to the extreme I think it's at maybe 1 or 100 for
MaxRequestsPerChild, but it seems to resemble the old bug on the
surface. Either way it's quieted down for me, thanks for your attention
to the issue.

--
Joe McDonagh
Operations Engineer
Silent Penguin Services
AIM: YoosingYoonickz
IRC: joe-mac on freenode
www.colonfail.com

Chuck Short (zulcss) on 2010-01-25

Changed in apache2 (Ubuntu):
assignee:	Chuck Short (zulcss) → nobody
assignee:	nobody → Chuck Short (zulcss)
importance:	Low → Medium
status:	Incomplete → Triaged

Revision history for this message

Joe McDonagh (joseph-e-mcdonagh) wrote on 2010-01-25:

#15

Chuck, you're thinking this is that Debian bug that had an advisory last week right?

Revision history for this message

Chuck Short (zulcss) wrote on 2010-01-25:

#16

probably

Revision history for this message

Tomas Emma Johansson (etletlien) wrote on 2024-07-27:

#17

this bug started happening a few weeks ago on poeter.se. the admin said he fixed it, and it worked for a day or two, but the errors are actually worse now, day by day. in the mornings i just got timeouts. otherwise its a 10 sec loading time (from 2-3 secs before). people on smartphones report still getting a SSL error and 2-3 minute loading times. the admin seems to not care though, as usual. the weird part is i also got this on many other small sites, and it all stopped when they "fixed" it on poeter.

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #199144

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntuapache2 package

Slow memory leak, seen on two machines, appears to be dupe of 224945 even after -updates

Bug Description

Duplicates of this bug

Other bug subscribers

Remote bug watches

Ubuntu
apache2 package