reload apache2 with mpm_event cause scoreboard is full

Bug #1466926 reported by Branislav Staron on 2015-06-19
This bug affects 13 people
Affects            Importance   Assigned to
apache2 (Ubuntu)   Medium       Unassigned
Trusty             Undecided    Unassigned
Xenial             Undecided    Unassigned
Zesty              Undecided    Unassigned

Bug Description

On a clean install of Ubuntu 14.04 with Apache under almost no client load, each "service apache2 reload" leaves scoreboard slots marked "Gracefully finishing" (G), and the server rejects new connections for those slots.

Running "service apache2 reload" four times is enough to make the server reject all new requests.

Ubuntu 14.04.2 LTS
Apache 2.4.7-ubuntu4.4 (mpm_event)
Kernel 3.16.0-30-generic

Reproduce problem:
#################################################
1/ service apache2 start
______________________________________________________W_________
___________.....................................................
......................

2/ service apache2 reload

.........................GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGG__________________________________________________W__
______________________

3/ service apache2 reload

___W_____________________GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGG__________________________________________________...
......................

4/ service apache2 reload

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG___
W_____________________

5/ service apache2 reload -> Apache server stops responding
The Apache error log then shows:
... [mpm_event:error] [pid 9381:tid 1234563234] AH00485: scoreboard is full, not at MaxRequestWorkers
...
#################################################
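
A minimal sketch for putting numbers on the scoreboards above instead of eyeballing them. The helper name count_slots is hypothetical (it is not part of Apache); it assumes it is fed only the scoreboard characters, not the full status page, because the legend text on that page also contains letters such as "G".

```shell
# Count how many scoreboard slots are in a given state (default: G).
# Reads the scoreboard text on stdin; line breaks are ignored.
# Assumption: the input is only the scoreboard itself, not the whole status page.
count_slots() {
    state="${1:-G}"
    # Drop newlines, keep only the requested state character, count what is left.
    tr -d '\n' | tr -cd "$state" | wc -c | tr -d ' '
}
```

If mod_status is reachable from localhost (an assumption about the local setup), something like `curl -s 'http://localhost/server-status?auto' | sed -n 's/^Scoreboard: //p' | count_slots G` after each reload should show the number of "G" slots growing.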

My workaround was to change the MPM module from "mpm_event" to "mpm_worker".
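
For reference, a sketch of how that switch is typically done on Debian/Ubuntu with the stock a2dismod/a2enmod helpers. The wrapper name switch_mpm and the DRY_RUN variable are hypothetical; by default it only prints the commands, so they can be reviewed before being run as root.

```shell
# Sketch: switch Apache from one MPM to another on Debian/Ubuntu.
# Prints the commands by default; set DRY_RUN=0 to execute them (needs root).
# Changing the MPM requires a full restart, a reload is not enough.
switch_mpm() {
    from="$1"
    to="$2"
    for cmd in "a2dismod $from" "a2enmod $to" "service apache2 restart"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}
```

Calling `switch_mpm mpm_event mpm_worker` lists the three steps; `DRY_RUN=0 switch_mpm mpm_event mpm_worker` would actually apply them.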

affects: installation-report (Ubuntu) → apache2 (Ubuntu)
Henti Smith (henti) wrote :

This bug has been discussed on the apache bug tracker :

https://bz.apache.org/bugzilla/show_bug.cgi?id=53555

As far as I can see there is no movement to fix this. There is a patch which seems to fix it for users up to a higher usage level.

https://bz.apache.org/bugzilla/attachment.cgi?id=33158

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apache2 (Ubuntu):
status: New → Confirmed
Stewart Campbell (sc-pulsion) wrote :

A patch has now been committed to trunk for this bug:
https://bz.apache.org/bugzilla/show_bug.cgi?id=53555#c65

ChristianEhrhardt (paelzer) wrote :

Thanks Stewart for the ping that the issue is now resolved upstream and a patch is available!

Changed in apache2 (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Medium
Robie Basak (racb) on 2016-12-12
tags: added: server-next
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apache2 - 2.4.25-3ubuntu2

---------------
apache2 (2.4.25-3ubuntu2) zesty; urgency=medium

  * Undrop (LP 1658469):
    - Don't build experimental http2 module for LTS:
      + debian/control: removed libnghttp2-dev Build-Depends (in universe).
      + debian/config-dir/mods-available/http2.load: removed.
      + debian/rules: removed proxy_http2 from configure.
      + debian/apache2.maintscript: remove http2 conffile.

 -- Nishanth Aravamudan <email address hidden> Fri, 10 Feb 2017 08:53:43 -0800

Changed in apache2 (Ubuntu):
status: Triaged → Fix Released
Nick (n6ck) wrote :

Will this be backported to Trusty or Xenial?

Yes, this is affecting Xenial. When will this be backported to the currently supported LTS releases?

Haw Loeung (hloeung) on 2017-07-10
Changed in apache2 (Ubuntu Xenial):
status: New → Confirmed
Changed in apache2 (Ubuntu Trusty):
status: New → Confirmed
Haw Loeung (hloeung) on 2017-07-10
Changed in apache2 (Ubuntu Zesty):
status: New → Fix Released
Changed in apache2 (Ubuntu Xenial):
status: Confirmed → Triaged
ChristianEhrhardt (paelzer) wrote :

First of all, I apologize for this bug having been dormant.
We recently started to clear out bugs of this kind, but obviously one can't do it all in one day :-/
I was made aware of this one in particular by others who are affected.

## CASE ##
For an SRU we need a reproducible case of some sort.
On a first try with the mpm_event config as it is installed by default I can't see this issue.
I tried on Trusty and Xenial, but all slots stay at "waiting for connection".

I have been through some iterations on this, and while it is not complete yet I have some lessons learned. We need:
1. long running requests
2. a graceful restart that puts all of those into "G" for a while
3. a lot of requests that fail due to most/all slots being blocked

After some iterations I arrived at this two-system setup:

# Server
# Prepare a somewhat large, non-compressible file (test1) and a small one (test2) on the server
$ dd if=/dev/urandom of=/var/www/html/test1 bs=1M count=32
$ dd if=/dev/urandom of=/var/www/html/test2 bs=1k count=4

# Client
# slow down to somewhat like an internet connection
$ tc qdisc add dev eth0 root handle 1: htb default 12
$ tc class add dev eth0 parent 1:1 classid 1:12 htb rate 4000kbps ceil 12000kbps
$ tc qdisc add dev eth0 parent 1:12 netem delay 200ms

# Client - 150 slow requests
$ ab -q -S -c 150 -n 150 10.0.4.30/test1
# Server reload to cause "G" state
$ apache2ctl status; apache2ctl graceful; apache2ctl status; sleep 5s; apache2ctl status
# Client - many fast requests, exceeding the few/no remaining workers
$ ab -q -S -c 150 -n 5000 10.0.4.30/test2

# I can see the status being clogged up in "G" on most workers {1} but things are still working fine :-/
There must be a way to reproduce this that isn't "be a webhoster for 4000 people".
If anyone affected has a better reproducer, please let me know.
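
To tell a real reproduction apart from a scoreboard that merely shows many "G" slots, one option is to check the error log for the AH00485 message quoted in the bug description. This is only a sketch: the helper name count_scoreboard_full is hypothetical, and the default path assumes the stock Ubuntu log location /var/log/apache2/error.log.

```shell
# Count "scoreboard is full" (AH00485) entries in an Apache error log.
# Assumption: the default path is the stock Ubuntu location; pass another path as $1.
count_scoreboard_full() {
    log="${1:-/var/log/apache2/error.log}"
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the helper usable under "set -e".
    grep -c 'AH00485' "$log" || true
}
```

A count that rises after the reload-plus-load sequence would indicate the bug was actually hit; a count that stays the same matches the "still working fine" observation above.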

## FIX ##
The fix itself is also a bit messy, as there were multiple revisions, splits of PRs and such.
What I found is that the initial proposal of the fix that eventually got into 2.4.25 is attached as [2], but it was broken up upstream. On the 2.4 branch it is actually [3], plus some doc fixups [4], [5] so that the documentation is correct after the fix.

## Testing ##
For now I have made a ppa available for testing at [6].
This is a backport of the referenced fix for Xenial, as yet untested.
Since I can't reproduce this yet, I'm depending on you to:
a) test from the ppa whether the fix works (and shows no related regression)
b) help me, with or without the ppa, to create some working steps to reproduce

[1]: https://bz.apache.org/bugzilla/show_bug.cgi?id=53555#c39
[2]: https://bz.apache.org/bugzilla/attachment.cgi?id=34202&action=diff&collapsed=&headers=1&format=raw
[3]: https://github.com/apache/httpd/commit/e7407f84ec2a1b7f2c04775a230f147c08860c7c
[4]: https://github.com/apache/httpd/commit/86db1247c70699df6acad75f2491b8baa0030ff6
[5]: https://github.com/apache/httpd/commit/1a7e2114393c9dd9f8d87e53dfd74ce9ede3c3c0
[6]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3034

{1}
18.8 requests/sec - 22.7 MB/second - 1.2 MB/request
1 requests currently being processed, 24 idle workers

PID    Connections        Threads      Async connections
       total  accepting   busy  idle   writing  keep-alive  closing
2661   15     no          0     0      0        0           ...


ChristianEhrhardt (paelzer) wrote :

To make it very clear that I have to rely on feedback from other affected users, both on the test case and on testing the ppa backport, I'll mark the X/T tasks Incomplete.

Changed in apache2 (Ubuntu Xenial):
status: Triaged → Incomplete
Changed in apache2 (Ubuntu Trusty):
status: Confirmed → Incomplete