apache settings can yield frequent short outages
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Ubuntu Repository Cache Charm | Fix Released | High | Haw Loeung | |
Bug Description
While investigating a brief self-resolved alert for one of our cloud mirror regions, I zeroed in on this charm's Apache settings as one possible source of problems.
All of the units in the region are logging frequent scoreboard errors: "AH00288: scoreboard is full, not at MaxRequestWorkers".
Apache is configured with 1 child process which is recycled every 10000 connections:
ubuntu@
# [...]
StartServers 1
MinSpareThreads 1280
MaxSpareThreads 2560
ThreadLimit 2560
ThreadsPerChild 2560
ServerLimit 1
MaxRequestW
MaxConnecti
ubuntu@
Generally you want MaxRequestWorkers == ServerLimit * ThreadsPerChild, which is the case here.
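The rule of thumb above can be written as a quick sanity check. This is a minimal sketch in Python; the function names are hypothetical. The MaxRequestWorkers value is truncated in the paste above, so only the product implied by ServerLimit and ThreadsPerChild is computed here.

```python
def expected_max_request_workers(server_limit: int, threads_per_child: int) -> int:
    """Return the MaxRequestWorkers value implied by the rule of thumb."""
    return server_limit * threads_per_child


def settings_consistent(server_limit: int, threads_per_child: int,
                        max_request_workers: int) -> bool:
    """True when MaxRequestWorkers == ServerLimit * ThreadsPerChild."""
    return max_request_workers == expected_max_request_workers(
        server_limit, threads_per_child)


# Values from the pasted config: ServerLimit 1, ThreadsPerChild 2560.
implied = expected_max_request_workers(1, 2560)
print(f"MaxRequestWorkers should be {implied}")
```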
However, if traffic is distributed reasonably evenly across all of the backends, they could easily hit their 10,000-connection limits at around the same time and all try to recycle at once. haproxy would then see them all as down, which would cause these alerts and leave the service briefly unavailable.
For the environment I was looking at, the 10,000 connection limit per child yields the equivalent of a graceful restart every 7-8 minutes.
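To put that interval in perspective, the arithmetic can be sketched as below. This assumes the sole child handles every connection on the unit; the helper names are hypothetical, and the only inputs are the 10,000 limit and the 7-8 minute interval quoted above.

```python
def connections_per_second(limit: int, minutes: float) -> float:
    """Average connection rate that exhausts `limit` in `minutes`."""
    return limit / (minutes * 60)


def recycles_per_day(minutes: float) -> float:
    """How many times per day the child is recycled at that interval."""
    return 24 * 60 / minutes


for interval in (7, 8):
    rate = connections_per_second(10_000, interval)
    per_day = recycles_per_day(interval)
    print(f"{interval} min: {rate:.1f} conn/s, {per_day:.0f} recycles/day")
```

That works out to roughly 21 to 24 connections per second per unit, and about 180 to 206 child recycles per day.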
And since there is only one child, it has to wait for pending requests to finish and logging to complete before it can be restarted, and in the meantime empty scoreboard slots cannot be used for fresh requests. We can therefore easily end up with the main apache process seeing a full scoreboard without having reached MaxRequestWorkers.
The sole child will also cause a similar problem at logrotate time, since that happens at the same time on all units (in the whole world, even!).
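One general way to avoid every unit acting at the same moment is to add a per-unit delay before the scheduled job runs. The sketch below illustrates that idea only; it is not taken from the charm, and the function name, the unit names, and the 600-second window are all assumptions.

```python
import hashlib


def stagger_delay(unit_name: str, max_delay_seconds: int = 600) -> int:
    """Map a unit name to a stable delay in [0, max_delay_seconds).

    Hashing the name (rather than calling random) keeps the delay the
    same on every run for a given unit, while spreading units apart.
    """
    digest = hashlib.sha256(unit_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % max_delay_seconds


# Hypothetical unit names, for illustration only.
for name in ("unit-0", "unit-1", "unit-2"):
    print(name, stagger_delay(name))
```

A job wrapper would sleep for the returned number of seconds before triggering the rotation or reload, so that units no longer restart their Apache child simultaneously.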
I think at least the following needs to be done:
- review whether MaxConnectionsP
- configure Apache to have more than one child process
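For the second item, one possible approach is to keep the total worker count unchanged and divide it across several child processes, so that recycling one child leaves the others serving requests. The following is a sketch under that assumption; the choice of 4 children is purely illustrative and is not the value adopted in the fix.

```python
def split_workers(total_workers: int, children: int) -> dict:
    """Spread a fixed worker budget evenly across `children` processes.

    Keeps MaxRequestWorkers == ServerLimit * ThreadsPerChild.
    """
    if children < 1:
        raise ValueError("need at least one child process")
    if total_workers % children != 0:
        raise ValueError("total_workers must divide evenly across children")
    threads = total_workers // children
    return {
        "ServerLimit": children,
        "ThreadLimit": threads,
        "ThreadsPerChild": threads,
        "MaxRequestWorkers": total_workers,
    }


# 2560 is the current ThreadsPerChild; 4 children is a hypothetical choice.
for directive, value in split_workers(2560, 4).items():
    print(directive, value)
```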
Related branches
- Barry Price: Approve
- Canonical IS Reviewers: Pending requested
Diff: 12 lines (+1/-1), 1 file modified
config.yaml (+1/-1)
- Paul Collins: Approve (lgtm)
- Canonical IS Reviewers: Pending requested
Diff: 112 lines (+24/-22), 3 files modified
lib/ubuntu_repository_cache/apache.py (+5/-3)
lib/ubuntu_repository_cache/tests/test_apache.py (+15/-15)
tests/unit/test_apache.py (+4/-4)
- Colin Misare: Approve
- Canonical IS Reviewers: Pending requested
Diff: 48 lines (+28/-0), 2 files modified
files/cron_random_sleep.sh (+23/-0)
reactive/ubuntu_repository_cache.py (+5/-0)
- Canonical IS Reviewers: Pending requested
- Ubuntu Repository Cache Charmers, Canonical: Pending requested
Diff: 65 lines (+6/-6), 3 files modified
lib/ubuntu_repository_cache/apache.py (+1/-1)
lib/ubuntu_repository_cache/tests/test_apache.py (+4/-4)
tests/unit/test_apache.py (+1/-1)
Changed in ubuntu-repository-cache:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Haw Loeung (hloeung)
Changed in ubuntu-repository-cache:
status: In Progress → Fix Committed
Changed in ubuntu-repository-cache:
status: Fix Committed → Fix Released
This may be related to LP:1917317.