Evergreen Denial of Service easily accomplished
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | Fix Released | Medium | Unassigned |
3.8 | Fix Released | Medium | Unassigned |
3.9 | Fix Released | Medium | Unassigned |
Bug Description
Evergreen 2.5
Other product versions irrelevant
This afternoon, all of our brick heads became saturated, and users got network errors when attempting to log in. Load on the database server was normal. An investigation by Equinox revealed the following:
"it looks like there is a search for 'glorias way' being conducted over and over at the [branch deleted to protect the guilty] ip address. 4932 times when last counted. We temporarily blocked the ip address and reloaded apache on the brick heads. We will wait a few minutes and unblock the ip address."
Their actions resolved the problem.
It is unclear whether the problem was caused by someone falling asleep with a finger on the Enter key (triggering autorepeat) or by an object resting on the key. What is clear is that this is not an unlikely occurrence.
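The pattern Equinox found, one address repeating the same search thousands of times, can be flagged with a sliding-window count of identical requests. The sketch below is illustrative only: the `RepeatedQueryDetector` class, its thresholds, and its query normalization are hypothetical and are not part of Evergreen or of the fix that was eventually released.

```python
from __future__ import annotations

from collections import defaultdict, deque


class RepeatedQueryDetector:
    """Hypothetical helper: flag a client that repeats the same search.

    Counts identical (client IP, normalized query) pairs within a sliding
    time window and reports when the count exceeds `max_repeats`.
    """

    def __init__(self, max_repeats: int, window_seconds: float) -> None:
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        # (ip, normalized query) -> timestamps of recent identical requests.
        # Note: idle keys are never evicted in this sketch.
        self._seen: dict[tuple[str, str], deque[float]] = defaultdict(deque)

    @staticmethod
    def _normalize(query: str) -> str:
        # Lower-case and collapse whitespace so "Glorias  Way" == "glorias way".
        return " ".join(query.lower().split())

    def is_repeat_abuse(self, ip: str, query: str, now: float) -> bool:
        """Record one request at time `now` (seconds); True if over the limit."""
        times = self._seen[(ip, self._normalize(query))]
        # Drop timestamps that have slid out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        times.append(now)
        return len(times) > self.max_repeats


detector = RepeatedQueryDetector(max_repeats=20, window_seconds=60.0)
flags = [detector.is_repeat_abuse("192.0.2.10", "glorias way", now=float(t)) for t in range(25)]
print(flags.count(True))  # 5: the 21st through 25th repeats are flagged
```

Keying on the query as well as the address means a busy branch running many different searches is not penalized; only the stuck-key pattern of identical requests trips the limit. A real deployment would also need to evict idle keys.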
Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Changed in evergreen:
milestone: 3.11-beta → 3.10.1
no longer affects: evergreen/3.10
information type: Private Security → Public Security
Changed in evergreen:
status: Fix Committed → Fix Released
Funnily enough, this happened to us on our production systems (master from around the 2.6.1 era) about two weeks ago, too. The symptoms were the same: one library ended up sending the same search page request more than 1,000 times, which used up all of the Apache workers on our bricks, after which all the other libraries got errors or dead pages.
We blocked that library's PC by IP address and then found out that it was still running Windows XP, so we kindly "suggested" that they upgrade to at least Windows 7 before we allowed them back onto Evergreen. We do not believe that was the real problem, but it was a good excuse at the time to get them to upgrade.
I've seen this sort of effect with other approaches too, such as the time we "load tested" production by pointing a small script at it that requested the library home page 2,000 times, which overloaded all of the workers.
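Little's law gives a rough way to see why a single runaway client can starve every other library: on average, the number of Apache workers in use equals the request arrival rate times the time each request occupies a worker. The symbols here are introduced for illustration: λ is the request rate (requests per second), W is the mean number of seconds a worker spends on each request, and N is the configured worker limit (`MaxRequestWorkers` in Apache 2.4, `MaxClients` in 2.2).

```latex
L = \lambda W, \qquad \text{all workers are busy once } \lambda W \ge N
```

With purely hypothetical numbers, a client sending 30 searches per second that each take 5 seconds would keep about 150 workers busy, enough to exhaust a pool of 150 on its own.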
I wonder whether this is also something we could mitigate with better Apache configuration practices, such as a reasonable rate limit on requests from the same IP address, so that we don't burn all of our Apache resources on any one person or bot.
That said, if there's an Evergreen-related issue, we should find that too.
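As a rough illustration of the per-IP rate limiting suggested above, here is a token-bucket sketch in Python. It is not an Apache module or Evergreen code: `PerIPRateLimiter` and its parameters are hypothetical, and in practice this kind of limiting would more likely live in the web server or a front-end proxy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class _Bucket:
    tokens: float  # requests this IP may still make right now
    last: float    # time (seconds) of the last refill


class PerIPRateLimiter:
    """Hypothetical token-bucket limiter keyed by client IP address."""

    def __init__(self, capacity: int, refill_per_second: float) -> None:
        self.capacity = float(capacity)  # maximum burst size
        self.refill_per_second = refill_per_second
        self._buckets: dict[str, _Bucket] = {}

    def allow(self, ip: str, now: float) -> bool:
        """Return True if a request from `ip` at time `now` may proceed."""
        bucket = self._buckets.get(ip)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = self._buckets[ip] = _Bucket(tokens=self.capacity, last=now)
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - bucket.last)
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_second)
        bucket.last = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False


limiter = PerIPRateLimiter(capacity=5, refill_per_second=1.0)
print([limiter.allow("192.0.2.10", now=0.0) for _ in range(6)])
# [True, True, True, True, True, False]
```

Each address gets a burst of `capacity` requests and then one request every `1 / refill_per_second` seconds. One caveat for library consortia: a whole branch may share a single public IP address (the Equinox quote above refers to a branch IP address), so a per-IP limit has to be generous enough not to lock out a busy branch.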