Evergreen Denial of Service easily accomplished

Bug #1361782 reported by Dan Pearl
This bug affects 2 people
Affects      Status         Importance   Assigned to   Milestone
Evergreen    Fix Released   Medium       Unassigned
3.8          Fix Released   Medium       Unassigned
3.9          Fix Released   Medium       Unassigned

Bug Description

Evergreen 2.5
Other product versions irrelevant

This afternoon, all our brick heads got saturated and users got Network Errors when attempting to log in. Load was normal on the database server. Investigation by Equinox revealed that:

"it looks like there is a search for 'glorias way' being conducted over and over at the [branch deleted to protect the guilty] ip address. 4932 times when last counted. We temporarily blocked the ip address and reloaded apache on the brick heads. We will wait a few minutes and unblock the ip address. "

Their actions resolved the problem.

It is unclear whether the problem was caused by someone falling asleep with their finger on "Enter" (autorepeat) or by an object resting on the key. What is clear is that this is a not-unlikely occurrence.

Revision history for this message
Ben Shum (bshum) wrote :

Funnily enough, this happened to us about two weeks ago on our production systems (master as of the 2.6.1-ish era) too. Same symptoms: a library ended up issuing the same search page request 1000+ times and ate up all our apache workers on our bricks, after which all the rest of the libraries were getting errors or dead pages.

We blocked that library's PC by IP, and then we found out they were still using Windows XP and kindly "suggested" that they upgrade at least to Windows 7 before we allowed them back onto Evergreen. We do not believe that was the real problem, but it was a good excuse at the time to get them to upgrade.

I've seen this sort of effect occur with other approaches too, like the time we "load tested" production by pointing a small script at it to request the library home page 2000 times (which overloaded all the workers).

I've wondered whether this is also something we could mitigate with apache configuration best practices, like adding a reasonable rate limiter on requests from the same IP address so that we don't burn all our apache resources on any one person or bot.
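
As a rough illustration of that kind of apache-side throttle (this is a hypothetical mod_evasive stanza, not anything shipped with Evergreen, and the thresholds are placeholders rather than tuned values):

    # Hypothetical mod_evasive settings -- thresholds are placeholders, not tuned values.
    <IfModule mod_evasive20.c>
        DOSPageCount       10    # same URI more than 10 times ...
        DOSPageInterval     1    # ... within a 1 second window
        DOSSiteCount      100    # any URI more than 100 times ...
        DOSSiteInterval     1    # ... within a 1 second window
        DOSBlockingPeriod  60    # then block that IP for 60 seconds
    </IfModule>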

That said, if there's an Evergreen-related issue, we should find that too...

Changed in evergreen:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Dan Scott (denials) wrote :

I'm sure there's a previous security bug that Mike Rylander worked on which caches search queries to prevent exactly this sort of attack. Possibly there's some variation here that makes it ineffective in this situation, but we should at the very least link to the old bug...

Revision history for this message
Dan Scott (denials) wrote :

Bug 1200770 was a follow-up to it, but it wasn't the one I was thinking of...

Revision history for this message
Dan Scott (denials) wrote :

Bug 1172936 is the one I was thinking of; it sounds very, very similar to this.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

This is happening to us, and I don't think caching queries will be enough.

We have logs showing the following search spammed more than 22 times per second from a single host, leading up to a period with load above 100 and over 140 apache drones running:

GET /eg/opac/results?query=the+cleaner&qtype=title&fi%3Asearch_format=&locg=1&sort= HTTP/1.1

Revision history for this message
Mike Rylander (mrylander) wrote :

Comments from Jason and others are correct: the "queue compression" code only protects the backend from an avalanche of identical searches; it does not stop apache from being overwhelmed.

There are ways to configure apache for rate limiting with modules. However, most of them work based on total connections per IP or similar metrics. The cost per search is much higher than the cost of other, normal requests, meaning that a relatively tiny number of search requests from a particular IP can cause a problem. IOW, the traffic that causes the problem is very difficult to detect because there is so little of it compared to other, non-problematic traffic. A cluster of apache servers only makes this problem worse, of course, because those multiple apache instances don't cooperate to compare total traffic.

What's more, in order to identify the "bad" traffic, the full URL needs to be inspected and interpreted. We do that in the "queue compression" code, but at that layer we don't have access to the IP address of the client. Perhaps we can teach the existing code to inform the mod_perl layer of the fact that there are existing requests for the search in question (IOW, that there is, in fact, a compressed search queue), as well as augment the queue compression code to count the number of concurrent searches in the queue. Then the mod_perl layer can correlate the IP to the queue size, and, if a threshold is passed, set a flag in memcache so that future requests from that IP address for any searches in general can be dropped for some amount of time. The effect would be similar to a human blocking the IP as described above.
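
As a very rough sketch of that idea (illustrative only; the key names, thresholds, and Cache::Memcached usage here are assumptions, not Evergreen's actual code), the mod_perl-layer check could look something like:

    # Illustrative sketch only -- not Evergreen code. Assumes the mod_perl
    # handler can see the client IP and that the search layer reports how
    # many identical searches are already queued for this query.
    use strict;
    use warnings;
    use Cache::Memcached;

    use constant HTTP_TOO_MANY_REQUESTS => 429;
    use constant HTTP_OK                => 200;

    my $cache = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

    sub check_search_request {
        my ($client_ip, $queue_size) = @_;
        my $queue_threshold = 20;    # hypothetical per-query concurrency limit
        my $block_seconds   = 60;    # hypothetical cool-down for a flagged IP

        my $flag_key = "search_block:$client_ip";

        # If this IP was flagged recently, cut the request short.
        return HTTP_TOO_MANY_REQUESTS if $cache->get($flag_key);

        # If the compressed queue for this query is already too deep, flag
        # the IP so its future searches are dropped for a while.
        if ($queue_size > $queue_threshold) {
            $cache->set($flag_key, 1, $block_seconds);
            return HTTP_TOO_MANY_REQUESTS;
        }

        return HTTP_OK;
    }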

The biggest drawback to this is that we risk blocking an entire branch without human intervention. So, maybe any requests that receive a "search is queued" message with a threshold-passing count from the search API are just cut short with a "too many requests" response to the client. That, at least, would only kill the identical searches. However, some resources would still be used to service the requests. IOW, the bar to DoS would be raised, but not removed.

It's a tricky problem...

Revision history for this message
Mike Rylander (mrylander) wrote :

... and, 7 years later, I have a branch that should move us in the right direction to mitigate these sorts of problems. The branch is at security/user/miker/lp-1361782-restrict-concurrent-searches, and from the commit message:

This commit adds two types of simple DoS protection:

* Limit concurrent search requests per client IP address, regardless of the searches being performed. This helps address issues of accidental spamming from a malfunctioning OPAC workstation, or crawlers of various types. The limit is controlled by a global flag called "opac.max_concurrent_search.ip".

* Limit the global concurrent search requests for the same query. This helps address both simple and distributed DoS that send the same search request over and over. The limit is controlled by a global flag called "opac.max_concurrent_search.query", and defaults to 20.

When the limit is exceeded in either case, the client receives an HTTP 429 "Too many requests" response from the web server, and the connection is ended.
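
Assuming these land as rows in the usual config.global_flag table (an assumption based on how other Evergreen global flags are stored; the flag names come from the commit message above, the values shown are arbitrary), enabling them would look something like:

    -- Hypothetical example; adjust the values to taste.
    UPDATE config.global_flag
       SET enabled = TRUE, value = '10'
     WHERE name = 'opac.max_concurrent_search.ip';

    UPDATE config.global_flag
       SET enabled = TRUE, value = '20'
     WHERE name = 'opac.max_concurrent_search.query';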

tags: added: pullrequest
Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I have tested this branch on an EOLI test server and one of my own. It works for me with Apache bench spamming requests. It can also be tested by going to an OPAC search page and holding down the "Enter" key after entering search terms. You'll get a 429 response once the global flags are enabled.
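
For reference, an Apache bench run along those lines might look like this (the hostname and search terms are placeholders); once the global flags are enabled you should start seeing 429 responses:

    # Spam the same catalog search from a single client.
    ab -n 2000 -c 50 'https://eg-test.example.org/eg/opac/results?query=the+cleaner&qtype=title'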

I'd say this needs a release note, and the settings should be documented. Other than that, the functionality works, so I've pushed a signoff of Mike's branch to the security repository:

user/dyrcona/lp-1361782-restrict-concurrent-searches-signoff

tags: added: needsreleasenote signedoff
Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
milestone: none → 3.11-beta
Revision history for this message
Mike Rylander (mrylander) wrote :

I've rebased the branch against master and added release notes to document the global flags beyond the commit message. New branch up at security/user/miker/lp-1361782-restrict-concurrent-searches-rebase

tags: removed: needsreleasenote
Galen Charlton (gmc)
Changed in evergreen:
milestone: 3.11-beta → 3.10.1
no longer affects: evergreen/3.10
Revision history for this message
Galen Charlton (gmc) wrote :

Committed in the branches that will be used to build the March 2023 releases. Thanks, Mike and Jason!

Changed in evergreen:
status: Confirmed → Fix Committed
Galen Charlton (gmc)
information type: Private Security → Public Security
Changed in evergreen:
status: Fix Committed → Fix Released