allow requests to be queued if max_children limit is hit

Bug #1729610 reported by Galen Charlton
This bug affects 2 people
Affects: OpenSRF
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

If the listener reaches the max_children limit on the number of drones actively servicing requests, it blocks until a drone becomes available. If that condition lasts long enough and enough additional requests get sent to that listener, ejabberd will ultimately notice and sever the connection, leading to the service failing even if drones subsequently become available.

To address this, we propose to teach the listener how to maintain a queue of incoming requests so that the incoming socket can stay clear; this would improve resiliency in the case of a transitory surge in requests that take a long time to be serviced (as can be the case with Evergreen search).
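
As a rough illustration of the idea (a toy model only, not the code in any OpenSRF branch), the listener can always take a request off the incoming socket, hand it to a drone if one is free, and otherwise park it in an in-memory backlog that is drained as drones finish. Every name below (`Listener`, `receive`, `drone_finished`, `backlog`) is hypothetical; only `max_children` comes from the description above.

```python
from collections import deque


class Listener:
    """Toy model of a listener that queues requests instead of blocking.

    All names are hypothetical; this does not mirror OpenSRF internals.
    """

    def __init__(self, max_children):
        self.max_children = max_children
        self.active = 0          # drones currently servicing a request
        self.backlog = deque()   # requests accepted but not yet dispatched
        self.dispatched = []     # record of dispatch order, for illustration

    def receive(self, request):
        # The request is always taken off the incoming socket right away,
        # so the socket stays clear even when every drone is busy.
        if self.active < self.max_children:
            self._dispatch(request)
        else:
            self.backlog.append(request)

    def drone_finished(self):
        # A drone became available: free its slot, then service the
        # oldest queued request, if any.
        self.active -= 1
        if self.backlog:
            self._dispatch(self.backlog.popleft())

    def _dispatch(self, request):
        self.active += 1
        self.dispatched.append(request)


listener = Listener(max_children=2)
for name in ["a", "b", "c", "d"]:
    listener.receive(name)
listener.drone_finished()   # "c" leaves the backlog and is dispatched
```

With two drones and four requests, the first two are dispatched immediately and the other two wait in the backlog until a drone frees up, rather than sitting unread on the socket.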

OpenSRF master

Galen Charlton (gmc) wrote :

I've marked this bug as a wishlist, but since it addresses a failure mode, patches should be considered for backporting once they're validated.

Galen Charlton (gmc) wrote :

A WIP branch is available as collab/gmcharlt/lp1729610_request_queuing:

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/collab/gmcharlt/lp1729610_request_queuing

The patches so far do the following:

* add additional logging
* implement request queueing
* add a new example service, OpenSRF::Application::Slooooooow, that can be used to exercise the problem

Additional work being considered:

* once the queue is full, have the listener respond back to clients indicating that the request cannot be processed. We're in the process of defining a new status akin to HTTP 503 so that clients can potentially moderate the number of requests they send in an intelligent fashion.
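
To make the queue-full behavior concrete, here is a minimal sketch under stated assumptions: the backlog has a fixed capacity, and a request arriving when it is full gets a rejection status instead of being queued. The function name `enqueue`, the `max_backlog` parameter, and both status constants are hypothetical placeholders; the actual status had not yet been defined at the time of this comment.

```python
from collections import deque

# Placeholder values for illustration only; not real OpenSRF status codes.
STATUS_QUEUED = "queued"
STATUS_UNAVAILABLE = 503   # stands in for the proposed "akin to HTTP 503" status


def enqueue(backlog, request, max_backlog):
    """Queue a request, or report that it cannot be processed right now."""
    if len(backlog) >= max_backlog:
        # Queue is full: tell the client so it can back off and retry later.
        return STATUS_UNAVAILABLE
    backlog.append(request)
    return STATUS_QUEUED


backlog = deque()
results = [enqueue(backlog, name, max_backlog=2) for name in ["a", "b", "c"]]
```

Here the third request is turned away because the backlog already holds two entries; a client that sees the rejection status could wait before resending rather than adding to the surge.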

Galen Charlton (gmc)
Changed in opensrf:
assignee: nobody → Galen Charlton (gmc)
milestone: none → 3.1-beta
Galen Charlton (gmc) wrote :

A patch series is ready for review in the user/gmcharlt/lp1729610_request_queuing_mark2 branch:

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1729610_request_queuing_mark2

tags: added: pullrequest
Changed in opensrf:
assignee: Galen Charlton (gmc) → nobody
Bill Erickson (berick)
Changed in opensrf:
assignee: nobody → Bill Erickson (berick)
status: New → Confirmed
Bill Erickson (berick) wrote :

Eyeballed. Looks good so far. Galen, I believe you need to git add osrf_cslow.c. Will test after that's available.

Changed in opensrf:
assignee: Bill Erickson (berick) → nobody
Galen Charlton (gmc) wrote :

Whoops. I've corrected that and force-pushed to the branch. Thanks, Bill!

Bill Erickson (berick)
Changed in opensrf:
assignee: nobody → Bill Erickson (berick)
Bill Erickson (berick) wrote :

Tested Perl and C slow services. Also tested some existing APIs to ensure no regressions were introduced. Looks good. Thanks, All! I have pushed a sign-off branch:

http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lp1729610-request-queueing-signoff

Changed in opensrf:
assignee: Bill Erickson (berick) → nobody
tags: added: signedoff
Mike Rylander (mrylander) wrote :

Merged to master. Thanks, all, for the teamwork -- it made the request queuing dream work!

Changed in opensrf:
status: Confirmed → Fix Committed
Galen Charlton (gmc)
Changed in opensrf:
status: Fix Committed → Fix Released