Gateway timeout on retrieving many shelving locations

Bug #1754164 reported by Jeff Davis on 2018-03-07
This bug affects 2 people
Affects      Importance   Assigned to
Evergreen    Medium       Unassigned
  2.12       Undecided    Unassigned
  3.0        Undecided    Unassigned
  3.1        Undecided    Unassigned

Bug Description

EG 2.12.1 and 3.0.3
OpenSRF 2.5 and 3.0

When requesting shelving locations (open-ils.circ.copy_location.retrieve.all) via the HTTP gateway, the request can time out prematurely if you have a large number of shelving locations.

To reproduce, add a large number of shelving locations to asset.copy_location (over 5,000 in my test environment). Then, request all shelving locations via the gateway:

https://example.com/osrf-gateway-v1?service=open-ils.circ&method=open-ils.circ.copy_location.retrieve.all

On my test server, this returns a null response:

{"payload":[],"status":200}
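A client can at least detect this case instead of silently treating it as "no shelving locations". The sketch below assumes only the reply shape shown above (a JSON object with `payload` and `status`). The function name `parse_gateway_reply` is hypothetical, and treating an empty payload as a timeout is a heuristic, not something the gateway signals explicitly.

```python
import json


def parse_gateway_reply(body: str) -> list:
    """Return the payload list from an osrf-gateway-v1 JSON reply (hypothetical helper).

    Raises TimeoutError when the reply has status 200 but an empty payload,
    which is how the timed-out request in this report presented itself.
    """
    reply = json.loads(body)
    status = reply.get("status")
    if status != 200:
        raise RuntimeError(f"gateway returned status {status}")
    payload = reply.get("payload", [])
    if not payload:
        # Heuristic only: a method that legitimately sends no responses
        # would produce the same empty payload.
        raise TimeoutError("empty payload; the gateway request may have timed out")
    return payload
```

Because an API that returns nothing also yields `"payload":[]`, this check is most useful for calls like `open-ils.circ.copy_location.retrieve.all` that should always return something.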

The open-ils.circ.copy_location.retrieve.all API call itself is not failing: OpenSRF logs show no errors, and the request succeeds via srfsh. However, the gateway logs show a timeout error:

Returning NULL from app_request_recv after timeout: open-ils.circ.copy_location.retrieve.all [null]

The default timeout value is 60s, but we get our null response in only a few seconds. The gateway request succeeds if we supply a larger timeout value as a URL param, e.g.:

https://example.com/osrf-gateway-v1?service=open-ils.circ&method=open-ils.circ.copy_location.retrieve.all&timeout=300
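Until the root cause is found, the workaround is simply to append `timeout` to the query string. A minimal sketch of building such a URL, assuming the `example.com` base URL from this report; `gateway_url` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the example in this report; adjust for a real server.
GATEWAY = "https://example.com/osrf-gateway-v1"


def gateway_url(service: str, method: str, timeout: Optional[int] = None) -> str:
    """Build an osrf-gateway-v1 request URL, optionally with a timeout in seconds."""
    query = [("service", service), ("method", method)]
    if timeout is not None:
        # Same `timeout` URL param used in the workaround above
        query.append(("timeout", str(timeout)))
    return f"{GATEWAY}?{urlencode(query)}"


print(gateway_url("open-ils.circ", "open-ils.circ.copy_location.retrieve.all", timeout=300))
```

This only raises the gateway's wait limit; it does not explain why the default 60s limit is cut short after a few seconds.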

So far I haven't been able to replicate the issue with other API calls that return large chunked responses, but I don't know why there would be anything special about open-ils.circ.copy_location.retrieve.all.

Jeff Davis (jdavis-sitka) wrote:

See attachment for osrfsys and gateway logs for a failed request.

description: updated
Dan Wells (dbw2) on 2018-03-20
Changed in evergreen:
milestone: none → 3.1-rc
no longer affects: evergreen/3.1
Changed in evergreen:
importance: Undecided → Medium
Bill Erickson (berick) wrote:

Beware: changing an API from non-streaming to streaming changes how the client interacts with the call. Any clients (e.g. Open-ILS/web/js/dojo/openils/CopyLocation.js) that call the API will need to be taught to expect a stream of responses instead of a single array.
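To illustrate the client-side difference, here is a sketch that accepts the list of responses a call produced and returns a flat list of locations. It assumes (not confirmed in this report) that a non-streaming call yields one response holding the whole array, while a streaming call yields one response per location. `collect_locations` is a hypothetical name and is not part of CopyLocation.js or any Evergreen client.

```python
from typing import Any, List


def collect_locations(responses: List[Any]) -> List[Any]:
    """Flatten API responses into one list of copy locations (hypothetical helper)."""
    locations: List[Any] = []
    for resp in responses:
        if isinstance(resp, list):
            # Non-streaming shape: a single response containing the whole array
            locations.extend(resp)
        else:
            # Streaming shape: each response is one copy location
            locations.append(resp)
    return locations
```

Handling both shapes in one loop lets a client keep working during a transition, though a real client would more likely process each streamed response as it arrives rather than waiting for all of them.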

Also, while we're addressing the issue of fetching too-large data blobs, I suggest we add {substream => 1} to the underlying cstore editor call so it also fetches the big list of stuff via streaming call from cstore.
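The server-side idea can be sketched language-neutrally: rather than materializing the full list and sending it once, the handler forwards each row as it is read. In this sketch, `source` stands in for a substreamed cstore search and `respond` for a per-response send (analogous to OpenSRF's per-response call in Perl). Both names, and `handle_retrieve_all` itself, are hypothetical and do not reproduce Evergreen's actual method.

```python
from typing import Any, Callable, Iterable


def handle_retrieve_all(
    source: Iterable[Any],
    respond: Callable[[Any], None],
    streaming: bool = True,
) -> None:
    """Send copy locations either one per response or as one big array (sketch)."""
    if streaming:
        # Streaming: never hold the full result set; forward each row as it arrives
        for loc in source:
            respond(loc)
    else:
        # Non-streaming: collect everything, then send a single large response
        respond(list(source))
```

Streaming on both legs (cstore to the application, application to the client) avoids building one very large message, but as noted above, it also changes the shape that every caller must handle.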

Removing pullrequest until at least the first issue is resolved.

tags: removed: pullrequest
Changed in evergreen:
milestone: 3.1-rc → none
Jeff Davis (jdavis-sitka) wrote:

Thanks for reviewing, Bill. I should have known it wouldn't be so easy. :)

I don't know what is required to handle a streaming response from this API. Any assistance would be appreciated.

Elaine Hardy (ehardy) on 2019-03-07
tags: added: copylocations performance
Andrea Neiman (aneiman) on 2019-04-23
tags: added: itemlocations
removed: copylocations