Failure to display large report schedule/template folders

Bug #1756142 reported by Blake GH on 2018-03-15
This bug affects 6 people
Affects      Status   Importance   Assigned to   Milestone
Evergreen             Medium       Unassigned
3.0                   Medium       Unassigned
3.1                   Medium       Unassigned
3.2                   Undecided    Unassigned

Bug Description

When displaying report folders in the staff client, the following error sometimes appears in gateway.log:

Returning NULL from app_request_recv after timeout: open-ils.reporter.folder_data.retrieve

This query can help identify large output folders:

-- count schedules per output folder
select rof.id, rof.name, count(*)
from reporter.schedule rs
join reporter.output_folder rof on rof.id = rs.folder
group by 1, 2
having count(*) > 68;

and for template folders:

-- count templates per template folder
select rtf.id, rtf.name, count(*)
from reporter.template rt
join reporter.template_folder rtf on rtf.id = rt.folder
group by 1, 2
having count(*) > 68;

Opening these folders with limit output set to "ALL" sometimes throws an error.

Blake GH (bmagic) on 2018-03-15
tags: added: reports
Dan Wells (dbw2) wrote :

Here is a fix for master (top two commits):

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dbwells/stream_reporter_retreive_by_folder

The diff for 2.12 will be a little different due to the paging added in 3.0. I'll push that version shortly.

Jason Stephenson (jstephenson) wrote :

We've seen this since our upgrade to 2.12 last fall with one folder in particular. We split it up into four separate folders.

From my notes on our internal ticket, I'm seeing the same log messages as Blake:

osrf_json_gw 2017-10-23 16:32:53 [INFO:30632:osrf_app_session.c:394:1508754310306324] Returning NULL from app_request_recv after timeout: open-ils.reporter.folder_data.retrieve ["{AUTHTOKEN}","template",342,"50"]
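
For anyone checking their own logs for the same symptom, here is a minimal Python sketch that counts these timeout messages per API method. It assumes the log layout matches the line quoted above; the function name count_timeouts and the regular expression are illustrative only and are not part of Evergreen or OpenSRF.

```python
import re
from collections import Counter

# Assumption: each timeout entry contains this fixed phrase followed by the
# API method name, as in the gateway.log line quoted above.
TIMEOUT_RE = re.compile(
    r"Returning NULL from app_request_recv after timeout: (?P<method>\S+)"
)


def count_timeouts(lines):
    """Count gateway timeout messages per API method name.

    `lines` is any iterable of log lines, e.g. an open file handle.
    """
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("method")] += 1
    return counts
```

Passing an open handle on gateway.log to count_timeouts and printing the result of most_common() would show which calls time out most often. The parameter list that follows the method name in the log line is ignored here.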

Changed in evergreen:
status: New → Confirmed
Changed in evergreen:
milestone: none → 3.1-rc
Dan Wells (dbw2) on 2018-03-15
tags: added: pullrequest
Blake GH (bmagic) wrote :

After loading the patch, I can sometimes get an error in the XUL client when opening an output folder with only one result:

Error: TypeError: res is null
Source File: oils://remote/reports/oils_rpt_folder_window.js
Line: 513

Error: TypeError: sched is undefined
Source File: oils://remote/reports/oils_rpt_folder_window.js
Line: 583

These errors are copied from the JavaScript console.

Blake GH (bmagic) wrote :

But now it's working.... not sure what's up with that.

Blake GH (bmagic) wrote :

Yeah, it's back to broken again. It feels like it's responding to the JavaScript call before it has the complete set of data.

Jason Stephenson (jstephenson) wrote :

Dan's branch seems to definitively resolve the issue with template folders for me.

I have not been able to thoroughly test it with output folders yet, because my normal account is apparently assigned to a library that doesn't use reports. I will have to find another account to see if I can reproduce what Blake described in his previous comment.

Mike Rylander (mrylander) wrote :

I suspect that this may be an old-xul-client-has-old-opensrf-JS issue. Blake, does your staff client match the server's version of opensrf?

Jason Stephenson (jstephenson) wrote :

I see no output at all and get this in the console over and over again:

Timestamp: 03/19/2018 03:49:21 PM
Error: TypeError: res is null
Source File: oils://remote/reports/oils_rpt_folder_window.js
Line: 513

The above is with a client built specifically for this VM from a git check out.

Jason Stephenson (jstephenson) wrote :

NOTE: The error message also appears with line 560.

Dan Wells (dbw2) wrote :

I am wondering if we might have two different bugs in this interface. I've done much of my testing using srfsh, as the original problem seems to manifest there as well using the following command:

request open-ils.reporter open-ils.reporter.schedule.retrieve_by_folder "AUTH_TOKEN_HERE", 3, "", 1

Of course, the various IDs would need adjusting for your particular instance. Without the offered patch, this command reliably fails for me in srfsh, but with the patch, it reliably succeeds. Sometimes it takes quite a while, though (up to 20 seconds in my case for ~6000 results).

Are folks still getting failures when testing via srfsh with the patch? Here is the similar command for a list of templates:

request open-ils.reporter open-ils.reporter.folder_data.retrieve "AUTH_TOKEN_HERE", "template", 1, "", 0

I *think* this fetches all templates for a given folder, but I may have grabbed the wrong one, so please double-check if you try this.

Thanks!

Dan Wells (dbw2) on 2018-03-20
Changed in evergreen:
importance: Undecided → Medium
Blake GH (bmagic) wrote :

Mike,

The XUL client is installed from the pre-compiled version available at updates/manualupdate.html

Changed in evergreen:
milestone: 3.1-rc → 3.1.1
Bill Erickson (berick) wrote :

Another data point: I just updated from OpenSRF 2.4 to OpenSRF 2.5. One large API call (open-ils.circ.copy_location.retrieve.all) that worked fine in 2.4 is now failing in the same way as the reporter templates, with an apparent timeout after 6 seconds.

Bill Erickson (berick) wrote :

Maybe not the best solution, but a quick solution in my case was to add max_chunk_size => 0 to the API definition.

Dan Wells (dbw2) wrote :

Bill, that's interesting. I was beginning to wonder if there might be some regression lurking in here somewhere. Also, since there is no official way to link bugs that I know of, I am adding a reference to bug #1754164 here as well (copy location specific).

I also still wonder if we are dealing with two bugs: one the six-second timeout, the other a too-large response. The failures I have seen (when testing with srfsh) are more or less instant. It might just so happen that the same situation can stumble into both cases?

Mike Rylander (mrylander) wrote :

In addition to max_chunk_size (for supporting older OpenSRFs), I think it would be a good idea to also add:

  max_bundle_count => 1

in the API definition. That's the modern way to say "send me every response without bundling".

Jason Stephenson (jstephenson) wrote :

So, these max_chunk_size and max_bundle_count values would go in the API definition for the affected open-ils services? If that's the case, I'd be willing to try it in conjunction with Dan's patches to see if that resolves the issues for me.

Changed in evergreen:
milestone: 3.1.1 → 3.1.2
Changed in evergreen:
milestone: 3.1.2 → 3.1.3
Changed in evergreen:
milestone: 3.1.3 → 3.1.4
Changed in evergreen:
milestone: 3.1.4 → 3.1.5
Changed in evergreen:
milestone: 3.1.5 → 3.1.6
no longer affects: evergreen/2.12
Changed in evergreen:
milestone: 3.1.6 → 3.2.1
Changed in evergreen:
milestone: 3.2.1 → 3.2.2
Changed in evergreen:
milestone: 3.2.2 → 3.2.3
Changed in evergreen:
status: Confirmed → New
milestone: 3.2.3 → 3.3-beta1
Changed in evergreen:
milestone: 3.3-beta1 → 3.3-rc
Changed in evergreen:
milestone: 3.3-rc → 3.3.1