OPAC: My Account: Download Checkout History CSV breaks when there are a large number of items in the history

Bug #1208875 reported by Jim Keenan on 2013-08-06
80
This bug affects 16 people
Affects Status Importance Assigned to Milestone
Evergreen
Medium
Unassigned
2.12
Medium
Unassigned

Bug Description

Evergreen 2.3

After a patron logs in and tries to download their checkout history in CSV format, if there are many items in the history, the generated CSV file is blank.

In the initial case reported in C/W MARS, the patron had over 500 items in check out history over a 14 months period. This patron was a very active patron but sooner or later this issue will effect every avid library user.

Ben Shum (bshum) wrote :

Confirmed with our systems too.

Changed in evergreen:
status: New → Confirmed
importance: Undecided → Medium
tags: added: opac tpac
Jason Stephenson (jstephenson) wrote :

I'm working on a fix that bypasses the action trigger mechanism and generates the CSV directly in the OPAC. My experiments with the basic generation code show this is much faster than going through action trigger.

I just need to make it work in tpac, now.

Jason Stephenson (jstephenson) wrote :

I have had some time to work on this in fits and starts over the past couple of months. My current direction involves adding a view for circulation with title information and exposing that view in the IDL.

This code will also not be backward compatible with Evergreen < 2.6.0. It makes use of MVF database structures to retrieve the multiple icon format names for each copy. I believe this information, Book vs. Textual Material, DVD vs. Projected Medium, etc. is far more useful for the patron.

Changed in evergreen:
milestone: none → 2.next
Jason Stephenson (jstephenson) wrote :

I will be pushing a working branch either later today or early tomorrow. I have abandoned the notion of adding a new Fieldmapper entry because it cannot do what I want. Instead, I've gone with a series of JSON queries and cached results.

I will also add an additional commit that uses the same approach for downloading the My List CSV reports.

My approach also adds a Template Toolkit 2 plugin for a CSV field filter in Template::Plugin::CSVFilter. This is a generically useful filter that could be used by projects other than Evergreen. It could conceivably also be used to replace the csv_datum subroutine that gets exported to Template Toolkit 2 in OpenILS::Application::Trigger::Reactor.

I have tested the circ history export code on my development machine using a patron account that has 4,196 retained circulations. While it takes over a minute to generate the CSV file, the complete file is eventually returned to the user, unlike the current process where it times out and returns an empty CSV.

Finally, I have altered the columns in the output file in two ways.

First, I have added the checkout date, due date, and date returned columns from the web form to the CSV output.

Second, the format column has been changed from using the relatively cryptic MARC item type to returning the text description of the icon format. This change means that if you have customized icon formats, your patrons will see the descriptions for those. It also means that they should see more meaningful text, such as DVD or Blu-ray, instead of "Projected Medium."

This second change takes advantage of the Multi-Value Fields code added in 2.6. This means that the code cannot be backported prior to 2.6.0 without some modification. If multiple icon formats are retrieved for a copy, the values are separated by a + on output, i.e. a DVD and Blu-ray combo usually shows up as Blu-ray+DVD.

Jason Stephenson (jstephenson) wrote :

Never mind book lists for now. There are too many issues with that to deal with in a day.

I've pushed a branch to address the circ history CSV to working/user/dyrcona/lp1208875

tags: added: pullrequest
Jason Stephenson (jstephenson) wrote :

I have removed the pullrequest tag to rework the code in this branch based on the concerns expressed in https://bugs.launchpad.net/evergreen/+bug/1347774

However, it might be easier to tackle these changes after the rest of TPAC is fixed up.

tags: removed: pullrequest
Holly Brennan (hollyfromhomer) wrote :

Ugh. Any update on a fix for this?

We're facing yet another budget crunch and I want to advertise the "Library Value Calculator" as a way for patrons to enter the actual number of various items (books, movies, etc) that they checked out and see the dollar value of their library use. I thought the Checkout History spreadsheet would make it really easy for patrons to do this, but now I can't find a single person whose history is not breaking the report request.....

So not only is this just a bummer, our annual budget depends on this feature!

Jason Stephenson (jstephenson) wrote :

I won't be fixing it any time soon. My employers apparently have other priorities at the moment.

My initial approach needs to be rewritten in light of the desire to remove back end code from the OPAC and to use only public API calls as expressed in https://bugs.launchpad.net/evergreen/+bug/1347774.

I was going to wait on the work for that bug to be more or less complete, but I wouldn't hold my breath waiting for that, either.

Jason Stephenson (jstephenson) wrote :

Given code changes introduced in Evergreen 2.10 via bug 1527342, the fix for this bug will need a new approach. Dunno when I can get to it, but I will still take into account the goals of bug 1347774.

My previous approach "works" if you're on Evergreen 2.9 or earlier, but I don't recommend using it.

Jason Stephenson (jstephenson) wrote :

I have confirmed that this bug is still an issue in master/rel_2_10. I received an empty CSV download after about 30 seconds to a minute for a patron with 676 circulations in their history.

I am inclined to add a back end call, perhaps in open-ils.actor, to retrieve a patron's circ history. This could be used to write a CSV from the OPAC code itself and bypass the action trigger mechanism that is currently used. It looks to me that using action trigger is the main reason for the timeouts.

As for security, I'll implement the new routine to require authorization and the authtoken must be owned by the user whose circulation history is being retrieved.

Holly Brennan (hollyfromhomer) wrote :

I don't have anything technical to add, but wanted to break up this years-long conversation with yourself. :) Thanks, Jason!

I wonder if another approach to fixing this issue would be to make the csv generation take effect in the background. It might not be possible to say that every csv export will take less time than the page timeout. If someone has 20K entries in their history for instance.

Something like:

1. Click Generate CSV Export.
2. Add a single pending event to an event_def that exists just to generate csv history exports.
3. Return a message to user that the csv export will take one to 5 minutes, the page will automatically reload every 30 seconds.
4. action_trigger_runner for the proper granularity runs every minute.
5. There is a new reactor that processes the template and stores the results in a file. Although the output exists in the DB ayway, so maybe that isn't needed. Maybe just an apache handler that can serve up action_trigger.event_output results by a GUID.
6. Once the results are done, the next page reload, or simply loading up the history page again would show a link to download the export.
7. Expire the export after a certain amount of time.

Or maybe the reporter system would work better, since it already has the guts to create export files and to serve them up to those with the correct authorization? It would be kind of nice to be able to tell the user that they will get an email when their export is done.

Josh

Jason Stephenson (jstephenson) wrote :

The #1 problem is this already goes through action trigger. It is processed immediately, but still takes too long for the browser and/or OPAC layer so times out. Adding all those additional layers will slow it more and will require a JavaScript-heavy solution in the OPAC.

My solution was to make the cstore calls direct and generate the CSV immediately in the TPAC code. This works very well for the patrons who were having problems before. MVLC actually installed that patch. Then the conversation happened that tpac shouldn't be making cstore and other, private backend calls. So I thought I should make some suitable OpenSRF calls in Evergreen somewhere. This stalled out for lack of time on my part.

I'll see if I can dig up the branch that MVLC used and throw up in the working repo if it is not there already. If someone else wants to take that and make better calls out of it, feel free. If you just want to push that in, I'm OK with that, too.

Jason Stephenson (jstephenson) wrote :

Turns out, my branch, mentioned in comment #5 is still there. It probably needs a rebase, but have it and feel free to make a collab branch out of it.

Changed in evergreen:
assignee: nobody → Jason Etheridge (phasefx)
Jason Etheridge (phasefx) wrote :

I was poking at this, trying to pick into master, and ran into a change from bug 1527342 with the introduction of the action.usr_circ_history table. Do we know if this issue still stands with the latest code? I'll try to reproduce it...

Jason Etheridge (phasefx) wrote :

And if were to do more than skim the comments I would have seen the answer sooner :)

Jason Etheridge (phasefx) wrote :

Just for reference, currently in master on my desktop-hardware dev vm, trying to download 1,024 historical circs took about 3 minutes to generate an empty CSV file, while the A/T process itself took a bit longer to complete.

Jason Etheridge (phasefx) wrote :

I rebased Jason's branch against master and added a simple tweak to reconcile it with the newer code that fetches actor.usr_circ_history objects.

For testing, I used the concerto test data and mocked up some circ history data with the admin user in psql like so:

insert into action.usr_circ_history (usr,xact_start,target_copy,due_date,checkin_time,source_circ) select 1,xact_start,target_copy,due_date,checkin_time,id from action.circulation;

I didn't bother changing the circ user on those circs, but I did enable the user setting for recording circ history.

In master, using Download CSV against these takes about 2 minutes and produces an empty CSV file. With the bugfix branch, it took about 5 seconds for almost 500 entries. Thanks Jason!

Top 3 commits from collab/phasefx/lp1208875

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/phasefx/lp1208875

I'm willing to push it to master but would like just one more sanity check from someone. Thanks!

tags: added: pullrequest
Galen Charlton (gmc) wrote :

Swooping in to take a look.

tags: added: signedoff
Changed in evergreen:
assignee: Jason Etheridge (phasefx) → Galen Charlton (gmc)
Galen Charlton (gmc) wrote :

I've pushed to working a branch called user/gmcharlt/lp1208875_revised_implementation that does some surgery. From the commit message:

    This patch removes the proposed custom methods for extracting
    title, author, and record format in favor of tweaking
    ->fetch_user_circ_history to invoke unapi.bre and adjusting
    the template to use get_marc_attrs. Also, nowadays
    ->fetch_user_circ_history can flesh what we need it to without
    having to rely on the existance of an action.circulation row,
    which won't be present if the circ was aged but was otherwise
    retained in the user circ history.

    The result is slower than the previous approach, but still
    retains the core idea of getting A/T out of the equation, and remains
    much faster than the A/T approach.

    Dropping use of unapi.bre would speed things up a bit more, as it
    was added only to match the addition of the record format column
    in the CSV output. Drop the column, and we no longer need to worry
    about MVFs.

    There would also be opportunities to improve caching further. Bib
    display fields, when it comes, will likely help even more, as it
    would mean being able to drop a lot of the XML parsing currently used.

    This patch also adjusts misc_util.tt2 so that including it doesn't
    result in an unwanted blank line.

Removing signoff tag and asking Jason S for feedback; if we come to consensus on this approach, I can clean up the branch further.

tags: removed: signedoff
Changed in evergreen:
assignee: Galen Charlton (gmc) → nobody
tags: added: needsreleasenote
Galen Charlton (gmc) wrote :

Also added the needsreleasenote tag; since this is at heart a performance fix (even though it /also/ adds a few columns to the default output), I think this is a plausible candidate for backporting. However, sysadmins will need to be warned that the A/T event definition will no longer do anything.

Jason Stephenson (jstephenson) wrote :

Galen, I'll add this to my test branch for 2.12 and get back to you, hopefully next week.

Jason Stephenson (jstephenson) wrote :

When testing this on a vm with 2.12 and production data upgraded from 2.10, I get an internal server error with the following in the Apache error log:

[Fri Jun 09 12:33:47.506117 2017] [perl:error] [pid 19750] [client 192.168.100.100:43658] egweb: Context Loader error: Can't locate object method "content" via package "OpenSRF::DomainObject::oilsMethodException" at /usr/local/share/perl/5.22.1/OpenILS/Utils/CStoreEditor.pm line 458.\n, referer: https://10.250.150.80/eg/opac/myopac/circ_history

Unfortunately, my osrfsys.log is bereft of content. I believe the logging maybe turned way down on this VM.

I can't say if this peculiar to my situation or not. Something could be amiss on the VM, but I don't have the time to rebuild it to make sure right now.

Jason Etheridge (phasefx) wrote :

The new branch works well for me using my same test, no errors.

I've pushed my signoff here:
http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/phasefx/lp1208875_revised_implementation

Jason, do you want to give it another go and/or push to master? Thanks!

Jason Stephenson (jstephenson) wrote :

I'll test this again with the signoff branch on a fresh vm. There may have been some issues with the test environment because of other branches installed, etc.

I am using production data and rel_2_12 with our local customizations installed, fwiw.

I should be able to get to this by Friday. I know that's too late for 2.12.3, but that's the best I can do given what else I have going on. Jason's signoff is good enough if you want to push it. That won't stop me from testing again in my environment. I'll wait to assign myself until I'm really ready to test in case it goes in before hand.

Kathy Lussier (klussier) wrote :

I'm going to add a target to backport to 2.12 as well. Although it does add a few columns to the download, I primarily see this as a much-needed bug fix.

Jason Stephenson (jstephenson) wrote :

Well, I didn't get to test it by last Friday, but I did get to it yesterday. It works for me on a fresh VM with a branch based on 2.12.3 with local customizations and a copy of production data. I don't know what was wrong with the old vm.

I'm going to write a release note and push this to master and rel_2_12 today. This is not the ideal solution in my mind, but this has sat for too long and I think it should go in anyway.

Thanks Jason and Galen for keeping the torch lit!

Jason Stephenson (jstephenson) wrote :

Release note written and branch pushed to master and rel_2_12 for circ history downloading happiness!

Changed in evergreen:
status: Confirmed → Fix Committed
tags: removed: needsreleasenote
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers