Triggered event log times out for large-data sites

Bug #1207533 reported by Mike Rylander
160
This bug affects 32 people
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Medium
Unassigned
2.8
Won't Fix
Medium
Unassigned
2.9
Won't Fix
Medium
Unassigned

Bug Description

The reason is that we tried to improve usability by combining all of a user's events into one stream. However, there's just no good way, with current data, to make the queries needed to drive the UI fast.

Current proposal for fix:

  1) add 2 new fields to the event definition table: context_user_path and context_ou_path. These will store the path, from the target object, for retrieving the id of the relevant context user and context org -- or, against whom the event occurred, and where (which may be different that the definition owner).

  2) add 2 new fields to the event table to record the event-specific instances of the context ou and context user

  3) have the "atul" and related IDL-views use /those/ values instead of having to look up hold.usr or circ.usr, and hold.pickup_lib or circ.circ_lib, for the various "usr" and "perm_lib" values.

  4) profit

Here's the current query generated:

SELECT "atul".hook, "atul".name, "atul".reactor, "atul".id, "atul".event_def, "atul".add_time, "atul".run_time, "atul".start_time, "atul".update_time, "atul".complete_time, "atul".update_process, "atul".state, "atul".user_data, "atul".template_output, "atul".error_output, "atul".async_output, "atul".target_circ, "atul".target_hold, "atul".perm_lib FROM (
SELECT atevdef.hook,
atevdef.name,
atevdef.reactor,
atev.id,
atev.event_def,
atev.add_time,
atev.run_time,
atev.start_time,
atev.update_time,
atev.complete_time,
atev.update_process,
atev.state,
atev.user_data,
atev.template_output,
atev.error_output,
atev.async_output,
targ_circ.id AS target_circ,
targ_ahr.id AS target_hold,
COALESCE(
targ_circ.circ_lib,
targ_ahr.pickup_lib) AS perm_lib
FROM action_trigger.event atev
JOIN action_trigger.event_definition atevdef ON (atevdef.id = atev.event_def)
JOIN action_trigger.hook ath ON (ath.key = atevdef.hook)
LEFT JOIN action.circulation targ_circ ON (ath.core_type = 'circ' AND targ_circ.id = atev.target)
LEFT JOIN action.hold_request targ_ahr ON (ath.core_type = 'ahr' AND targ_ahr.id = atev.target)
WHERE atev.add_time > NOW() - (SELECT MAX(value) FROM (
SELECT value::INTERVAL FROM actor.org_unit_ancestor_setting('circ.staff.max_visible_event_age', COALESCE(targ_circ.circ_lib, targ_ahr.pickup_lib)) UNION
SELECT '1000 YEARS'::INTERVAL AS value) ous)) AS "atul"
LEFT JOIN action_trigger.hook AS "__core_type_ath" ON ( "__core_type_ath".key = "atul".hook )
LEFT JOIN action.circulation AS "__AUTO_target_circ_desk_renewal_circ" ON ( "__AUTO_target_circ_desk_renewal_circ".id = "atul".target_circ )
LEFT JOIN action.hold_request AS "__AUTO_target_hold_current_shelf_lib_ahr" ON ( "__AUTO_target_hold_current_shelf_lib_ahr".id = "atul".target_hold )
LEFT JOIN action.circulation AS "__target_circ_copy_author_circ" ON ( "__target_circ_copy_author_circ".id = "atul".target_circ )
LEFT JOIN asset.copy AS "__target_circ_copy_author_acp" ON ( "__target_circ_copy_author_acp".id = "__target_circ_copy_author_circ".target_copy )
LEFT JOIN asset.call_number AS "__target_circ_copy_author_acn" ON ( "__target_circ_copy_author_acn".id = "__target_circ_copy_author_acp".call_number )
LEFT JOIN biblio.record_entry AS "__target_circ_copy_author_bre" ON ( "__target_circ_copy_author_bre".id = "__target_circ_copy_author_acn".record )
LEFT JOIN reporter.materialized_simple_record AS "__target_circ_copy_author_rmsr" ON ( "__target_circ_copy_author_rmsr".id = "__target_circ_copy_author_bre".id )
LEFT JOIN action.hold_request AS "__target_hold_copy_title_ahr" ON ( "__target_hold_copy_title_ahr".id = "atul".target_hold )
LEFT JOIN asset.copy AS "__target_hold_copy_title_acp" ON ( "__target_hold_copy_title_acp".id = "__target_hold_copy_title_ahr".current_copy )
LEFT JOIN asset.call_number AS "__target_hold_copy_title_acn" ON ( "__target_hold_copy_title_acn".id = "__target_hold_copy_title_acp".call_number )
LEFT JOIN biblio.record_entry AS "__target_hold_copy_title_bre" ON ( "__target_hold_copy_title_bre".id = "__target_hold_copy_title_acn".record )
LEFT JOIN reporter.materialized_simple_record AS "__target_hold_copy_title_rmsr" ON ( "__target_hold_copy_title_rmsr".id = "__target_hold_copy_title_bre".id )
LEFT JOIN action.circulation AS "__target_circ_patron_barcode_circ" ON ( "__target_circ_patron_barcode_circ".id = "atul".target_circ )
LEFT JOIN actor.usr AS "__target_circ_patron_barcode_au" ON ( "__target_circ_patron_barcode_au".id = "__target_circ_patron_barcode_circ".usr )
LEFT JOIN actor.card AS "__target_circ_patron_barcode_ac" ON ( "__target_circ_patron_barcode_ac".id = "__target_circ_patron_barcode_au".card )
LEFT JOIN action.circulation AS "__target_circ_copy_barcode_circ" ON ( "__target_circ_copy_barcode_circ".id = "atul".target_circ )
LEFT JOIN asset.copy AS "__target_circ_copy_barcode_acp" ON ( "__target_circ_copy_barcode_acp".id = "__target_circ_copy_barcode_circ".target_copy )
LEFT JOIN action.hold_request AS "__target_hold_patron_barcode_ahr" ON ( "__target_hold_patron_barcode_ahr".id = "atul".target_hold )
LEFT JOIN actor.usr AS "__target_hold_patron_barcode_au" ON ( "__target_hold_patron_barcode_au".id = "__target_hold_patron_barcode_ahr".usr )
LEFT JOIN actor.card AS "__target_hold_patron_barcode_ac" ON ( "__target_hold_patron_barcode_ac".id = "__target_hold_patron_barcode_au".card )
WHERE ( ( "__AUTO_target_circ_desk_renewal_circ".usr = '466086' ) OR ( "__AUTO_target_hold_current_shelf_lib_ahr".usr = '466086' ) ) AND ( "__core_type_ath".core_type IN ('circ', 'ahr') ) AND ( ( "atul".state = 'complete' ) AND ( ( "__core_type_ath".core_type = 'circ' ) ) ) AND "atul".perm_lib IN (120) ORDER BY "atul".run_time ASC LIMIT 25 OFFSET 0;

Revision history for this message
Mike Rylander (mrylander) wrote :

Another possible angle of attack is to split the hold and circ UI components into separate tabs, as in the previous interface. This /might/ be enough to address the speed, but I think materializing more state information up front will pay larger rewards in the long run.

Revision history for this message
Rogan Hamby (rogan-hamby) wrote :

I don't have any objection to splitting the hold and circ components into separate tabs. It's less convenient but only mildly and if it alleviates the problem it's well worth it. In some scenarios with a lot of data it might help staff with sorting it anyway.

But, it sounds like materializing the information up front is needed whether we do it or not.

Revision history for this message
Ben Shum (bshum) wrote :

Marking confirmed due to high "affects me" counter

Changed in evergreen:
status: New → Confirmed
importance: Undecided → Medium
tags: added: performance
Revision history for this message
Srey Seng (sreyseng) wrote :

Just wondering, (even though the "Assigned to" is set as   Unassigned) whether anyone is currently looking into this?

For those with large data sets being affected, is there a workaround to successfully navigate the Triggered event log interface? Or is that interface being avoided in favor of some other way of viewing these logs?

Revision history for this message
Chris Sharp (chrissharp123) wrote :

Just hitting this bug in PINES. MIke, I believe that your inital proposed approach is the correct one. At this point, even retrieving information directly from the database requires a similarly (sorry, but) convoluted query as the generated one above. Having a hard link between usr and context OU would go far when having to track these.

Not sure if the lack of recent updates here indicates a perceived lack of need or not, but please know that PINES would need this fixed in order to move off of legacy overdue notice generation and onto A/T (which still works well for us).

Thanks for listening!

Chris

Revision history for this message
Chris Sharp (chrissharp123) wrote :

Updating the series targets - this still affects us in 2.9, FYI.

no longer affects: evergreen/2.5
no longer affects: evergreen/2.4
Revision history for this message
Scott Thomas (scott-thomas-9) wrote :

This is a big problem for us in Pennsylvania Integrated Library System. We would be in favor of splitting Circ and Holds into separate tabs or any other solutions that addresses the problem.

Revision history for this message
Chris Sharp (chrissharp123) wrote :

Still an issue in 2.11, FYI.

Revision history for this message
Michele Morgan (mmorgan) wrote :

Still an issue in 2.12.

Andrea Neiman (aneiman)
Changed in evergreen:
milestone: none → 3.next
Andrea Neiman (aneiman)
Changed in evergreen:
milestone: 3.next → none
Revision history for this message
Scott Thomas (scott-thomas-9) wrote :

This is still a big problem for us. We were hoping it would be fixed in Webby.

Revision history for this message
Jason Boyer (jboyer) wrote :

By all outward appearances this is the bug that won't die but I may have accidentally wounded it. Now that we have a fix for bug 1672775 that removes outdated events and their output, the branch in bug 1778940 speeds up access to event outputs to the point that this interface is sometimes functional again in Evergreen Indiana (7940263 events and 2428921 outputs) though it still takes almost a full minute to return results. There's still no way to page to additional results, though I expect the angular-ization of this interface to take care of that in time.

Revision history for this message
Andrea Neiman (aneiman) wrote :

PaILS funded a specification to address this bug, and said spec can be seen here:

http://yeti.esilibrary.com/dev/public/techspecs/tel_rewrite.pdf

If anyone is interested in a quote for developing this, please contact me at <email address hidden> - note that the spec is dated Dec. 2018 and will need at least minor revisions to account for code changes since then, potentially including the pending Angular port in bug 1859707. However, the main aspects of our plan including query refinement still stand.

Changed in evergreen:
assignee: nobody → Jason Etheridge (phasefx)
Revision history for this message
Andrea Neiman (aneiman) wrote :

PaILS has signed a contract with Equinox to address this bug per the specification noted in comment #12.

Revision history for this message
Jason Etheridge (phasefx) wrote :

I've pushed a branch to collab/phasefx/lp1207533-triggered-events-log

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=commitdiff;h=bdf3f1e1d893c5af76032b3d748d7f9f3087f318

One way to test this is in the staff client:

Administration -> Local Administration -> Notifications / Action Triggers
Filter -> Hook Is hold_request.cancel.staff -> Apply
Double-click the "Hold Cancelled (Staff) User Message" row, but not the hyperlink.
Check Enabled and change Processing Delay to 00:00:00, and then Save.

Retrieve a patron account, go to Holds, Place Hold. Find a title and place a hold on it.

Re-retrieve the patron account and cancel the hold.

On the server as user opensrf (assuming a typical stock setup), manually run:
/openils/bin/action_trigger_runner.pl --osrf-config /openils/conf/opensrf_core.xml --run-pending

In the patron account, go to Other -> Triggered Events / Notifications

Click on the Holds tab in the spawned interface. You should see the title you placed a hold on.

tags: added: pullrequest
Changed in evergreen:
assignee: Jason Etheridge (phasefx) → nobody
Andrea Neiman (aneiman)
Changed in evergreen:
milestone: none → 3.7-beta
Revision history for this message
Andrea Neiman (aneiman) wrote :
Revision history for this message
Ruth Frasur Davis (redavis) wrote :

Just a question - is the data set that we're using to test this set up in such a way as to replicate the issues with large data sites?

Revision history for this message
Terran McCanna (tmccanna) wrote :

Ruth - it would definitely be better to test it on a server that has a copy of a real data set.

Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I failed to test this branch. I first tried a back port to 3.5 on a test vm that was mostly setup already, but the feature did nothing. I never saw the user's triggered events in the event log, even though I had two in the database.

I chalked that up to a problem with the back port and attempted to upgrade the VM and database to master so I could try again.

After deleting everything under /openils, updating prerequisites, installing OpenSRF and Evergreen fresh, updating the configuration files, clearing browser storage, etc., I was not able to login via the staff client. This could be a problem with anything and possibly not related to this branch. (Actually, I think I botched changing db settings in opensrf.xml.)

I moved on to another virtual machine running stock master. I installed a rebased version of this branch and pointed it at the database used in the previous test so that I would have production data to look at.

Once this was set up, I could login, retrieve the patron that had tow hold cancelled by staff messages added following the steps in comment #14 with two holds. When I go to look at the patron's triggered events, I see "Nothing to display" for both Circ and Holds. It could be that I'm not logged in at the same library that is listed as the "Creating Library" of the messages. It is the library in the "owner" field of the action_trigger.event_definition.

Given this behavior, I cannot say this code works. I also cannot say that it doesn't work.

The branch also has a conflict with Open-ILS/src/sql/Pg/950.data.seed-values.sql when rebased on master. It seems rather simple to fix, but this could warrant a needsrepatch tag.

I have run out of time to look at this for now, so I'm removing myself from the bug. Maybe someone else who has fewer hoops to jump through will have better luck testing this branch.

Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
Revision history for this message
Mike Rylander (mrylander) wrote :

I've added a commit to build upon Jason's patron version of Triggered Events Log with an item-focused version, and signed off his commit. The branch is here:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp1207533-triggered-events-log

From the commit:

This commit adds an item-oriented version of the Triggered Event Log
interface. The primary interface differences between the two are:
  * Instead of showing the item barcodes for events related to a
  patron, it shows the patron barcodes for events related to an item
  * It is labeled as "Item Specific" instead of "Patron Specific"

The main interface addition can be tested in the same way as the parent
commit. To get to the new item TEL interface, use the Actions menu from
the patron Items Out interface, the Item Status interface, or the
Holdings list in the record cataloging interface.

Revision history for this message
Andrea Neiman (aneiman) wrote :

Noting for future reference that when this is merged, both PaILS (Patron TEL) and ECDI (Item TEL) should get credit as development sponsors.

Changed in evergreen:
milestone: 3.7-beta → 3.8-beta
Revision history for this message
Mike Rylander (mrylander) wrote :

I've pushed a rebased, updated branch containing both TELs to: working/user/miker/lp-1207533-triggered-events-log-rebased / https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1207533-triggered-events-log-rebased

Revision history for this message
Andrea Neiman (aneiman) wrote :
Revision history for this message
Terran McCanna (tmccanna) wrote :

Chris applied this and did the recursive work onto a PINES test server so that we could look at it with a very large existing data set. The new interface loads quickly and looks good overall, but I have some questions and problems:

1. Am I correct in assuming that the Retention Interval in the Notifications/Action Triggers definition is what controls how long a log entry will be available in the Triggered Events Log? (This seems logical, I just wanted to verify - that'd probably be good to add to the new docs as well.)

2. Several column headers marked "time" only show the date, but the time would actually be useful to display as well (Event Complete Time, Event Add Time, Event Run Time, Event Start Time)

3. On the Patron triggered events page the Context Item column is empty - could this be an error with the script that does the recursive change? (The column is not empty on the Equinox test server.)

4. On the Item triggered events page I'm getting no data - possibly related to #3?

5. It would be nice to have the title linked to the title record and the item linked to the item record.

6. Pagination isn't working for us (but it is working on the Equinox test server).

7. It would be nice for the docs to also include a description of the Cancel / Reset actions that are available and how to expect them to work.

Revision history for this message
Chris Sharp (chrissharp123) wrote :

Noting here that our testing was done with the code backported to 3.6.1-ish.

Revision history for this message
Mike Rylander (mrylander) wrote :

Hi Terran and Chris,

Thanks for testing! Some answers for the non-doc questions:

1) Yes, that should be the /minimum/ time they'll be visible, depending on how often you're purging events.

2) That may be a general grid thing, and might even be different in 3.7.

3 and 4) There is an update to run in a comment at the bottom of the file Open-ILS/src/sql/Pg/upgrade/ZZZZ.schema.item_triggered_event_log.sql that will fill in the data. It is not run by default because, well, big data. :)

5) Fair, but I think that needs to be a wishlist LP (or if anyone wants to add it as a follow-on, I wouldn't object).

6) I suspect fairly strongly that's a grid issue in older grid code.

Thanks again!

Revision history for this message
Andrea Neiman (aneiman) wrote :

Hi Terran - I've updated the docs draft with your suggestions. Thanks!

Revision history for this message
Mike Rylander (mrylander) wrote :

Hi all. I've rebased the branch here in hopes of getting more eyes on this for feedback fest.

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1207533-triggered-events-log-rebased-again

Revision history for this message
Mike Rylander (mrylander) wrote :

Hi again, all. I've rebased this again for ease of testing/merging. The two commits merged completely cleanly.

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1207533-triggered-events-log-rebased-again-squared

Changed in evergreen:
assignee: nobody → Chris Sharp (chrissharp123)
Revision history for this message
Dawn Dale (ddale) wrote :

I have tested this code and consent to signing off on it with my name, Dawn Dale and my email address, <email address hidden>.

Revision history for this message
Terran McCanna (tmccanna) wrote :

(Noting that I'll add some follow-up wish-lists to display the template output, sort the rows by run time, and display the times rather than just the dates.)

tags: added: signedoff
Revision history for this message
Andrea Neiman (aneiman) wrote :

Thanks Dawn & Terran.

I had filed a few follow ups for Triggered Events Log after ECDI tested this spring:

Bug 1931597 - default sort by run time
Bug 1930627 - include event count
Bug 1930628 - additional column options

Revision history for this message
Terran McCanna (tmccanna) wrote :

aneiman++

Revision history for this message
Terran McCanna (tmccanna) wrote :
Revision history for this message
Chris Sharp (chrissharp123) wrote :

Forget to update the bug report. Committed to master. Thanks to all involved!

Changed in evergreen:
status: Confirmed → Fix Committed
Changed in evergreen:
assignee: Chris Sharp (chrissharp123) → nobody
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.