Eliminate staged search

Bug #1698206 reported by Mike Rylander on 2017-06-15
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Wishlist
Unassigned

Bug Description

We've tried in the past to move from our current search mechanism, which trades completeness of results for performance, to a pure-SQL, completely accurate search. It didn't work out then, but, hope springing eternal, a new attempt was made. The forthcoming branch implements the output of a research project funded by MassLNC over the course of the last year or so. The result is:

 * Faster core search, esp. for searches that are not over-broad
 * Faster page rendering, esp. for search results, though across the board to some degree
 * Fully accurate search results, including in the face of consortial configurations where, previously, small library's items could be omitted from result sets
 * Reliance on and, more positively, access to new Postgres features

Mike Rylander (mrylander) wrote :

Branch pushed at http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1698206-eliminate_staged_search ... The baseline schema has /not/ been synced to the upgrade script, so testing will require running that after installation. I'm putting that off until any bugs are worked out of the DB-level code.

Mike Rylander (mrylander) wrote :

And, finally, the commit message should make for a good release notes entry.

tags: added: pullrequest
Kathy Lussier (klussier) wrote :

A note for whoever does the release note entry. The commit message should probably be put in the TechRef directory with a more concise note as the release note entry. The release note entry could then point to the TechRef information for those who want more information.

Mike Rylander (mrylander) wrote :

Kathy, I agree and did the TechRef part. I also addressed the reported issue with pagination failing across superpage boundaries. Same same bat-time, bat-branch.

Mike Rylander (mrylander) wrote :

Note: I've added an additional commit which teaches autosuggest how to use the current search scope when providing suggestions. This matches the previous behavior.

Kathy Lussier (klussier) wrote :

Thanks Mike!

I'm having trouble loading the Concerto dataset on my test system with this branch.

I'm getting the following error:

psql:assets_concerto.sql:14: ERROR: record "new" has no field "target_copy"
CONTEXT: SQL statement "INSERT INTO asset.copy_vis_attr_cache (record, target_copy, vis_attr_vector) VALUES (
                ncn.record,
                NEW.target_copy,
                asset.calculate_copy_visibility_attribute_set(NEW.id)
            )"
PL/pgSQL function asset.cache_copy_visibility() line 28 at SQL statement
SQL function "populate_copy" statement 1
Writing offline database configuration to /openils/conf/offline-config.pl

Mike Rylander (mrylander) wrote :

Ah, yep, there's the paste-o, right there. Update force-pushed to the branch. To continue from your current point, just reload the function asset.cache_copy_visibility() from the branch.

Kathy Lussier (klussier) wrote :

I came across another problem Mike. When I loaded this on fresh Evergreen system using the baseline schema, I see the following problems:

- the search results page is not displaying copy availability counts.
- the record summary page is not displaying copy availability counts.
- the record summary page is not displaying the copy table.

I also upgraded an existing Evergreen VM using the upgrade scripts, but did not see the same behavior.

I didn't see anything in the logs that indicated what the problem may be.

Mike Rylander (mrylander) wrote :

Kathy, I found the problem. The public (non-staff) copy counts were depending on deprecated caching tables. The upgrade version worked fine because it had the old data.

Working on a change now.

Mike Rylander (mrylander) wrote :

Branch is updated. If you apply the four new function definitions at the end of the upgrade script to your fresh install, it will have the same effect as a complete new install.

Mike Rylander (mrylander) wrote :

And now I've pushed one more commit to remove any remaining uses of the old cache table.

Kathy Lussier (klussier) on 2017-08-07
Changed in evergreen:
assignee: nobody → Kathy Lussier (klussier)
Kathy Lussier (klussier) wrote :

Thanks Mike! The availability counts are now displaying, but I've lost the "View Other Formats & Editions" block on the record summary page. This was on a fresh install again.

Changed in evergreen:
assignee: Kathy Lussier (klussier) → nobody
Mike Rylander (mrylander) wrote :

I'll look into that now, Kathy. Thanks for the continual testing.

Mike Rylander (mrylander) wrote :

I found two remaining uses of the old copy visibility cache in SQL functions. I've updated those to use the new infrastructure and pushed the commit to the branch. As before, simply updating the two functions directly is all that's needed -- no data changes, and no Perl changes, so no need to do the full reinstall dance.

Kathy Lussier (klussier) wrote :

Unfortunately, I had technical difficulties with my test environment for most of this week, and I am going to be unavailable to test this again until August 23. If somebody else has the tuits to look at this branch in the meantime, please do. Otherwise, it's on the top of my list for testing (along with bug 1706107) when I get back.

I also wanted to add a note that when I previously tested this branch on an upgraded system about a month ago, it worked very well. We performed our testing on two systems with identical datasets with the only difference being that one had the new code and the other was vanilla master.

The speed of retrieving search results were mostly improved with this branch. We noticed the biggest performance gains in copy location group searches and searches that were limited by availability. We didn't see any unexpected changes in the records that were retrieved or the sort order. However, we were able to retrieve large results sets (up to the 100,000 maximum we had configured) without any hits on performance when compared to the control test server. We also verified that search result counts on the search results page were always accurate. Once we hit the 100,000 maximum, the search results page displayed a count of 100,000+.

Jeff Davis (jdavis-sitka) wrote :

I'll see if Sitka can find some time to test Mike's branch next week.

Galen Charlton (gmc) wrote :

I've tested Mike's branch and pushed a new one, user/gmcharlt/lp1698206_eliminate_staged_search_reviewed, that tidies up an issue with the upgrade script, verifies that there is no regression on bug 1624443, and adds (basic) release notes.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers