In-db unAPI has little control over subobject inclusion

Reported by Mike Rylander on 2011-12-20
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
High
Unassigned

Bug Description

Currently, the in-db unAPI stored procedures have no control over paging of subobjects. IOW, one cannot limit the number of, say, call numbers to fetch when pulling a record.

In order to per-subobject limit and offset, we move the limit and offset params to an HSTORE, allowing each subobject type to have its own limit and offset.

Also included in this branch, at the request of Dan Scott, is the branch at bug #901976

https://bugs.launchpad.net/evergreen/+bug/901976

Thomas Berezansky (tsbere) wrote :

Ummm....."this branch" referring to what branch?

I am assuming you mean collab/miker/unapi2_subobject_improvement but I can't be certain, not seeing a branch listed.

Dan Scott (denials) wrote :

Thomas - yes, based on IRC, that's the branch in question.

Note that I pushed a few commits onto the branch (it's collab, fair game right?) to resolve some problems I found when testing it out:

  * 47d3516f579f3d925dd06272ab807bf034971f49 - Fix type conflict in unapi.circ - the branch wouldn't work at all without this.

  * 1d350df1519aada0557e2fa109591e1833eeef9e - Enable TPAC to display records via HSTORE-based unapi (note that on reflection this "fix" is wrong, in that it does enable records to be displayed but the limit gets confused with the flesh depth)

   * ec3f67f59999c423feff973424a9678838e3ffa5 - unapi: avoid deleted call numbers, copies, and sunits (the rewrite introduced a regression when dealing with deleted objects)

In addition, given that the unapi rewrite branch will introduce conflicts with some of my other pre-existing pullrequest feature branches, I had merged some of my branches on top of this branch in the hopes of avoiding conflicts.

Jason Stephenson (jstephenson) wrote :

When searching TPac with this branch applied, searches always turn up 0 results.

I get the following in open-ils.search_stderr.log:

Caught error from 'run' method: Can't use an undefined value as an ARRAY reference at /usr/local/share/perl/5.14.2/OpenILS/Application/Search/Biblio.pm line 1273.

open-ils.supercat_stderr.log reports the following:

Caught error from 'run' method: Exception: OpenSRF::EX::Session 2011-12-28T10:23:11 OpenSRF::Application /usr/local/share/perl/5.14.2/OpenSRF/Application.pm:209 Session Error: <email address hidden>/open-ils.cstore IS NOT CONNECTED TO THE NETWORK!!!

It might just a result of the biblio search error or it might be cstore crashing. I didn't actually verify if cstore crashed. (Someone wants to use my dev image for TPac testing, so I had to back this branch out.)

Jason Stephenson (jstephenson) wrote :

Information from more testing:

[2011-12-28 13:50:05] open-ils.search [INFO:6255:Biblio.pm:910:] compiled search
 is {"estimation_strategy":"inclusion","skip_check":0,"limit":10,"check_limit":"
1000","core_limit":10000,"offset":0,"searches":{"author":{"term":"chris martenso
n depth(0)"}}}
[2011-12-28 13:50:05] open-ils.search [INFO:6255:Transport.pm:161:] Message proc
essing duration: 0.619
open-ils.cstore 2011-12-28 13:50:05 [INFO:6248:osrf_application.c:1040:] CALL: o
pen-ils.cstore open-ils.cstore.json_query {"from":["unapi.bre","1290277","marcxm
l","record","{holdings_xml,mra,acp,acnp,acns,bmp}","MVLC",0,"acp=>5"]}
open-ils.cstore 2011-12-28 13:50:05 [ERR :6248:oils_sql.c:5479:] open-ils.cstore
: Error with query [SELECT * FROM unapi.bre( '1290277', 'marcxml', 'record', '{holdings_xml,mra,acp,acnp,acns,bmp}', 'MVLC', '0', 'acp=>5' ) AS "unapi.bre" ;]: 0 ERROR: invalid input syntax for integer: "acp=>5"
LINE 1: ...oldings_xml,mra,acp,acnp,acns,bmp}', 'MVLC', '0', 'acp=>5' )...

Jason Stephenson (jstephenson) wrote :

After making sure I've updated the scema, that error goes away.

I have these in open-ils.search_stderr.log

Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.14.2/OpenILS/Application/Search/Biblio.pm line 1540.
Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.14.2/OpenILS/Application/Search/Biblio.pm line 1313.

Still no search results.

Dan Scott (denials) wrote :

As noted in IRC, on current master if I:
 * git merge working/user/dbs/tpac_user_ou_pref
 * git merge working/collab/miker/unapi2_subobject_improvement
  * rebuild database schema
  * load Open-ILS/tests/concerto.sql
  * osrf_ctl.sh -a start_all
  * start Apache
  * and search

... then I get results in both JSPAC and TPAC. Jason mentioned having merged other branches, so there likely are conflicts at some integration point, but in and of itself this branch begins and ends with working code (although as noted above, the limit fix needs improvement. Also, the cut-over to using unapi for record details display undoubtedly needs more work).

Jason Stephenson (jstephenson) wrote :

I got this to work by changing the order of merges on a fresh checkout of master.

Applying the branches in the order mentioned by Dan in the previous comment, then applying my other branches after leads to a working setup. I thought git was smarter than that.....

I'll sign off on this branch, too, since I'm told it is working by my tester.

Jason Stephenson (jstephenson) wrote :

Since this branch does rebase cleanly on master. I'm going to leave the rebase to someone who knows more about what this branch is supposed to do. After its rebased, I'll be happy to check it out again and give it a test run. It seems to work so far on my development machine.

Dan Scott (denials) wrote :

Jason: I pushed a rebased version to user/dbs/unapi_improve_limit in the working repo, if you care to try it out.

For an upgrade script, I suspect we could pretty much reissue the entire Open-ILS/src/sql/Pg/990.schema.unapi.sql file. If we we're not comfortable dropping an entire schema, then dropping all of the functions with the old INT-based slimit/soffset arguments and reissuing all of the CREATE OR REPLACE FUNCTION statements should do the trick.

Dan Scott (denials) wrote :

Added another commit to limit both copies and call numbers to 5 in OpenILS::WWW::EGCatLoader::Util to address (somewhat) the hundreds of copies that could be returned if you had, say, 50 call numbers in scope on a given bib.

Rebased against current master and pushed once again to user/dbs/unapi_improve_limit

Dan Scott (denials) wrote :

Pushed another commit onto the pile. This one includes library ranking for holdings, specifically volumes and copies, such that copies will be returned in ordered preference as follows:
  1. circ_lib matches the (new parameter) preferred library;
  2. circ_lib matches the search_library or its descendants, in order of depth
  3. All other copies in scope but not matching either the circ_lib or the search_library/descendants are ordered by library name

This also creates new function, evergreen.rank_ou(), for the purpose of generating a ranking value for the copies.

Dan Scott (denials) wrote :

But wait! There's more! First, rebased against current master (now that copy location groups are in the mix).

Now the TPAC search results know about ranking results by search library & preferred library, pass the preferred library (if any) to unapi so that we can have nicely sorted search results.

user/dbs/unapi_improve_limit in working.

Dan Scott (denials) wrote :

And another commit, this time for sorting copies in record details. Use the evergreen.rank_ou() routine to order the results in the copy query.

user/dbs/unapi_improve_limit in working.

Dan Scott (denials) wrote :

And a (final?) commit, adding an unwrapped upgrade script, along with an example at the bottom of 990.schema.unapi.sql showing how to invoke the limit / pref_lib options from SQL if one is so inclined.

user/dbs/unapi_improve_limit in working.

Dan Scott (denials) wrote :

Noticed a minor regression in search results (no located URIs were showing up), so rectified that with one more commit - which also shows located URIs from the preferred library (if any) even if the search scope has been changed to a consortial level.

user/dbs/unapi_improve_limit in working.

Kathy Lussier (klussier) wrote :

Hi Dan!

Thanks for improving these displays!

When logging in, if a user selects "stay logged in", exits the catalog at some point, and then returns, the search results and record details display don't seem to be recognizing the presence of the plib parameter.

Kathy

Dan Scott (denials) wrote :

Fixed the upgrade script incompatibility with PostgreSQL 9.0 that Ben Shum and Jason Stephenson ran into, and rebased the branch:

user/dbs/unapi_improve_limit in working

Dan Scott (denials) wrote :

Kathy: Here's what I'm seeing (now that Thomas' patch for leaky org selector has been integrated to master):

1. Log in with "Stay logged in"
2. Change my search preference to BR3
3. Close the browser tab
4. Open a new browser tab at the default HTTP location (http://localhost/eg/opac/home) - the org selector is set to "Consortium"
5. Click "Your Account Login" and I'm redirected to a logged in session at https://localhost/eg/opac/myopac/main - the org selector changes to "BR3"
6. Change the search scope org selector to "Consortium" and launch a search
7. Click "Show more details" and the copies are displayed with BR3 first
8. Click on the title of a record to show record details; while the org selector shows "CONS", in the record details copies are displayed with BR3 first

Does that match what you're seeing? Using the "Stay logged in" function, after coming back to the TPAC I'm getting the org selector set to my preferred branch and search results at the consortial level are displaying my preferred branch's copies first, so that seems to be working...

Kathy Lussier (klussier) wrote :

Dan,

I was skipping step 5 in your process under the premise that a user would likely start by immediately searching the catalog rather than clicking "Your Account Login." However, as I'm looking at this, I see this issue goes beyond the plib not being recognized and is really a case of the catalog not recognizing that the user is logged in until certain actions are performed. For example, the "Your Account Login" button displays instead of the account information that displays for logged-in users. So it really isn't an issue with this branch, but maybe I can submit a separate LP report for it. Not sure if it's a bug or a wishlist item.

Thomas Berezansky (tsbere) wrote :

For the record, the template toolkit opac doesn't recognize logins on non-SSL connections. At all.

So you need to connect to https://blah instead of http://blah to keep a login session going.

We could fix this by removing all "redirect back to http under whatever circumstances" checks and adding in "force redirect to https" code.

Dan Scott (denials) wrote :

The ordering of copies via pref_lib, even when a different search library had been chosen, was introducing a little too much "WTF?" to the user results.

Ergo, a new branch: user/dbs/unapi_improve_limits_and_tpac_display in working. Rebased a little bit to get rid of a few trivial commits, with a new commit on top.

Dan Scott (denials) wrote :

Okay. Feeling way better about this branch now. The log from the last commit sums it up:

    TPAC: Preferred library display - less magic, more functionality

    A boatload of changes that make the "preferred search library" really,
    truly work in TPAC:

    1) Display multiple URIs correctly in detailed results and record
       details. We weren't clearing the hash each time that we added it
       to the list of URIs, and that ended up duplicating the URI rather
       than adding distinct URIs.

    2) Display the preferred search library copy count if it hasn't already
       been displayed. For example, if your preferred search library is BR3
       and your chosen search library is BR1, an additional line with the
       count for BR3 will be displayed. However, if your chosen search
       library is BM1, the count for BR3 will already have been displayed
       so no duplicate line will be displayed.

    3) In record details, give users the ability to jump to their preferred
       library scope via "Show preferred library" beside the preferred
       library copy count, if applicable. If your search library is located
       beneath your pref library in the org hierarchy, then you'll get just
       the "copy depth" link instead.

    4) Untangle the located URIs from physical call numbers in in-database
       unapi. We had been facing the limitation of URIs being limited by
       the maximum number of call numbers, with the result that URIs may
       or may not have been displayed. Now, we take the bold tack of
       returning all URIs that are in scope for both the search library
       and the preferred library. No limiting is currently possible, but in
       the realistic worst case scenario of both search lib and pref lib
       being set to an OU at depth 4, we'll get a maximum of 8 different
       URIs back to display. (More realistically, it's unlikely that a
       different URI will be set for each level of the hierarchy.)

Please see user/dbs/unapi_improve_limits_and_tpac_display in working.

Dan Scott (denials) wrote :

Added another commit to the branch:

TPAC: Take copy availability into account for copy order

    This is most important in the LIMITed list of copies returned to the
    TPAC search results page, but also useful on the record details page.

    Prior to this commit, copies were sorted in order of:

    1. distance from the specified search library
    2. search library name
    3. call number label

    This commit adds "copy status ranking" into the result ordering, such
    that more available copies will be returned first in the list (and thus,
    in the case of search results where call numbers and copies are
    typically limited (currently to 5 of 5), ensuring that the most
    available copies are likely to be seen by users).

    The "availability" ranking for a copy is defined by a new function,
    evergreen.rank_cp_status(), that takes criteria from the
    config.copy_status table and ranks it in three tiers, as follows:

    1. Most available = opac_visible and copy_active are both TRUE
    2. Almost available = holdable and opac_visible are TRUE, but
       copy_active is FALSE
    3. Unavailable = anything else.

    Thus, changing copy availability ranking (such as demoting "On holds
    shelf" from the "most available" status to "almost available"), is
    simply a matter of tweaking the evergreen.rank_cp_status() function.

Still user/dbs/unapi_improve_limits_and_tpac_display in working.

Dan Scott (denials) wrote :
Download full text (5.2 KiB)

Updated the branch with two commits:

1) Apply STABLE qualifier to all CREATE FUNCTION statements in unapi. This helps the planner, as the default VOLATILE can result in additional table scans. Thanks to Thomas Berezansky for the suggestion on one function, which lead us to realize that every unapi function could benefit.

2) Fix duplicate call numbers that were being returned after the addition of the copy status ranking. Moving to a windowing function for the ranking and pushing the core query into a subselect enables us to weed out the duplicates. I had some initial concern about the subselect having a deleterious impact on performance (as the LIMIT / OFFSET get pushed into the outer loop), but testing on a large system with a call number that has > 400 copies attached to it showed good results:

EXPLAIN ANALYZE SELECT ua.id, ua.name, ua.label_sortkey, MIN(ua.rank) AS rank FROM (
                                    SELECT acn.id, aou.name, acn.label_sortkey, RANK() OVER w
                                    FROM asset.call_number acn
                                        JOIN asset.copy acp ON (acn.id = acp.call_number)
                                        JOIN actor.org_unit_descendants( 1 )aou ON (acp.circ_lib = aou.id)
                                    WHERE acn.record = 764717 AND acn.deleted IS FALSE
                                        AND acp.deleted IS FALSE
                                    GROUP BY acn.id, acp.status, aou.name, acn.label_sortkey, aou.id
                                    WINDOW w AS (
                                        PARTITION BY evergreen.rank_ou(aou.id, 1), evergreen.rank_cp_status(acp.status)
                                        ORDER BY evergreen.rank_ou(aou.id, 1) , evergreen.rank_cp_status(acp.status)
                                    )
                                ) AS ua
                                GROUP BY ua.id, ua.name, ua.label_sortkey
                                ORDER BY rank, ua.name, ua.label_sortkey
                                LIMIT 5 OFFSET 0;
                                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=65.45..65.46 rows=1 width=80) (actual time=30.448..30.449 rows=1 loops=1)
   -> Sort (cost=65.45..65.46 rows=1 width=80) (actual time=30.446..30.446 rows=1 loops=1)
         Sort Key: (min((rank() OVER (?)))), aou.name, acn.label_sortkey
         Sort Method: quicksort Memory: 25kB
         -> HashAggregate (cost=65.43..65.44 rows=1 width=80) (actual time=30.438..30.439 rows=1 loops=1)
               -> WindowAgg (cost=64.89..65.41 rows=1 width=69) (actual time=30.427..30.431 rows=2 loops=1)
                     -> Sort (cost=64.89..64.89 rows=1 width=69) (actual time=30.420..30.421 rows=2 loops=1)
                           Sort Key: (evergreen.rank_ou(aou.id, 1, NULL::integer)), (evergreen.rank_cp_status(acp.st...

Read more...

Kathy Lussier (klussier) wrote :

I have a couple of issues to report from my testing:

 1. When I click on "Show preferred library" on the record details page, the search scope is changing to the preferred library.

2. The sort for copy availability doesn't seem to be working as expected on the search results page. I don't see that availability is affecting the sort (I've attached a screenshot).

3. I wasn't sure if a change was made to the sort on the record details page, but we're seeing some unexpected sorts there too. I'll attach a screenshot in a separate comment.

4. I'm not sure it is necessary to add (Preferred) after the copy counts on the search results page. If others think it's important to include, we can always customize our catalog to remove it, but I don't think it's needed.

Thanks for your continued work on this Dan!

Kathy Lussier (klussier) wrote :

Adding a screenshot of the odd sorting on the record details page. In this case, a library that should be first in the list based on its name is appearing last. On another record, we had an on-order item from a non-OPAC-visible OU sorting at the top of the list.

Dan Scott (denials) wrote :

1. Changing search library when you click on "Show preferred library" was intentional. How is a user supposed to predict when a change wrought by a click that they make has a temporary effect (and know what the boundaries of that temporariness are) vs. sticky? The change affects the org selector, so they have direct feedback for their action, and they have the choice of either hitting the Back button or selecting a different scope in the org selector if they want to change scope back from their preferred library to a different scope. I think it's weird and unexpected for the scope to simply snap back to whatever it was before they hit "Show preferred library" if they click on any of the other links on the page; I also think it would be weird if "Show preferred library" _didn't_ change the org selector.

2. Hmm. The search results in that screen shot don't really look sorted by copy availability, do they? I'm sure that was working using the concerto set on a clean master database, but I'll try to replicate that copy availability sorting issue again here. Just to check: do your config.copy_status settings match the defaults from master? (I kind of hate trying to debug problems when I don't have access to the same set of data, but as mentioned, I'll try to reproduce here with a clean set of concerto data first and report back).

3. Waaiiit. I thought you wanted availability to be the determining factor for displaying copies. Thus it makes sense to me that the library with only a less-available "on order" copy would display after the available copies.

Do you mean to say that you _always_ want libraries to appear in name order (or wait - isn't it actually in order of proximity to the search library, then by library name within each tier of proximity) and that the availability of the copies within those libraries should only determine the display order of the copies within the bounds of that given library?

Also... I offered up a fix for ensuring that copies from non-opac-visible-OUs stay properly hidden earlier today (already tested and committed today). I'm having a hard time believing that that "on order" copy was at the top in that case, unless all of the other copies were checked out.

4. I strongly believe that it's important to include "(Preferred)" or "(Preferred library)" to give a hint to the user _why_ the copy counts for that library, which is outside their chosen scope, is being displayed. Otherwise, it's throwing more guessing games at the user as they attempt to build a mental model of how this thing is constructed.

Dan Scott (denials) wrote :

On 2) search results disorder - confirmed. *sigh* Time for me to go looking for a missed commit.

Dan Scott (denials) on 2012-03-20
tags: removed: pullrequest
Dan Scott (denials) wrote :

Ah. Aha. For search results availability ordering, "Checked out" has a config.copy_status setting of copy_active = TRUE and opac_visible = TRUE, which puts it in the "Most available" category by the criteria I listed a few comments above.

I'll add special logic to deal with "Checked out" to push it into the "less available" category. Ultimately we should probably have a separate branch extend config.copy_status and handle ranking directly as an additional column; but there's long-established practice of using the ccs IDs directly.

Dan Scott (denials) wrote :

AND YET ANOTHER COMMIT.

Sort copy availability within call numbers; sort call number availability within libraries; sort libraries by distance from selected search library.

Fixed copy availability sorting in search results - tricksy little bugs just happened to be masked by my previous test data. Also refactored in-db unapi somewhat to cut down on code duplication and make it easier to test subsets of the code.

"Checked out" has been bumped down the the second tier of availability, which is more in keeping with expectations.

Record details availability sorting has been fixed as noted above; all copies of a given call number are kept together, but more available copies are shown first.

Added a fix for the fix for opac-invisible OUs; we had successfully prevented invisible OUs from showing up in the public TPAC in record details, but inadvertently ended up showing all opac-visible OUs in record details.

user/dbs/unapi_improve_limits_and_tpac_display in working. (I call it "unapi_zombie" now locally, because it won't die)

Dan Scott (denials) wrote :

Another day, another rebase and commit.

I've squashed most of my commits together, as the meandering of the code won't help someone who's trying to follow the evolution of this later on.

Tonight's commits add the following exciting properties:

1) (64d55f94a) A preferred library, if set and not identical to the search library, gets its copies (if any) pushed to the top of the list of copies in search results and record details. Only that library's copies get moved up; children of the preferred library are not affected, so as to prevent preferred library copies from preventing any of the chosen search library copies from being displayed in search results.

2) (0ff2680d08) A fix for the incorrect syntax for filtering non-opac-visible org units that ended up displaying all org units in copy results. I've kept this as a separate commit in case it gets applied independently in the hopes that it makes merging easier...

Note that sorting of call numbers by most available copy status is restricted to search results, as the number of copies to display is limited. Search result copies are sorted by: preferred library, search library depth, library name, call number copy availability, call number label, copy availability.

Record detail copies are sorted by: preferred library, search library depth, library name, call number label, copy availability.

user/dbs/unapi_improve_limits_and_tpac_display in working

Dan Scott (denials) wrote :

Another commit, nice and early tonight.

This one pares down the preferred library floating exuberance to only float preferred library copies to the top of the copy display if the search scope is consortial. This also enables us to get rid of the actor.org_unit_descendants_pref_lib() function.

user/dbs/unapi_improve_limits_and_tpac_display in working

tags: added: pullrequest
Changed in evergreen:
importance: Undecided → High
milestone: none → 2.2.0beta1
Jason Stephenson (jstephenson) wrote :

kmlussier says its working, and that is good enough for me.

Signed off:

collab/dyrcona/unapi_improve_limits_and_tpac_display in the working repo.

Dan Scott (denials) wrote :

Argh, I failed to actually push that commit last night. Rebased to current master and did so now in working:

user/dbs/unapi_improve_limits_and_tpac_display

Mike Rylander (mrylander) wrote :

Picked into master.

Changed in evergreen:
status: New → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers