possible optimization for evergreen.ranked_volumes database function

Bug #1234845 reported by Jason Stephenson on 2013-10-03
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Evergreen
Medium
Unassigned

Bug Description

From depesz's email to the developer list:

my name is Hubert Lubaczewski (a.k.a. depesz). I'm PostgreSQL DBA from
Poland, that was hired to do review for Evergreen installations, and
suggest changes in database "things" (queries, structure).

For the client I analyzed logs from production Pg instance for ~ 10
days.

During this time, the single most time consuming query (summarized time
for all instances of the query, with different parameters) was:

SELECT * FROM unapi.bre ( ... ) AS "unapi.bre";

I profiled this function, and found that in my test case most of the
time (2.04s out of 2.06s, so ~ 99%) was spent in call to
unapi.holdings_xml() function.

When I profiled this function, I found that most of the time (sorry,
don't have the number now with me) was spent in call to
evergreen.ranked_volumes() function.

At this moment in my research something changed on the server I was
testing on, and all subsequent times were ~ 4-5 times lower, but the
ratios were more or less the same.

Anyway - call to evergreen.ranked_volumes() showed repeatable time (with
full caches/buffers) of ~ 380ms.

I modified the function by:
1. inlining actor.org_unit_descendants(?, ?)
2. inlining evergreen.rank_ou(?, ?, ?)
3. extracting depth calculation to separate call
4. switched to plpgsql (which gives me ability to use variables)
5. removed evergreen.rank_ou() and evergreen.rank_cp_status() from
   select clause - these are still in WINDOW definition, but they
   weren't used in the SELECT, so it's better to remove from there.
6. in passing renamed arguments to avoid name clash (argument depth vs.
   field depth)
7. in passing changed usage of $* to access parameters to using named
   parameters, for readability.

New function did the same work in ~ 18ms.

Now - after I finished my tests, I was unfortunately not able to repeat
slow performance of unapi.bre(), but I think that the optimization of
evergreen.ranked_volumes() is good on its own.

If you'll agree, I would appreciate modifying the function in project.
If I should do/provide something else, please let me know.

Best regards,

depesz

NOTE: I (Jason Stephenson) have depesz's permission to create a git branch of his function changes with his name and email as the author. We both think this will get more attention if there is a branch and an associated Launchpad bug.

description: updated
Jason Stephenson (jstephenson) wrote :

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1234845_ranked_volumes

I'll add a pullrequest tag when I've had a chance to look at the above and signed off on it, myself.

Ben Shum (bshum) on 2013-11-06
Changed in evergreen:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 2.next
Kathy Lussier (klussier) wrote :

This code has been on Jason's development server for well over a month now without any problems. I gave it closer scrutiny both on his server (running master) as well as another test server running 2.4.

My understanding is that unapi is used in the display for the search results page to verify that copies were retrieved in the expected sort order (preferred library -> search library name -> call number label -> availability) and that visibility checks were applied correctly. I did this search the consortium, individual branches and as a logged-in user with a preferred search library.

Functionally, it appears to be working as expected. I didn't do any before and after timed searches to note whether there was an improvement in performance, but there wasn't a notable decrease either.

Dan Wells (dbw2) wrote :

Added pullrequest to help move this along.

tags: added: pullrequest
Changed in evergreen:
milestone: 2.6.0-alpha1 → 2.6.0-beta1
Mike Rylander (mrylander) wrote :

I've converted this back to an SQL-language function to reduce overhead and allow the possibility of planning improvements, as well as avoid a little schema churn caused by parameter renaming. It is meant to keep all the improvements from Depesz, and add a little in the way of a ROWS hint for the common case of returning only a few rows instead of the default 1000. See:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/miker/lp1234845_ranked_volumes

Mike Rylander (mrylander) wrote :

I'll rebase this in light of the recent changes to master. Updates soon!

Dan Wells (dbw2) on 2014-02-27
Changed in evergreen:
milestone: 2.6.0-beta1 → 2.6.0-rc1
Kathy Lussier (klussier) wrote :

Thanks Mike! I've done the same testing on this branch. The sort order for items on the search results page appears to be correct and the it looks like the proper visibility checks are being applied. I've signed off on this branch at

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/kmlussier/lp1234845_ranked_volumes-rebased

Kathy Lussier (klussier) on 2014-03-11
tags: added: signedoff
Changed in evergreen:
milestone: 2.6.0-rc1 → 2.next
Ben Shum (bshum) wrote :

Going to review this, but I'd like to get sign-off / DCO (http://wiki.evergreen-ils.org/doku.php?id=dev:standing_dco) from depesz and Dyrcona who started work on this, further refined with eeevil and signed off by kmlussier. Just making sure we get all our ducks in a row before we commit it.

Sounds intriguing!

Ben Shum (bshum) wrote :

Excellent, updated the commit with all the notes and this has been pushed to master for 2.7

Thanks everyone!

Changed in evergreen:
milestone: 2.next → 2.7.0-alpha1
status: Triaged → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers