Statistically generated record ratings

Bug #1549505 reported by Galen Charlton
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Wishlist
Unassigned

Bug Description

For the purpose of supplying non-bibliographic popularity adjustment to the ranking of search results, Equinox, via a project sponsored by MassLNC, proposes to implement a set of statistical modelling algorithms which will identify bibliographic records of particular note based on derivable parameters.

Generally, factors such as to circulation and hold activity, record and item age, and item ownership counts will available for statistical consideration. Each factor will embody a "popularity badge" that the bibliographic record can earn, and each badge will have a 5-point scale, where more points indicates a more popular record. The average of the badge points earned by each record will constitute a "popularity rating". The number and types of badges will break ties for average popularity, and relevance will sort items with like popularity.

A new sort axis of Popularity will be created to sort first on the weighted average popularity of each record, followed by the query-specific relevance available today. A new option will be created in the dropdown that sorts on the combination of "activity metric" (aka badge ranking, aka popularity) first and the existing, stock relevance ranking when those are equal. For instance, given two records that both have a badge ranking of "4.5", they will be sorted in the order of the query relevance ranking that is calculated today as a tie breaker. Those two records will sort above other records with lower badge rankings regardless of what today’s relevance ranking says about them.

In addition, a new sort axis of Popularity and Relevance will be created that augments the normal Relevance sort with a normalized popularity value by multiplying the base relevance by a value between 1 and 2.

Finally, there will continue to be a pure Relevance sort option, which is the version that exists today.

A global flag will allow the selection of the default sort axis.

Full technical specifications can be found here:

http://yeti.esilibrary.com/dev/public/techspecs/popularity-rating.html

Revision history for this message
Galen Charlton (gmc) wrote :

A WIP branch that is feature-complete is available as collab/gmcharlt/lp1549505_wip_popularity_ratings in the working repository.

tags: added: search
Revision history for this message
Galen Charlton (gmc) wrote :

A series of patches implementing this feature is available in the user/gmcharlt/lp1549505_popularity_ratings-signedoff branch in the working/Evergreen repository:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1549505_popularity_ratings-signedoff

tags: added: pullrequest
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I found a syntax error on line 72 of OpenILS/WWW/EGCatLoader/Record.pm, or rather Perl did.

I have fixed it and pushed to collab/dyrcona/lp1549505_popularity_ratings_signedoff_fix

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/dyrcona/lp1549505_popularity_ratings_signedoff_fix

Revision history for this message
Kathy Lussier (klussier) wrote :

I've tested this code quite extensively on an upgraded database, but I wanted to test it this morning on a new installation. I think we're missing some of the database bits for new installations.

In http://git.evergreen-ils.org/?p=working/Evergreen.git;a=commit;h=831bd1fa896608c2b3b0281be564c1af69c5a846 , we have a database upgrade script to add the new schema, but I don't see anything that adds the schema for new installations.

In http://git.evergreen-ils.org/?p=working/Evergreen.git;a=commit;h=eeee27c6843e8f1f911c0591cc8e1eaa7cc7ca7d, we have an upgrade script to change the qp_search function, but I don't see any changes made to the function taht would be picked up by new installations.

Revision history for this message
Galen Charlton (gmc) wrote :

Whoops, that's indeed a problem. I'll push out a branch fixing that shortly.

Revision history for this message
Galen Charlton (gmc) wrote :

The branch user/gmcharlt/lp1549505_popularity_ratings_signedoff_fix2 in the working repository now exists and:

- is rebased against master
- contains a signoff of Jason's fix
- contains a patch updating the baseline schema for new installations

Revision history for this message
Kathy Lussier (klussier) wrote :

Hi Galen,

I came across a problem with browse searching. The activity metric branch is loaded on a clean Evergreen install at mlnc1.mvlcstaff.org. If a user tries to conduct a browse search, they get an error message. I see the following in the logs:

open-ils.cstore 2016-05-25 11:33:14 [ERR :27841:oils_sql.c:5800:14641878862791917] open-ils.cstore: Error with query [SELECT * FROM metabib.browse( 'title', 'concerto', '1', NULL, 'f', NULL, '10' ) AS "metabib.browse" ;]: 6820851 6820851: ERROR: record "core_result" has no field "badges"
CONTEXT: SQL statement "SELECT core_result.badges"
PL/pgSQL function search.query_parser_fts(integer,integer,text,integer[],integer[],integer,integer,integer,boolean,boolean,boolean,integer) line 320 at assignment
SQL statement "SELECT visible
                    FROM search.query_parser_fts(
                        context_org, NULL, qpfts_query, NULL,
                        context_locations, 0, NULL, NULL, FALSE, staff, FALSE
                    ) qpfts
                    WHERE qpfts.rel IS NULL"

I can't recall if I had done any browse searches back when I was testing this code on an upgraded system.

Revision history for this message
Galen Charlton (gmc) wrote :

:forehead_slap_emoji:

I'll look at this.

Revision history for this message
Galen Charlton (gmc) wrote :

I've pushed a patch fixing this to the tip of the user/gmcharlt/lp1549505_popularity_ratings_signedoff_fix2 branch.

Revision history for this message
Kathy Lussier (klussier) wrote :

Thanks Galen!

I can conduct a browse search now, but then I come across more trouble when a click a browse entry to view results for that entry.

For example, http://mlnc1.mvlcstaff.org/eg/opac/results?blimit=10;qtype=title;bterm=har;locg=1;fi%3Ahas_browse_entry=678%2C31

brings me to an empty results page with a message asking me to enter search terms.

I wasn't quite sure what was most important to share from the logs, so I dumped a big chunk of the log at http://pastebin.com/muDWudYR

Revision history for this message
Galen Charlton (gmc) wrote :

As it might be relevant for that last issue, what version of Pg is that test system running?

Revision history for this message
Kathy Lussier (klussier) wrote :

9.3.13

Revision history for this message
Galen Charlton (gmc) wrote :

I've not been able to reproduce the failure in Pg 9.3 or 9.4 (or 9.1, not that that's relevant for 2.11). Could you grab the full SQL statement ("SELECT * -- bib search: #CD_docume...") from the Pg logs?

Revision history for this message
Kathy Lussier (klussier) wrote :

2016-05-26 16:53:57 EDT STATEMENT: SELECT * -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords core_limit(10000) limit(1000) badge_orgs(1) estimation_strategy(inclusion) keyword: has_browse_entry(340,8) depth(0)
                  FROM search.query_parser_fts(
                            1::INT,
                            0::INT,
                            $core_query_27910$
        WITH lang_with AS (SELECT id FROM config.coded_value_map WHERE ctype = 'item_lang' AND code = $_27910$eng$_27910$), pop_with AS (
                    SELECT record,
                            ARRAY_AGG(badge) AS badges,
                            SUM(s.score::NUMERIC*b.weight::NUMERIC)/SUM(b.weight::NUMERIC) AS total_score
                      FROM rating.record_badge_score s
                            JOIN rating.badge b ON (
                                b.id = s.badge
         AND b.scope = ANY ('{1}')) GROUP BY 1)
        SELECT m.source AS id,
                ARRAY[m.source] AS records,
                (AVG(
                  (1)
                )+1 * COALESCE( NULLIF( FIRST(mrv.vlist @> ARRAY[lang_with.id]), FALSE )::INT * 5, 1)) AS rel,
                1.0/((AVG(
                  (1)
                )+1 * COALESCE( NULLIF( FIRST(mrv.vlist @> ARRAY[lang_with.id]), FALSE )::INT * 5, 1)))::NUMERIC AS rank,
                FIRST(pubdate_t.value) AS tie_break,
                STRING_AGG(ARRAY_TO_STRING(pop_with.badges,','),',') AS badges,
                AVG(COALESCE(pop_with.total_score::NUMERIC,0.0))::NUMERIC(2,1) AS popularity
          FROM metabib.metarecord_source_map m

              INNER JOIN metabib.browse_entry_def_map mbedm ON (mbedm.source = m.source AND mbedm.entry = 340 AND mbedm.def IN (8))

                INNER JOIN metabib.record_attr_vector_list mrv ON m.source = mrv.source

                LEFT JOIN pop_with ON ( m.source = pop_with.record )
                LEFT JOIN metabib.record_sorter pubdate_t ON m.source = pubdate_t.source AND attr = 'pubdate'
                ,lang_with
          WHERE 1=1
                AND (
                  TRUE
                )
          GROUP BY 1
          ORDER BY 4 ASC NULLS LAST, 5 DESC NULLS LAST, 3 DESC
          LIMIT 10000
        $core_query_27910$::TEXT,
                            $${}$$::INT[],
                            $${}$$::INT[],
                            NULL::INT,
                            1000::INT,
                            10000::INT,
                            'f'::BOOL,
                            'f'::BOOL,
                            'f'::BOOL,
                            NULL::INT
                        );

Revision history for this message
Mike Rylander (mrylander) wrote :

Kathy,

I was able to reproduce this by applying the fixed function and then immediately attempting a search without restarting services. It looks like it's a postgres bug having to do with cached query plans. Those only last for the length of a database session, so a restart of services (reconnecting to the database) fixes it.

Can you confirm that works for you?

Thanks!

Revision history for this message
Mike Rylander (mrylander) wrote :

Belay that... we need a cast, because now search fails for me.

More soon...

Revision history for this message
Mike Rylander (mrylander) wrote :

It's not a Pg bug, we just need to address this with proper casting. I'll be looking at that further tomorrow. The issue is cached function plans (not query plans) not liking differing core query shapes for browse and search.

Revision history for this message
Mike Rylander (mrylander) wrote :

OK! I believe this is all repaired now. It was not related to popularity, per se, but looks like it could have happened ever since browse was introduced, and perhaps before.

It comes down to a lack of casting on dummy relevance ranking values when a search has no terms (just filters and such, as is the case with browse links). Postgres' AVG() function treats incoming literals as floating point values rather than NUMERIC. Because of this, we need to cast the rel column on the search core query to NUMERIC in all cases, so it aligns searches.

See the branch below for details, wherein I have sprinkled more casts about just to be sure we're always getting the right data type when we need it.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp1549505_popularity_ratings_signedoff_fix2

Revision history for this message
Galen Charlton (gmc) wrote :

And to expand on Mike's explanation: the issue occurs when a given Pg backend processes a normal bib search, thereby caching a compiled version of query_parser_fts with one shape of the core result query output, then subsequently processes (say) a has_browse_entry() search.

Revision history for this message
Kathy Lussier (klussier) wrote :

We're getting a merge conflict on this branch at the moment. Galen or Mike, could you resolve it?

Thanks!
Kathy

Revision history for this message
Mike Rylander (mrylander) wrote :
Revision history for this message
Kathy Lussier (klussier) wrote :

I came across an issue with the display of the admin interface. Since I didn't see this problem in my earlier testing, I wonder if it's a problem with the new AngularJS.

I've attached a screenshot of the admin interface. Aside from the problem with the menu display, I configured one or two badges on this system via the database, but the badges are inaccessible from the client interface.

Revision history for this message
Kathy Lussier (klussier) wrote :

Adding two commits:

1. There was a stray semicolon in the PgTap test that was causing the test to fail. I've removed it.

2. I changed the value set for search.max_popularity_importance_multiplier in the seed date. With the previous value of 2.0, users would see very little difference between a "Most Popular" search and a "Popularity-Adjusted Relevance" search. In my testing, I found that a low value of 1.1 or 1.2 was enough to bump popular items to the top of results. I used 1.1 as a starting point.

I didn't add a signoff because we still have the issue with the administration interface. Once that is fixed, I think this branch will be ready to merge.

Revision history for this message
Galen Charlton (gmc) wrote :

I've added a commit updating the Angular modal widget; the current branch is now user/gmcharlt/lp1549505_popularity_july_rebase

Revision history for this message
Kathy Lussier (klussier) wrote :

Everything looks good on my end. I've signed off on the branch and merged it to master.

We're really looking forward to this exciting new feature in 2.11!

Kathy

Changed in evergreen:
status: New → Fix Committed
Changed in evergreen:
milestone: 2.next → 2.11-alpha
milestone: 2.11-alpha → 2.11-beta
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.