set limit on facets retrieved during search

Bug #1505286 reported by Galen Charlton on 2015-10-12
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Wishlist
Unassigned

Bug Description

It would be desirable to teach open-ils.search.biblio.multiclass.staged and friends how to set a limit on the number of facets it retrieves and caches. Consider a subject search done at a specialized library that may have thousands of records matching a given heading: potential thousands of author and title facets could be retrieved.

Actually retrieving them can significantly bloat open-ils.cstore child memory consumption, caching them increases the chances of undesirable memcached evictions, and retrieving the set can slow down search results retrieval.

I propose a global flag or the like to specify a limit on the number of facets retrieved.

Evergreen master

Mike Rylander (mrylander) wrote :

I heartily second your proposal, Galen. (He said, to the surprise of nobody.)

Galen Charlton (gmc) wrote :

So, thinking more on how the limit should be expressed, I think it should be per facet type. For example, consider a search results set that has 10,000 subject facets having two more more records, and 1 author facet on just one record. Picking the top 1,000 facets among the total set loses the author facet, whereas picking the top 100 (or whatever) facets for each type doesn't drop the author facet.

Mike Rylander (mrylander) wrote :

I agree. I don't see a need for a setting per facet, though. Since there's really no context location for facets, how about a global flag that, if unset, defaults to 100?

Galen Charlton (gmc) wrote :

Indeed, that's what I had a mind: a single global flag that sets a limit on the number of facets retrieved for each facet type.

Galen Charlton (gmc) wrote :

For feedback, the following stored function definition expresses a suitable facets query for record result sets (and a metabib search version is easily written):

CREATE OR REPLACE FUNCTION facets_for_record_set(max_facets INT, ignore_fields INT[], hits BIGINT[])
RETURNS SETOF RECORD AS $$
    SELECT id, value, count FROM (
        SELECT mfae.field AS id,
        mfae.value,
        COUNT(DISTINCT mmrsm.source),
        row_number() OVER (
            PARTITION BY mfae.field ORDER BY COUNT(distinct mmrsm.source) DESC
        ) AS rownum
    FROM metabib.facet_entry mfae
    JOIN metabib.metarecord_source_map mmrsm ON (mfae.source = mmrsm.source)
    WHERE mmrsm.source IN (SELECT * FROM unnest(hits))
    AND mfae.field NOT IN (SELECT * FROM unnest(ignore_fields))
    GROUP by 1, 2
) x WHERE rownum <= max_facets;
$$ LANGUAGE SQL;

Galen Charlton (gmc) wrote :

Three patches are available at the tip of the user/gmcharlt/lp1505286_limit_facet_retrieval branch in the working/Evergreen repository:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1505286_limit_facet_retrieval

tags: added: pullrequest
Ben Shum (bshum) wrote :

Worked for me in tests. Pushed to master, thanks Galen!

Changed in evergreen:
milestone: none → 2.10-beta
status: New → Fix Committed
Ben Shum (bshum) wrote :

Well, this turns out to only work for PG 9.3 testing (which is what I was using when I tested it originally). Maybe 9.2 but I didn't test that either.

For PG 9.1, the function fails to create leading to errors like:

psql:300.schema.staged_search.sql:449: ERROR: column "hits" does not exist
LINE 12: WHERE mmrsm.source IN (SELECT * FROM unnest(hits))
                                                             ^
psql:300.schema.staged_search.sql:468: ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK

This causes the entire search schema to fail during database create. Which is the source of the failure for the live tests lately since this merge occurred.

So, question then, do we try to find a solution that works with PG 9.1, or is it time we declare PG 9.3 the minimum version for Evergreen 2.10?

According to http://www.postgresql.org/support/versioning/, PG 9.1 is end of life in September 2016 anyways, so this might be the last version of Evergreen to work with it regardless.

Mike Rylander (mrylander) wrote :

The cause can be found in the PG docs: http://www.postgresql.org/docs/9.1/static/sql-createfunction.html

argname
The name of an argument. Some languages (including PL/pgSQL, but currently not SQL) let you use the name in the function body. For other languages the name of an input argument is just extra documentation, so far as the function itself is concerned; but you can use input argument names when calling a function to improve readability (see Section 4.3). In any case, the name of an output argument is significant, because it defines the column name in the result row type. (If you omit the name for an output argument, the system will choose a default column name.)

That changed in 9.2, so we don't have to push for 9.3 just yet, and I'm personally in favor of moving the PG ball forward to 9.2 (at the very least) ASAP.

That said, this can be made to work in 9.1 very easily by replacing ignore_facet_classes with $1 and hits with $2 in the bodies of search.facets_for_record_set() and search.facets_for_metarecord_set().

Dealer's (fixer's) choice, IMO.

Jason Stephenson (jstephenson) wrote :

I'm not opposed to making 9.2+ required for Evergreen 2.10.

Jason Stephenson (jstephenson) wrote :

Having said what I said in my previous comment, I want to add a more nuanced answer.

While I am still not opposed to making 9.2+ a requirement for 2.10, I think we should consider supporting PostgreSQL 9.1 for one more release. The reasons are as follows:

Ubuntu 12.04 ships with 9.1 by default. Ubuntu 12.04 is officially supported by Canonical and the Ubuntu community through April 2017, according to https://wiki.ubuntu.com/Releases

According to https://wiki.debian.org/LTS/, Debian Wheezy is supported through May of 2018. It shipped with PostgreSQL 9.1 as the default, also.

Finally, http://www.postgresql.org/support/versioning/ indicates that PostgreSQL 9.1' s end of life is schedule for September 2016.

Being conservative about it, I would say that we can easily make an argument that we will drop support for Pg 9.1 in September when the PostgreSQL community no longer supports it.

If the consensus is to stay with 9.1, I'd be more than willing to make the simple changes suggested by Mike in comment #9. If the consensus is to drop 9.1 support for 2.10, I'm also happy to go along with that decision.

Galen Charlton (gmc) wrote :

Let's go with 9.1 with the notion that we'll announce a deprecation with the 2.10 release and de-support with 2.11 (which also corresponds well enough with the PostgreSQL project's EOL for that version).

Galen Charlton (gmc) wrote :

I've taken up a patch that Jason wrote and updated master and rel_2_10.

Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers