title sort extremely slow in certain cases in 2.0

Bug #788365 reported by Dan Wells
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
High
Mike Rylander

Bug Description

We recently discovered that sorting by title was extremely slow under some circumstances on our 2.0 system. While more testing would be needed to determine what those circumstances are, at least one actual search query (run directly in psql) took over three minutes to complete (I stopped timing after 3 minutes, but it did eventually complete).

While it is probably possible to convince Postgresql to use a better query plan of some sort, I decided the simpler route would be to attack the slow point (the 'tnf' selection from mfr) directly with a new targeted index. Doing the following:

CREATE INDEX metabib_full_rec_tnf_idx ON metabib.real_full_rec (record, tag, subfield) WHERE tag = 'tnf';

reduced the query time to mere seconds with very minimal bloat.

While it looks like 2.1+ might handle 'tnf' differently, I believe adding something along these lines to at least 2.0.x is a pretty simple cure to a serious problem.

Revision history for this message
Dan Scott (denials) wrote :

Hi Dan - can you please include the actual search query that you used in the bug report?

Revision history for this message
Dan Wells (dbw2) wrote :

Sure, here is the offending query (the meat of a keyword search for 'great white'):

SELECT m.source AS id,
                ARRAY_ACCUM(DISTINCT m.source) AS records,
                (AVG(
                        (rank(x8fb5140_keyword.index_vector, x8fb5140_keyword.tsq) * x8fb5140_keyword.weight)
                ) * COALESCE( NULLIF( FIRST(mrd.item_lang) = $_4523$eng$_4523$ , FALSE )::INT * 5, 1))::NUMERIC AS rel,
                FIRST((SELECT frt.value FROM metabib.full_rec frt WHERE frt.record = m.source AND frt.tag = 'tnf' AND frt.subfield = 'a' LIMIT 1)) AS rank,
                FIRST(mrd.date1) AS tie_break
          FROM metabib.metarecord_source_map m
                JOIN metabib.rec_descriptor mrd ON (m.source = mrd.record)
                LEFT JOIN (
                        SELECT fe.*, fe_weight.weight, x.tsq /* search */
                          FROM metabib.keyword_field_entry AS fe
                                JOIN config.metabib_field AS fe_weight ON (fe_weight.id = fe.field)
                                JOIN (SELECT
                                to_tsquery('keyword', COALESCE(NULLIF( '(' || btrim(regexp_replace(split_date_range(naco_normalize($_4523$great$_4523$)),E'(?:\\s+|:)','&','g'),'&|') || ')', '()'), ''))&&
                                to_tsquery('keyword', COALESCE(NULLIF( '(' || btrim(regexp_replace(split_date_range(naco_normalize($_4523$white$_4523$)),E'(?:\\s+|:)','&','g'),'&|') || ')', '()'), '')) AS tsq ) AS x ON (fe.index_vector @@ x.tsq)
                ) AS x8fb5140_keyword ON (m.source = x8fb5140_keyword.source)
          WHERE 1=1
                AND (x8fb5140_keyword.id IS NOT NULL)
          GROUP BY 1
          ORDER BY 4 ASC NULLS LAST, 5 DESC NULLS LAST, 3 DESC
          LIMIT 10000;

The search only returns 960 rows, so not really a large set at all. Of course you may need to substitute some different terms for your catalog and see what happens. I am very interested to hear if anyone else can trigger this slowness.

Dan

Revision history for this message
Dan Wells (dbw2) wrote :

Sorry, also meant to note that this is on PG 8.4

Revision history for this message
Mike Rylander (mrylander) wrote :

I'm happy to push this change through ... is there a branch to add this already, or should I just create one?

Revision history for this message
Dan Wells (dbw2) wrote :

Thanks, Mike. No branch exists yet. I wasn't really sure if this idea had legs. I'd be happy to create one, but for something this small in code, if it looks good to you, I'd support you just creating/pushing it through.

Revision history for this message
Mike Rylander (mrylander) wrote :

Will do.

Revision history for this message
Mike Rylander (mrylander) wrote :

Done, with a reservation of 0561 in master and 2.1 to avoid confusion.

Changed in evergreen:
status: New → Fix Committed
assignee: nobody → Mike Rylander (mrylander)
Ben Shum (bshum)
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.