Evergreen

Bug #788379
Comment #10

Comment 10 for bug 788379

Revision history for this message

Galen Charlton (gmc) wrote on 2011-06-10:

#10

A small optimization relevant in the 2.0.6 that George is running: the staff-side copy visibility check includes the following query:

            PERFORM 1
              FROM asset.call_number cn
                    JOIN asset.copy cp ON (cp.call_number = cn.id)
                    JOIN actor.org_unit a ON (cp.circ_lib = a.id)
              WHERE NOT cn.deleted
                    AND NOT cp.deleted
                    AND cp.circ_lib IN ( SELECT * FROM search.explode_array( search_org_list ) )
                    AND cn.record IN ( SELECT * FROM search.explode_array( core_result.records ) )
              LIMIT 1;

Note that joining actor.org_unit is unnecessary, but it also has a bad effect on the query plan. Compare:

evergreen=# explain SELECT 1
evergreen-# FROM asset.call_number cn
evergreen-# JOIN asset.copy cp ON (cp.call_number = cn.id)
evergreen-# JOIN actor.org_unit a ON (cp.circ_lib = a.id)
evergreen-# WHERE NOT cn.deleted
evergreen-# AND NOT cp.deleted
evergreen-# AND cp.circ_lib IN ( SELECT * FROM search.explode_array( ARRAY[1, 2] ) )
evergreen-# AND cn.record IN ( SELECT * FROM search.explode_array( ARRAY[2, 3] ) )
evergreen-# LIMIT 1;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Limit (cost=20.88..82.78 rows=1 width=0)
   -> Nested Loop Semi Join (cost=20.88..1569722.78 rows=25360 width=0)
         Join Filter: (cn.record = ('{2,3}'::integer[])[s.s])
         -> Nested Loop (cost=20.88..301722.78 rows=50720 width=8)
               -> Hash Join (cost=20.88..25198.05 rows=50720 width=8)
                     Hash Cond: (cp.circ_lib = a.id)
                     -> Append (cost=0.00..21833.43 rows=751401 width=12)
                           -> Seq Scan on copy cp (cost=0.00..21821.73 rows=751316 width=12)
                                 Filter: (NOT deleted)
                           -> Seq Scan on unit cp (cost=0.00..11.70 rows=85 width=12)
                                 Filter: (NOT deleted)
                     -> Hash (cost=20.71..20.71 rows=14 width=8)
                           -> Hash Join (cost=16.61..20.71 rows=14 width=8)
                                 Hash Cond: (('{1,2}'::integer[])[s.s] = a.id)
                                 -> HashAggregate (cost=15.00..17.00 rows=200 width=4)
                                       -> Function Scan on generate_series s (cost=0.00..12.50 rows=1000 width=4)
                                 -> Hash (cost=1.27..1.27 rows=27 width=4)
                                       -> Seq Scan on org_unit a (cost=0.00..1.27 rows=27 width=4)
               -> Index Scan using call_number_pkey on call_number cn (cost=0.00..5.44 rows=1 width=16)
                     Index Cond: (cn.id = cp.call_number)
                     Filter: (NOT cn.deleted)
         -> Function Scan on generate_series s (cost=0.00..12.50 rows=1000 width=4)
(22 rows)

with

evergreen=# explain SELECT 1
evergreen-# FROM asset.call_number cn
evergreen-# JOIN asset.copy cp ON (cp.call_number = cn.id)
evergreen-# WHERE NOT cn.deleted
evergreen-# AND NOT cp.deleted
evergreen-# AND cp.circ_lib IN ( SELECT * FROM search.explode_array( ARRAY[1, 2] ) )
evergreen-# AND cn.record IN ( SELECT * FROM search.explode_array( ARRAY[2, 3] ) )
evergreen-# LIMIT 1;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=15.00..66.58 rows=1 width=0)
   -> Nested Loop Semi Join (cost=15.00..9688659.75 rows=187850 width=0)
         Join Filter: (cp.circ_lib = ('{1,2}'::integer[])[s.s])
         -> Nested Loop (cost=15.00..4138521.12 rows=375701 width=4)
               Join Filter: (cn.id = cp.call_number)
               -> Nested Loop (cost=15.00..4511.81 rows=300396 width=8)
                     -> HashAggregate (cost=15.00..17.00 rows=200 width=4)
                           -> Function Scan on generate_series s (cost=0.00..12.50 rows=1000 width=4)
                     -> Index Scan using asset_call_number_record_idx on call_number cn (cost=0.00..22.41 rows=5 width=16)
                           Index Cond: (cn.record = ('{2,3}'::integer[])[s.s])
                           Filter: (NOT cn.deleted)
               -> Append (cost=0.00..13.70 rows=5 width=12)
                     -> Index Scan using cp_avail_cn_idx on copy cp (cost=0.00..11.43 rows=4 width=12)
                           Index Cond: (cp.call_number = cn.id)
                           Filter: (NOT cp.deleted)
                     -> Index Scan using unit_avail_cn_idx on unit cp (cost=0.00..2.27 rows=1 width=12)
                           Index Cond: (cp.call_number = cn.id)
                           Filter: (NOT cp.deleted)
         -> Function Scan on generate_series s (cost=0.00..12.50 rows=1000 width=4)
(19 rows)

This is issue is resolved in 2.1+ with the changes to query_parser_fts done as part of the multi-homed support, but I'll push a fix for rel_2_0. Obviously, this micro-optimization, although it is significant in George's particular situation and possibly for other 2.0.x users, doesn't address the "make searching more and more faster" issue in the general case.

A small optimization relevant in the 2.0.6 that George is running: the staff-side copy visibility check includes the following query:

PERFORM 1
              FROM  asset.call_number cn
                    JOIN asset.copy cp ON (cp.call_number = cn.id)
                    JOIN actor.org_unit a ON (cp.circ_lib = a.id)
              WHERE NOT cn.deleted
                    AND NOT cp.deleted
                    AND cp.circ_lib IN ( SELECT * FROM search.explode_array( search_org_list ) )
                    AND cn.record IN ( SELECT * FROM search.explode_array( core_result.records ) )
              LIMIT 1;

Note that joining actor.org_unit is unnecessary, but it also has a bad effect on the query plan.  Compare:

evergreen=# explain SELECT 1
evergreen-#               FROM  asset.call_number cn
evergreen-#                     JOIN asset.copy cp ON (cp.call_number = cn.id)
evergreen-#                     JOIN actor.org_unit a ON (cp.circ_lib = a.id)
evergreen-#               WHERE NOT cn.deleted
evergreen-#                     AND NOT cp.deleted
evergreen-#                     AND cp.circ_lib IN ( SELECT * FROM search.explode_array( ARRAY[1, 2] ) )
evergreen-#                     AND cn.record IN ( SELECT * FROM search.explode_array( ARRAY[2, 3] ) )
evergreen-#               LIMIT 1;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=20.88..82.78 rows=1 width=0)
   ->  Nested Loop Semi Join  (cost=20.88..1569722.78 rows=25360 width=0)
         Join Filter: (cn.record = ('{2,3}'::integer[])[s.s])
         ->  Nested Loop  (cost=20.88..301722.78 rows=50720 width=8)
               ->  Hash Join  (cost=20.88..25198.05 rows=50720 width=8)
                     Hash Cond: (cp.circ_lib = a.id)
                     ->  Append  (cost=0.00..21833.43 rows=751401 width=12)
                           ->  Seq Scan on copy cp  (cost=0.00..21821.73 rows=751316 width=12)
                                 Filter: (NOT deleted)
                           ->  Seq Scan on unit cp  (cost=0.00..11.70 rows=85 width=12)
                                 Filter: (NOT deleted)
                     ->  Hash  (cost=20.71..20.71 rows=14 width=8)
                           ->  Hash Join  (cost=16.61..20.71 rows=14 width=8)
                                 Hash Cond: (('{1,2}'::integer[])[s.s] = a.id)
                                 ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4)
                                       ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4)
                                 ->  Hash  (cost=1.27..1.27 rows=27 width=4)
                                       ->  Seq Scan on org_unit a  (cost=0.00..1.27 rows=27 width=4)
               ->  Index Scan using call_number_pkey on call_number cn  (cost=0.00..5.44 rows=1 width=16)
                     Index Cond: (cn.id = cp.call_number)
                     Filter: (NOT cn.deleted)
         ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4)
(22 rows)

with

evergreen=# explain SELECT 1
evergreen-#               FROM  asset.call_number cn
evergreen-#                     JOIN asset.copy cp ON (cp.call_number = cn.id)
evergreen-#               WHERE NOT cn.deleted
evergreen-#                     AND NOT cp.deleted
evergreen-#                     AND cp.circ_lib IN ( SELECT * FROM search.explode_array( ARRAY[1, 2] ) )
evergreen-#                     AND cn.record IN ( SELECT * FROM search.explode_array( ARRAY[2, 3] ) )
evergreen-#               LIMIT 1;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15.00..66.58 rows=1 width=0)
   ->  Nested Loop Semi Join  (cost=15.00..9688659.75 rows=187850 width=0)
         Join Filter: (cp.circ_lib = ('{1,2}'::integer[])[s.s])
         ->  Nested Loop  (cost=15.00..4138521.12 rows=375701 width=4)
               Join Filter: (cn.id = cp.call_number)
               ->  Nested Loop  (cost=15.00..4511.81 rows=300396 width=8)
                     ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4)
                           ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4)
                     ->  Index Scan using asset_call_number_record_idx on call_number cn  (cost=0.00..22.41 rows=5 width=16)
                           Index Cond: (cn.record = ('{2,3}'::integer[])[s.s])
                           Filter: (NOT cn.deleted)
               ->  Append  (cost=0.00..13.70 rows=5 width=12)
                     ->  Index Scan using cp_avail_cn_idx on copy cp  (cost=0.00..11.43 rows=4 width=12)
                           Index Cond: (cp.call_number = cn.id)
                           Filter: (NOT cp.deleted)
                     ->  Index Scan using unit_avail_cn_idx on unit cp  (cost=0.00..2.27 rows=1 width=12)
                           Index Cond: (cp.call_number = cn.id)
                           Filter: (NOT cp.deleted)
         ->  Function Scan on generate_series s  (cost=0.00..12.50 rows=1000 width=4)
(19 rows)

This is issue is resolved in 2.1+ with the changes to query_parser_fts done as part of the multi-homed support, but I'll push a fix for rel_2_0.  Obviously, this micro-optimization, although it is significant in George's particular situation and possibly for other 2.0.x users, doesn't address the "make searching more and more faster" issue in the general case.