Comment 2 for bug 788379

Revision history for this message
Dan Scott (denials) wrote :

Ben, the continuation of the spinning search logo (I believe this happens when the OpenSRF call times out; default: 60 seconds) should be treated as a separate issue.

Let's focus on the core issue: single broad search terms being unacceptably slow. Having just upgraded to 2.0, such searches are indeed much slower for Conifer than when we were running 1.6.1. However, while sanity-checking our configuration, I realized that my carefully tuned PostgreSQL config file was sitting in a different directory, and our 48-core / 48 GB RAM database server was running with the insane PostgreSQL defaults (maintenance_work_mem: 16MB, work_mem: 1MB, shared_buffers: 24MB, checkpoint_segments: 3, effective_cache_size: 128MB), which is obviously handicapping our system at the moment.
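For a sense of scale, a ballpark tuning for a 48 GB server might look something like the sketch below. These numbers are illustrative (roughly following the common "shared_buffers at ~25% of RAM" starting point), not our actual production values:

```
# postgresql.conf - illustrative values for a 48 GB server, NOT our production config
shared_buffers = 12GB            # ~25% of RAM is a common starting point
effective_cache_size = 36GB      # rough estimate of OS cache + shared_buffers
work_mem = 64MB                  # per-sort/per-hash allocation; scales with concurrency
maintenance_work_mem = 1GB       # speeds up CREATE INDEX and VACUUM
checkpoint_segments = 32         # fewer, larger checkpoints under write load
```

Any of these would be a massive step up from 24MB of shared_buffers.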

That said, let's take a look at one simple change to the default configuration that can be made - switching to GIN indexes. We're running a PostgreSQL 9.0 server, so my confidence level is a lot higher than in the early days of GIN indexes. I created a GIN index on one table - the obvious candidate, metabib.keyword_field_entry - and here's a comparison:

EXPLAIN ANALYZE SELECT COUNT(*) FROM metabib.keyword_field_entry WHERE index_vector @@ to_tsquery('public.keyword', 'canada');

was 38 seconds with a GiST index, and 9.5 seconds with a GIN index. "Canada" is probably a worst-case scenario for us, with 420K matching rows.

9.5 seconds is still too long to simply return the number of matching rows for a text query, but shaving 28.5 seconds off the base query is a good start.
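For reference, the swap itself is a single DDL change. A sketch - the existing index name here is assumed for illustration, not taken from the actual Evergreen schema (check \di in psql for the real one):

```
-- Assumed index name; confirm against your schema before dropping anything.
DROP INDEX metabib.keyword_field_entry_index_vector_idx;

-- The new index lands in the table's schema (metabib) automatically.
CREATE INDEX keyword_field_entry_index_vector_idx
    ON metabib.keyword_field_entry
    USING GIN (index_vector);
```

Building a GIN index on a table this size takes a while (and benefits greatly from a generous maintenance_work_mem), so it's best done in a maintenance window.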

Trying one of your examples:

EXPLAIN ANALYZE SELECT COUNT(*) FROM metabib.keyword_field_entry WHERE index_vector @@ to_tsquery('public.keyword', 'europe');

tells me 1.191 seconds using a GIN index (60K matching rows).

More GIN results: "fish" takes 95 ms for 11K matching rows; "the" takes 7.5 seconds for 1.9M rows. That's pretty acceptable to me.

Given the current crazy configuration of our database server, I'll report back on the results once PostgreSQL is actually allowed to use its shared_buffers and work_mem to improve performance. Hopefully I'll have a span of downtime sometime in the next 12 hours to apply the new configuration.

Of course, here we're only dealing with the core full-text search matching; some significant delays also come from ranking and limiting within the search routine.