Decrease the indexed text size on pgtextindex
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
KARL4 |
Fix Released
|
Medium
|
Paul Everitt |
Bug Description
Prefix searches are slow, for a number of reasons related to relevance ranking, which requires a table scan of the tsvector for any matching docids.
When the searchable text and thus tsvector size is over the TOAST threshold, this means the tsvector gets gzip'd and written to disk. Since these are documents with many words, they are likely to match on many prefix searches.
Thus, just about any prefix search is going to match all of those big entries and trigger a TOAST read/decompress. Some searches cause 500 MB to get read and decompressed.
In 2012, Shane did this analysis:
"""
select min(textlen), count(1), sum(textlen) from (select length(
min | count | sum
-------
5 | 4659 | 37390
10 | 41436 | 1765747
100 | 41345 | 16814598
1000 | 49416 | 188352636
10000 | 19041 | 566536929
100009 | 879 | 120819463
This says OSF has:
- 4659 documents with text_vectors containing 1-9 characters
- 41436 documents with text_vectors containing 10-99 characters
Postgres can comfortably handle text vectors up to around 10,000 characters in length, but after that it moves the vectors to TOAST tables. So OSF has around 20,000 documents getting TOASTed.
"""
For this ticket:
- Get some measurements beforehand
- Decrease the maxlen in karl.textindex.
- reindex
- Re-do the measurements
Changed in karl4: | |
status: | New → Fix Released |
Some values beforehand from production, starting with counts at various text_vector sizes.
osf-karl=# select count(docid) from pgtextindex;
count
--------
365338
(1 row)
osf-karl=# select count(docid) from pgtextindex where length( text_vector: :varchar) > 9000;
count
-------
33214
(1 row) text_vector: :varchar) > 10000;
osf-karl=# select count(docid) from pgtextindex where length(
count
-------
30520
(1 row)
osf-karl=# select count(docid) from pgtextindex where length( text_vector: :varchar) > 200000;
count
-------
112
(1 row)
osf-karl=# select count(docid) from pgtextindex where length( text_vector: :varchar) > 300000;
count
-------
20
(1 row)
osf-karl=# select count(docid) from pgtextindex where length( text_vector: :varchar) > 400000;
count
-------
5
(1 row)