While this needs to be reported up-stream to the Pg team, I think we should address it locally as well, in case the fix is a while coming. I see two options, each with a tradeoff:
1) Switch to pgtrm-based gist or gin indexes. We lose the unique check -- so sayeth the pg docs: Only B-tree currently supports unique indexes. -- but we can do that other ways (before trigger, perhaps an exclusion constraint). For example, after "create extension pg_trgm;":
- drop index authority.unique_by_heading_and_thesaurus; create index by_heading_and_thesaurus on authority.record_entry using gist (authority.normalize_heading(marc) gist_trgm_ops) WHERE deleted IS FALSE OR deleted = false;
- drop index authority.by_heading; create index by_heading on authority.record_entry using gist (authority.simple_normalize_heading(marc) gist_trgm_ops) WHERE deleted IS FALSE OR deleted = false;
2) Add two new columns to authority.record_entry to contain the two heading variants we need. Populate them at ingest time in a BEFORE trigger. Create the indexes we want directly on the column data.
(1) has the benefit of using less space, and hewing closer to how things work today -- less code and schema churn.
(2) has the benefit of no new deps (pg_trgm extention) and the the availability of the heading strings for other purposes -- though I can think of none right now -- and arguably simpler schema and query construction.
While this needs to be reported up-stream to the Pg team, I think we should address it locally as well, in case the fix is a while coming. I see two options, each with a tradeoff:
1) Switch to pgtrm-based gist or gin indexes. We lose the unique check -- so sayeth the pg docs: Only B-tree currently supports unique indexes. -- but we can do that other ways (before trigger, perhaps an exclusion constraint). For example, after "create extension pg_trgm;": unique_ by_heading_ and_thesaurus; create index by_heading_ and_thesaurus on authority. record_ entry using gist (authority. normalize_ heading( marc) gist_trgm_ops) WHERE deleted IS FALSE OR deleted = false; by_heading; create index by_heading on authority. record_ entry using gist (authority. simple_ normalize_ heading( marc) gist_trgm_ops) WHERE deleted IS FALSE OR deleted = false;
- drop index authority.
- drop index authority.
2) Add two new columns to authority. record_ entry to contain the two heading variants we need. Populate them at ingest time in a BEFORE trigger. Create the indexes we want directly on the column data.
(1) has the benefit of using less space, and hewing closer to how things work today -- less code and schema churn.
(2) has the benefit of no new deps (pg_trgm extention) and the the availability of the heading strings for other purposes -- though I can think of none right now -- and arguably simpler schema and query construction.
Thoughts?