Investigate use of whitespace analyzer for id fields

Bug #1602718 reported by Steve McLellan
This bug affects 1 person
Affects: OpenStack Searchlight
Status: New
Importance: Medium
Assigned to: Steve McLellan
Milestone: (none listed)

Bug Description

We currently mark a large number of fields as not_analyzed so that they can be used for exact matches in 'term' queries. In retrospect, it would have made more sense to leave them as analyzed (so that query_string and match queries work more reliably) but change the analyzer so that dashes aren't treated as end-of-token characters [1].

The default 'Standard' analyzer [2] tokenizes on word boundaries suited to European languages, which splits a UUID at its dashes. The whitespace analyzer [3] tokenizes only on whitespace and is better suited to our UUID columns; we might combine it with a lowercase filter.
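
A minimal sketch of what such an analyzer definition might look like, written as a Python dict in the shape of Elasticsearch index settings. The analyzer name `id_whitespace_lowercase` and the mapping fragment are illustrative assumptions, not Searchlight's actual configuration; the `string` field type is assumed because it is the type used by the Elasticsearch releases that have `not_analyzed`.

```python
import json

# Hypothetical analyzer name, chosen only for this illustration.
ID_ANALYZER = "id_whitespace_lowercase"

# Custom analyzer: split on whitespace only, then lowercase each token,
# so a dashed UUID stays a single token.
index_settings = {
    "analysis": {
        "analyzer": {
            ID_ANALYZER: {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase"],
            }
        }
    }
}

# Assumed mapping fragment for a single id field. Newer Elasticsearch
# releases would use "text" instead of "string".
id_field_mapping = {"type": "string", "analyzer": ID_ANALYZER}

print(json.dumps({"settings": index_settings}, indent=2))
```

With a definition along these lines, the field stays analyzed (so match and query_string queries go through the same analyzer) while the whole id is still indexed as one token.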

This should make searching less confusing, since it abstracts some of the indexing details away. Right now you can get unexpected partial matches when searching for IDs: try a query string for an id, but change some of the characters.
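
The partial-match effect can be illustrated without a running cluster. The sketch below is a rough pure-Python approximation of the two tokenization strategies, not Elasticsearch's actual implementation; the function names and the sample UUIDs are made up for the example. It assumes a query matches when it shares at least one token with the indexed value.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, then split
    on anything that is not a letter or digit (so dashes end a token)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def whitespace_lowercase_tokens(text):
    """Stand-in for a whitespace tokenizer plus a lowercase filter."""
    return text.lower().split()


def shares_token(indexed, query, tokenize):
    """True if the query and the indexed value have any token in common."""
    return bool(set(tokenize(indexed)) & set(tokenize(query)))


indexed_id = "1b4e28ba-2fa1-11d2-883f-0016d3cca427"
altered_id = "1b4e28ba-2fa1-11d2-883f-ffffffffffff"  # last segment changed

print(standard_like_tokens(indexed_id))
print(shares_token(indexed_id, altered_id, standard_like_tokens))
print(shares_token(indexed_id, altered_id, whitespace_lowercase_tokens))
```

Under the dash-splitting tokenizer the altered id still shares four of its five segments with the indexed id, so it matches; under whitespace tokenization each id is a single token and the altered id no longer matches.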

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html
[2] https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
[3] https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-whitespace-analyzer.html

Steve McLellan (sjmc7)
Changed in searchlight:
importance: Undecided → Medium
Steve McLellan (sjmc7)
Changed in searchlight:
assignee: nobody → Steve McLellan (sjmc7)