OpenStack Searchlight

Investigate use of whitespace analyzer for id fields

Bug #1602718 reported by Steve McLellan on 2016-07-13

Affects                 Status   Importance   Assigned to      Milestone
OpenStack Searchlight   New      Medium       Steve McLellan

Bug Description

We currently mark a large number of fields as not_analyzed so that they can be used for exact matches in 'term' queries. In retrospect, it would have made more sense to leave them analyzed (so that query_string and match queries work more reliably) and change the analyzer so that dashes aren't treated as token boundaries [1].

The default 'standard' analyzer [2] splits text on word boundaries, which suits most European languages but also breaks a UUID apart at each dash. The whitespace analyzer [3] tokenizes only on whitespace and is more suitable for our UUID fields. It does not lowercase its tokens, so we might combine it with a lowercase filter.
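As a rough sketch of what that could look like, the index body below defines a custom analyzer built from the whitespace tokenizer and the lowercase token filter. The analyzer name is hypothetical, and the field mapping fragment assumes the Elasticsearch 2.x-style "string" field type; neither is taken from Searchlight's actual mappings.

```python
import json

# Hypothetical analyzer name; nothing in this report specifies one.
ID_ANALYZER = "id_whitespace_lowercase"

# Index settings defining a custom analyzer: tokenize on whitespace only,
# then lowercase each token so that ID matching is case-insensitive.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                ID_ANALYZER: {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

# Assumed mapping fragment for an id field (ES 2.x-style "string" type).
# This would replace a mapping that sets "index": "not_analyzed".
id_field_mapping = {"type": "string", "analyzer": ID_ANALYZER}

print(json.dumps(index_body, indent=2))
print(json.dumps(id_field_mapping))
```

With a mapping like this, a UUID is indexed as a single lowercased token, so match and query_string queries would see the whole id rather than its dash-separated pieces.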

This should make searching less confusing, since it abstracts some of the indexing details away. Right now you can get unexpected partial matches when searching for IDs: try a query string for an id with some of the characters changed.
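The partial-match effect can be illustrated without a running cluster. The sketch below is only an approximation: it imitates the standard analyzer by splitting on non-alphanumeric characters, imitates the proposed analyzer by splitting on whitespace and lowercasing, and models a query as matching when any of its tokens is indexed (OR semantics). The function names and the sample UUID are made up for illustration.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: split on
    non-alphanumeric characters (including dashes) and lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def whitespace_lowercase_tokens(text):
    """Stand-in for a whitespace tokenizer plus lowercase filter."""
    return [t.lower() for t in text.split()]


def matches_any(query_tokens, doc_tokens):
    """OR semantics: a hit if any query token was indexed."""
    return bool(set(query_tokens) & set(doc_tokens))


stored_id = "7B1F2C3A-9D4E-4F60-8A1B-2C3D4E5F6A7B"
altered_id = "7b1f2c3a-0000-4f60-8a1b-2c3d4e5f6a7b"  # second group changed

# Dash-splitting indexes five separate tokens for one UUID.
print(standard_like_tokens(stored_id))

# The altered id still shares four tokens, so it matches.
print(matches_any(standard_like_tokens(altered_id),
                  standard_like_tokens(stored_id)))

# As a single token, the altered id no longer matches...
print(matches_any(whitespace_lowercase_tokens(altered_id),
                  whitespace_lowercase_tokens(stored_id)))

# ...while the correct id matches regardless of case.
print(matches_any(whitespace_lowercase_tokens(stored_id.lower()),
                  whitespace_lowercase_tokens(stored_id)))
```

In this model the dash-splitting tokenizer reports a hit for the altered id, whereas the whitespace-based one only matches the complete id.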

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html
[2] https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
[3] https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-whitespace-analyzer.html

Steve McLellan (sjmc7) on 2016-09-11

Changed in searchlight:
importance:	Undecided → Medium

Steve McLellan (sjmc7) on 2016-09-15

Changed in searchlight:
assignee:	nobody → Steve McLellan (sjmc7)
