_build_search_text_clause() in bugtasksearch.py generates these search
clauses:
SQL("BugTaskFlat.fti @@ ftq(?)", params=(searchtext,))
So, no tokenisation in Python.
I added two bugs containing some of the "bad search terms" mentioned above
to my local launchpad_dev DB:
select bug.description, bugtaskflat.fti
from bugtaskflat, bug where bug.id=bugtaskflat.bug and bug.id>=16;
row 16:
description: from "from" foo "bar" <div> community-contributions.py SQLObject.select 2.6.20-12 1.10 crash
fti: '-12':8 '1.10':9 '2.6.20':7 'bar':4 'community-contributions.py':5 'crash':10 'foo':3 'sqlobject.select':6
row 17:
description: from "from" foo "bar" div community-contributions.py SQLObject.select 2.6.20-12 1.10 crash
fti: '-12':10 '1.10':11 '2.6.20':9 'bar':5 'community-contributions.py':7 'crash':12 'div':6 'foo':4 'sqlobject.select':8 'xxx':1B
The only difference between the descriptions of these rows is '<div>' vs.
'div'. Neither '<div>' nor 'div' appears in the first FTI: it seems that the
FTI tokenizer simply drops anything between '<' and '>'.
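
The effect seen in rows 16 and 17 can be imitated with a short Python sketch. This is only an illustration of the observed behaviour: the regular expression and the helper name strip_tag_like are assumptions made for this example, not the parser PostgreSQL actually uses.

```python
import re

# Assumption: this pattern only mimics what the FTI output suggests
# (text between '<' and '>' vanishes). It is not PostgreSQL's parser.
TAG_LIKE = re.compile(r"<[^>]*>")


def strip_tag_like(text):
    """Replace every '<...>' span with a space, then normalise whitespace."""
    return " ".join(TAG_LIKE.sub(" ", text).split())


# Shortened versions of the two descriptions from rows 16 and 17.
row16 = 'from "from" foo "bar" <div> community-contributions.py'
row17 = 'from "from" foo "bar" div community-contributions.py'

print(strip_tag_like(row16))  # the '<div>' token is gone
print(strip_tag_like(row17))  # the bare 'div' token survives
```

If the real tokenizer behaves like this, any bug description quoting markup loses those terms before they ever reach the index.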
Search queries:
select bug from bugtaskflat where fti @@ ftq('sqlobject.select');
-> no result.
select ftq('sqlobject.select');
ftq
--------------------------------------------
'sqlobject' & 'select' | 'sqlobjectselect'
(1 row)
So, ftq('sqlobject.select') generates a reasonable expression -- but the
full text index stores 'sqlobject.select' instead of two words 'sqlobject'
and 'select'.
The query below works though:
select bug from bugtaskflat where fti @@ 'sqlobject.select';
bug
-----
16
17
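
The mismatch can be reproduced with a toy model in Python. This is a sketch under simplifying assumptions, not how PostgreSQL evaluates a tsquery: the index is modelled as a plain set of the lexemes shown in the fti column for bug 16, and the two function names are made up for this example.

```python
# Toy model, not PostgreSQL: the index is a set of lexemes (taken from the
# fti column of bug 16) and queries are evaluated with plain boolean logic.
indexed = {
    "-12", "1.10", "2.6.20", "bar", "community-contributions.py",
    "crash", "foo", "sqlobject.select",
}


def matches_ftq_expansion(lexemes):
    # 'sqlobject' & 'select' | 'sqlobjectselect'
    # ('&' binds tighter than '|' in a tsquery)
    return (
        ("sqlobject" in lexemes and "select" in lexemes)
        or "sqlobjectselect" in lexemes
    )


def matches_literal(lexemes):
    # 'sqlobject.select' treated as one lexeme
    return "sqlobject.select" in lexemes


print(matches_ftq_expansion(indexed))  # False: none of the three terms is indexed
print(matches_literal(indexed))        # True: the index holds the undivided word
```

The expression built by ftq() only mentions terms that the index never stored, so the query and the index would have to be tokenised the same way for the search to succeed.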
A search for "community-contributions.py" has the same problem: The index
stores the complete word, but:
select ftq('community-contributions.py');
ftq
---------------------------------------------------------------
'communiti' & 'contribut' & 'py' | 'communitycontributionspi'
(1 row)
"From" is probably in the set of stop words. I am not sure if it makes
sense to remove "from" from this set...