Comment 7 for bug 29713

Abel Deuring (adeuring) wrote :

_build_search_text_clause() in bugtasksearch.py generates these search
clauses:

    SQL("BugTaskFlat.fti @@ ftq(?)", params=(searchtext,))

So the search text is passed to ftq() unchanged; no tokenisation happens
in Python.

I added two bugs containing some of the "bad search terms" mentioned above
to my local launchpad_dev DB:

select bug.description, bugtaskflat.fti
    from bugtaskflat, bug where bug.id=bugtaskflat.bug and bug.id>=16;

row 16:
  description: from "from" foo "bar" <div> community-contributions.py SQLObject.select 2.6.20-12 1.10 crash
  fti: '-12':8 '1.10':9 '2.6.20':7 'bar':4 'community-contributions.py':5 'crash':10 'foo':3 'sqlobject.select':6

row 17:
  description: from "from" foo "bar" div community-contributions.py SQLObject.select 2.6.20-12 1.10 crash
  fti: '-12':10 '1.10':11 '2.6.20':9 'bar':5 'community-contributions.py':7 'crash':12 'div':6 'foo':4 'sqlobject.select':8 'xxx':1B

The only difference between these rows is '<div>' vs. 'div'.

Neither '<div>' nor 'div' appears in the first FTI: it seems that the FTI
tokenizer simply drops anything between '<' and '>'.
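
To make that guess concrete, here is a minimal Python sketch of the
behaviour I suspect. It is only an illustration, not the actual FTI
tokenizer: strip_angle_brackets() is a hypothetical helper, and the regex
is my assumption about what "drops anything between '<' and '>'" means.

```python
import re


def strip_angle_brackets(text):
    """Replace every '<...>' span with a space.

    Hypothetical stand-in for the suspected tokenizer behaviour; the real
    FTI tokenizer may work differently.
    """
    return re.sub(r"<[^>]*>", " ", text)


# With the angle brackets, 'div' disappears before any word splitting.
with_tag = strip_angle_brackets('foo "bar" <div> crash').split()

# Without them, 'div' survives as an ordinary word.
without_tag = strip_angle_brackets('foo "bar" div crash').split()
```

If the real tokenizer behaves like this, a search for '<div>' can never
match row 16, whatever ftq() does with the search term.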

search queries:

select bug from bugtaskflat where fti @@ ftq('sqlobject.select');
-> no result.

select ftq('sqlobject.select');
                    ftq
--------------------------------------------
 'sqlobject' & 'select' | 'sqlobjectselect'
(1 row)

So, ftq('sqlobject.select') generates a reasonable expression -- but the
full text index stores the single lexeme 'sqlobject.select' instead of the
two words 'sqlobject' and 'select', so neither branch of the query can
match.
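
The mismatch can be reproduced with a toy model in Python. This is a
sketch under simplifying assumptions: the tsvector is treated as a plain
set of lexemes (taken from the row 16 FTI above), and the ftq() expression
is evaluated by hand with '&' binding tighter than '|'. Both function
names are made up for the illustration.

```python
# Lexemes stored in the FTI for row 16, copied from the output above.
ROW_16_FTI = {
    "-12", "1.10", "2.6.20", "bar", "community-contributions.py",
    "crash", "foo", "sqlobject.select",
}


def matches_ftq_expression(lexemes):
    """Evaluate ('sqlobject' & 'select') | 'sqlobjectselect' by hand."""
    return ("sqlobject" in lexemes and "select" in lexemes) or (
        "sqlobjectselect" in lexemes
    )


def matches_literal(lexemes, term):
    """Model a query for a single lexeme, like fti @@ 'sqlobject.select'."""
    return term in lexemes


ftq_hit = matches_ftq_expression(ROW_16_FTI)
literal_hit = matches_literal(ROW_16_FTI, "sqlobject.select")
```

In this model ftq_hit is False and literal_hit is True, which matches what
the two SQL queries show.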

The query below works though:

select bug from bugtaskflat where fti @@ 'sqlobject.select';

 bug
-----
  16
  17

A search for "community-contributions.py" has the same problem: the index
stores the complete word, but ftq() splits and stems it:

select ftq('community-contributions.py');
                              ftq
---------------------------------------------------------------
 'communiti' & 'contribut' & 'py' | 'communitycontributionspi'
(1 row)

"From" is probably in the set of stop words. I am not sure if it makes
sense to remove "from" from this set...