Comment 1 for bug 1025357

Abel Deuring (adeuring) wrote :

As a workaround, you can simply search for "servercloud q". That is basically what was used internally before. (Strictly speaking, the internal search term was "(servercloud & q) | servercloudq", but "servercloudq" would not yield a match.)

This issue is caused by my work on bug 29713, specifically to fix the problem described at the end of comment #7 that certain filenames cannot be searched. My conclusion was that it is best to simply not mangle any '-' inside a word.

This bug is a good example of why we should mangle hyphens again, but in a slightly modified form. The current situation:

The FTI data is, for example:

select to_tsvector('servercloud-q-cloud-archive');
                               to_tsvector
-------------------------------------------------------------------------
 'archiv':5 'cloud':4 'q':3 'servercloud':2 'servercloud-q-cloud-arch':1

and the tsquery for "servercloud-q" is:

select ftq('servercloud-q');
                  ftq
---------------------------------------
 'servercloud-q' & 'servercloud' & 'q'

(This is the same as the result of calling to_tsquery() directly.)

So, the "blocker" is that "servercloud-q" is not part of the FTI.
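To make the mismatch concrete, here is a minimal Python sketch that treats the two outputs above as plain sets of lexemes. It is only an illustration: the function name missing_lexemes is hypothetical, and the sketch ignores positions and everything else PostgreSQL does when it evaluates a tsquery.

```python
# Lexemes copied from the to_tsvector() output above.
indexed_lexemes = {
    "archiv", "cloud", "q", "servercloud", "servercloud-q-cloud-arch",
}

# Lexemes that the query 'servercloud-q' & 'servercloud' & 'q' requires.
required_lexemes = ["servercloud-q", "servercloud", "q"]


def missing_lexemes(required, indexed):
    """Return the required lexemes that are absent from the index.

    Hypothetical helper: an AND query can only match if this list is empty.
    """
    return [lexeme for lexeme in required if lexeme not in indexed]


print(missing_lexemes(required_lexemes, indexed_lexemes))
```

The only lexeme reported as missing is 'servercloud-q', which is why the AND query cannot match even though 'servercloud' and 'q' are both indexed.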

We should probably re-introduce a form of "mangling of hyphens" so that we have a call of to_tsquery() like:

to_tsquery('servercloud-q | (servercloud & q)')

The result of this call is slightly redundant but should work:

select to_tsquery('servercloud-q | (servercloud & q)');
                         to_tsquery
-------------------------------------------------------------
 'servercloud-q' & 'servercloud' & 'q' | 'servercloud' & 'q'

Note that a simple s/-/ / for a search term will cause problems for words that are treated as file names or host names:

select to_tsvector('file-name.txt');
    to_tsvector
-------------------
 'file-name.txt':1

select to_tsquery('file & name.txt');
         ftq
---------------------
 'file' & 'name.txt'

So here we must keep the '-'. The redundant-looking variant, calling to_tsquery('file-name.txt | (file & name.txt)'), makes searches succeed both for words like "servercloud-q-cloud-archive", as described here, and for file/host names containing dashes.
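The proposed rewriting of a search term could be sketched in Python roughly as follows. This is an assumption-laden illustration, not the actual ftq() implementation: the function name expand_hyphenated is hypothetical, it handles a single search term only, and it does no quoting or escaping of tsquery operators.

```python
def expand_hyphenated(term):
    """Build 'term | (part & part ...)' for a term containing '-'.

    Hypothetical helper: the original term is kept so that file and host
    names still match, and the AND of its parts is added as an alternative.
    """
    # Drop empty parts caused by leading, trailing or doubled hyphens.
    parts = [part for part in term.split("-") if part]
    if len(parts) < 2:
        # Nothing to split, so pass the term through unchanged.
        return term
    return "%s | (%s)" % (term, " & ".join(parts))


print(expand_hyphenated("servercloud-q"))
print(expand_hyphenated("file-name.txt"))
```

The two calls produce the strings "servercloud-q | (servercloud & q)" and "file-name.txt | (file & name.txt)", which are the arguments to to_tsquery() discussed above.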