Inconsistent parsing of '~' in to_tsquery() and to_tsvector()

Bug #1015511 reported by Abel Deuring
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
Launchpad itself  Triaged  Low         Unassigned

Bug Description

If a word from a text indexed for full-text search is used as a
search term, the search should return every text that contains
this word.

This can fail if a word starts with a '~':

    psql -d launchpad_dev
    psql (9.1.4)
    Type "help" for help.

    launchpad_dev=# select to_tsvector('aaa ~bbb ccc') @@ to_tsquery('~bbb');
     ?column?
    ----------
     f

The reason:

    launchpad_dev=# select to_tsvector('aaa ~bbb ccc');
           to_tsvector
    -------------------------
     'aaa':1 'bbb':2 'ccc':3
    (1 row)

So, the '~' is stripped from '~bbb'. But the search term generated by
to_tsquery() retains the '~':

    launchpad_dev=# select to_tsquery('~bbb');
     to_tsquery
    ------------
     '~bbb'
    (1 row)

This is not entirely wrong, because to_tsvector() does sometimes keep a
leading '~':

    launchpad_dev=# select to_tsvector('~aaa bbb~ccc');
            to_tsvector
    ---------------------------
     'bbb':2 '~aaa':1 '~ccc':3
    (1 row)

ts_debug() gives a clue to what is happening:

    launchpad_dev=# select ts_debug('~bbb');
                            ts_debug
    --------------------------------------------------------
     (file,"File or path name",~bbb,{simple},simple,{~bbb})
    (1 row)

    launchpad_dev=# select ts_debug('~aaa bbb~ccc');
                                  ts_debug
    ---------------------------------------------------------------------
     (file,"File or path name",~aaa,{simple},simple,{~aaa})
     (blank,"Space symbols"," ",{},,)
     (asciiword,"Word, all ASCII",bbb,{english_stem},english_stem,{bbb})
     (file,"File or path name",~ccc,{simple},simple,{~ccc})
    (4 rows)

So, a '~' at the start of a text, or directly following a word with no
space in between, is treated as the first character of a file name. A
'~' preceded by a space, however, is simply dropped, and the following
word is treated as an ordinary word.
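To make the asymmetry concrete, here is a toy Python model of the behavior observed above. It is a hypothetical sketch, not PostgreSQL's text-search parser. The names toy_tokens(), toy_match() and unmatched_words() are made up for illustration. The model encodes only the three observations from the ts_debug() output:

- a '~' at the start of the text is kept;
- a '~' after whitespace is dropped;
- a '~' directly after a word starts a new '~'-prefixed token.

It ignores stemming, stop words, case folding and all other token types.

```python
import re


def toy_tokens(text: str) -> list[str]:
    """Hypothetical model of the '~' handling observed in this report.

    This is NOT PostgreSQL's parser: it ignores stemming, stop words, case
    folding and every token type except plain words and '~' "file" tokens.
    """
    tokens: list[str] = []
    for match in re.finditer(r"\S+", text):
        chunk = match.group()
        if chunk.startswith("~") and match.start() > 0:
            # Observed: a '~' preceded by whitespace is simply dropped.
            chunk = chunk[1:]
        # Observed: a '~' at the start of the text, or directly after a word,
        # begins a token that keeps the '~' ("bbb~ccc" -> "bbb", "~ccc").
        tokens.extend(part for part in re.split(r"(?=~)", chunk) if part)
    return tokens


def toy_match(document: str, query: str) -> bool:
    """Toy analogue of to_tsvector(document) @@ to_tsquery(query), treating
    every token of the query as AND-ed together."""
    doc_tokens = set(toy_tokens(document))
    # A one-word query such as '~bbb' starts at position 0 of its own
    # string, so its leading '~' is kept.
    return all(tok in doc_tokens for tok in toy_tokens(query))


def unmatched_words(document: str) -> list[str]:
    """Whitespace-separated words of `document` that, used as a one-word
    query, fail to match the document itself."""
    return [word for word in document.split() if not toy_match(document, word)]


if __name__ == "__main__":
    print(toy_tokens("aaa ~bbb ccc"))       # ['aaa', 'bbb', 'ccc']
    print(toy_tokens("~bbb"))               # ['~bbb']
    print(unmatched_words("aaa ~bbb ccc"))  # ['~bbb']
    print(unmatched_words("~aaa bbb~ccc"))  # []
```

In this model, the document "aaa ~bbb ccc" yields 'bbb', while the one-word query "~bbb" yields '~bbb'. As a result, unmatched_words() reports "~bbb", mirroring the f returned by the first @@ query in this report.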

Tags: search
Abel Deuring (adeuring) wrote:

Marked as "Critical" since this bug describes one detail of the quite generic bug 29713.

Changed in launchpad:
importance: Undecided → Critical
status: New → Triaged
Abel Deuring (adeuring)
description: updated
Abel Deuring (adeuring)
description: updated
William Grant (wgrant)
Changed in launchpad:
importance: Critical → Low
tags: added: search