Inconsistent parsing of '~' in to_tsquery() and to_tsvector()

Bug #1015511 reported by Abel Deuring on 2012-06-20
This bug affects 1 person
Affects: Launchpad itself
Importance: Low
Assigned to: Unassigned

Bug Description

If a word from a full-text-search-indexed text is used in a
search, all texts containing this word should be returned by
this search.

This can fail if a word starts with a '~':

    psql -d launchpad_dev
    psql (9.1.4)
    Type "help" for help.

    launchpad_dev=# select to_tsvector('aaa ~bbb ccc') @@ to_tsquery('~bbb');
     ?column?
    ----------
     f

The reason:

    launchpad_dev=# select to_tsvector('aaa ~bbb ccc');
           to_tsvector
    -------------------------
     'aaa':1 'bbb':2 'ccc':3
    (1 row)

So, the '~' is stripped from '~bbb'. But the search term generated by
to_tsquery() retains the '~':

    launchpad_dev=# select to_tsquery('~bbb');
     to_tsquery
    ------------
     '~bbb'
    (1 row)

This is not completely wrong, because to_tsvector() sometimes keeps a
leading '~':

    launchpad_dev=# select to_tsvector('~aaa bbb~ccc');
            to_tsvector
    ---------------------------
     'bbb':2 '~aaa':1 '~ccc':3
    (1 row)

ts_debug() gives a clue as to what is happening:

    launchpad_dev=# select ts_debug('~bbb');
                            ts_debug
    --------------------------------------------------------
     (file,"File or path name",~bbb,{simple},simple,{~bbb})
    (1 row)

    launchpad_dev=# select ts_debug('~aaa bbb~ccc');
                                  ts_debug
    ---------------------------------------------------------------------
     (file,"File or path name",~aaa,{simple},simple,{~aaa})
     (blank,"Space symbols"," ",{},,)
     (asciiword,"Word, all ASCII",bbb,{english_stem},english_stem,{bbb})
     (file,"File or path name",~ccc,{simple},simple,{~ccc})
    (4 rows)

So, a '~' at the start of a text or directly following a word is
treated as the first character of a filename, while a '~' preceded by
a space is simply dropped and the following word is treated as an
ordinary word.
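
The rule can be written down as a small toy model, which makes the
mismatch easier to see. This is only a sketch of the behaviour observed
in the outputs above, not PostgreSQL's parser: the names model_tokens()
and model_matches() are made up for illustration, stemming and stop words
are ignored, and only single-word queries are handled.

```python
import re

# Toy model of the observed rule; NOT PostgreSQL's text search parser.
# A '~' at the start of the text or directly after a word character starts
# a "file"-like token that keeps the '~'. Any other '~' is dropped.
_TOKEN = re.compile(r"(?:^|(?<=\w))~\w+|\w+")


def model_tokens(text):
    """Return the tokens the toy model extracts from text (no stemming)."""
    return _TOKEN.findall(text)


def model_matches(document, query):
    """Rough stand-in for to_tsvector(document) @@ to_tsquery(query).

    Assumes a single-word query; the query is tokenized on its own, so a
    leading '~' is at the start of the text and is therefore kept.
    """
    return set(model_tokens(query)) <= set(model_tokens(document))


print(model_tokens("aaa ~bbb ccc"))           # ['aaa', 'bbb', 'ccc']
print(model_tokens("~aaa bbb~ccc"))           # ['~aaa', 'bbb', '~ccc']
print(model_matches("aaa ~bbb ccc", "~bbb"))  # False
```

In this model the search term is always parsed as a text of its own, so
its '~' sits at the start and is kept, while the same word inside the
document follows a space and loses its '~'. That reproduces the failed
match from the first example.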

Abel Deuring (adeuring) wrote :

Marked as "critical" since this bug describes one detail of the quite generic bug 29713

Changed in launchpad:
importance: Undecided → Critical
status: New → Triaged
Abel Deuring (adeuring) on 2012-06-20
description: updated
Abel Deuring (adeuring) on 2012-06-20
description: updated
William Grant (wgrant) on 2012-10-12
Changed in launchpad:
importance: Critical → Low
tags: added: search