HTML/XML tags are dropped from the Postgres full text index and from search queries.

Bug #1015519 reported by Abel Deuring
Affects: Launchpad itself

Bug Description


    launchpad_dev=# select to_tsvector('aaa <div>bbb</div> ccc');
     'aaa':1 'bbb':2 'ccc':3
    (1 row)

    launchpad_dev=# select to_tsquery('aaa & <div>bbb</div> & ccc');
     'aaa' & 'bbb' & 'ccc'
    (1 row)

    launchpad_dev=# select to_tsquery('aaa & <div> & bbb & </div> & ccc');
     'aaa' & 'bbb' & 'ccc'
    (1 row)

    launchpad_dev=# select ts_debug('aaa <div>bbb</div> ccc');
     (asciiword,"Word, all ASCII",aaa,{english_stem},english_stem,{aaa})
     (blank,"Space symbols"," ",{},,)
     (tag,"XML tag",<div>,{},,)
     (asciiword,"Word, all ASCII",bbb,{english_stem},english_stem,{bbb})
     (tag,"XML tag",</div>,{},,)
     (blank,"Space symbols"," ",{},,)
     (asciiword,"Word, all ASCII",ccc,{english_stem},english_stem,{ccc})
    (7 rows)

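So strings like '<div>' are treated as tokens of type "tag", but these
tokens appear neither in the FTI data produced by to_tsvector() nor in
the result of to_tsquery(). The ts_debug() output shows why: the "tag"
token type has no dictionaries assigned ({}), so it yields no lexemes.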
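
To isolate the affected tokens, the ts_debug() call above can be filtered by token type. This is only a sketch: it assumes the same default text search configuration as the session above, and it uses the column names (alias, token, dictionaries, lexemes) that ts_debug() returns in stock PostgreSQL.

```sql
-- Show only the tokens that the parser classifies as XML tags.
-- A token type with no dictionaries produces no lexemes, so it is
-- silently dropped by to_tsvector() and to_tsquery().
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('aaa <div>bbb</div> ccc')
WHERE alias = 'tag';
```

Going by the ts_debug() output above, this should list the '<div>' and '</div>' tokens with an empty dictionary list.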

Tags: search
Abel Deuring (adeuring) wrote :

Marked as "critical" since this bug describes one detail of the quite generic bug 29713, which is itself critical.

Changed in launchpad:
importance: Undecided → Critical
status: New → Triaged
William Grant (wgrant)
Changed in launchpad:
importance: Critical → Low
tags: added: search