HTML/XML tags are dropped from the Postgres full text index and from search queries.

Bug #1015519 reported by Abel Deuring on 2012-06-20
Affects: Launchpad itself
Importance: Low
Assigned to: Unassigned

Bug Description

Example:

    launchpad_dev=# select to_tsvector('aaa <div>bbb</div> ccc');
           to_tsvector
    -------------------------
     'aaa':1 'bbb':2 'ccc':3
    (1 row)

    launchpad_dev=# select to_tsquery('aaa & <div>bbb</div> & ccc');
          to_tsquery
    -----------------------
     'aaa' & 'bbb' & 'ccc'
    (1 row)

    launchpad_dev=# select to_tsquery('aaa & <div> & bbb & </div> & ccc');
          to_tsquery
    -----------------------
     'aaa' & 'bbb' & 'ccc'
    (1 row)

    launchpad_dev=# select ts_debug('aaa <div>bbb</div> ccc');
                                  ts_debug
    ---------------------------------------------------------------------
     (asciiword,"Word, all ASCII",aaa,{english_stem},english_stem,{aaa})
     (blank,"Space symbols"," ",{},,)
     (tag,"XML tag",<div>,{},,)
     (asciiword,"Word, all ASCII",bbb,{english_stem},english_stem,{bbb})
     (tag,"XML tag",</div>,{},,)
     (blank,"Space symbols"," ",{},,)
     (asciiword,"Word, all ASCII",ccc,{english_stem},english_stem,{ccc})
    (7 rows)

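So, strings like '<div>' are treated as tokens of type "tag" -- but these
tokens do not appear in the FTI data, and they do not appear in the
result of to_tsquery(). The ts_debug() output shows an empty dictionary
list ({}) for the "tag" tokens, which is why they are discarded.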
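
The following is only a sketch of how the behaviour could be confirmed and
experimented with; it is not something Launchpad is known to do. It assumes
the "english" configuration is in use (the ts_debug() output above shows
english_stem), and the configuration name "tag_aware" is hypothetical. The
idea is to copy the configuration and map the "tag" token type to the
built-in "simple" dictionary, so that tags are no longer dropped:

    -- Show only the tokens the parser classifies as tags, with the
    -- dictionaries consulted for them (an empty list means "dropped").
    SELECT alias, token, dictionaries, lexemes
      FROM ts_debug('english', 'aaa <div>bbb</div> ccc')
     WHERE alias = 'tag';

    -- Hypothetical experiment: work on a copy so the stock
    -- configuration stays untouched.
    CREATE TEXT SEARCH CONFIGURATION tag_aware (COPY = english);

    -- Route "tag" tokens through the "simple" dictionary instead of
    -- leaving them unmapped.
    ALTER TEXT SEARCH CONFIGURATION tag_aware
        ADD MAPPING FOR tag WITH simple;

    -- Compare the results with the original example.
    SELECT to_tsvector('tag_aware', 'aaa <div>bbb</div> ccc');
    SELECT to_tsquery('tag_aware', 'aaa & <div> & bbb');

Whether indexing tags this way is desirable for bug search is a separate
question; the sketch only shows where the tokens get lost.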

Abel Deuring (adeuring) wrote :

Marked as "critical" since this bug describes one detail of the quite generic bug 29713, which is itself critical.

Changed in launchpad:
importance: Undecided → Critical
status: New → Triaged
William Grant (wgrant) on 2012-10-12
Changed in launchpad:
importance: Critical → Low
tags: added: search