HTML/XML tags are dropped from the Postgres full text index and from search queries.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Triaged | Low | Unassigned |
Bug Description
Example:
```
launchpad_dev=# select to_tsvector('aaa <div>bbb</div> ccc');
---
'aaa':1 'bbb':2 'ccc':3
(1 row)

launchpad_dev=# select to_tsquery('aaa & <div>bbb</div> & ccc');
---
'aaa' & 'bbb' & 'ccc'
(1 row)

launchpad_dev=# select to_tsquery('aaa & <div> & bbb & </div> & ccc');
---
'aaa' & 'bbb' & 'ccc'
(1 row)

launchpad_dev=# select ts_debug('aaa <div>bbb</div> ccc');
---
(asciiword
(blank,"Space symbols"," ",{},,)
(tag,"XML tag",<div>,{},,)
(asciiword
(tag,"XML tag",</div>,{},,)
(blank,"Space symbols"," ",{},,)
(asciiword
(7 rows)
```
So strings like '<div>' are recognized as tokens of type "tag", but ts_debug
lists no dictionary for them (the empty {} column), so they produce no
lexemes: they do not appear in the FTI data, nor in the result of to_tsquery().
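To make the mechanism concrete, here is a minimal Python sketch of what the default parser appears to do with this input. It is not PostgreSQL's actual parser: the token rules, the `INDEXED_TYPES` set, and the function names `tokenize` and `to_tsvector_sketch` are hypothetical simplifications. The sketch splits the text into typed tokens, much like the alias column of ts_debug, and only token types mapped to a dictionary produce lexemes and positions. The real parser knows many more token types and also runs words through stemming dictionaries.

```python
import re

# Hypothetical, simplified token rules (the real default parser has many more
# token types, e.g. numbers, URLs, non-ASCII words). Order matters: tags first.
TOKEN_RULES = [
    ("tag", re.compile(r"</?[A-Za-z][^<>]*>")),
    ("asciiword", re.compile(r"[A-Za-z]+")),
    # Anything else, including a stray '<' that does not start a tag.
    ("blank", re.compile(r"[^A-Za-z<]+|<")),
]

# Assumption: only plain ASCII words are mapped to a dictionary; "tag" and
# "blank" tokens have none (the {} in ts_debug), so they are discarded.
INDEXED_TYPES = {"asciiword"}


def tokenize(text):
    """Split text into (alias, token) pairs, similar to ts_debug's output."""
    tokens = []
    pos = 0
    while pos < len(text):
        for alias, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((alias, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError("no token rule matched at position %d" % pos)
    return tokens


def to_tsvector_sketch(text):
    """Build a tsvector-like string; discarded tokens get no position."""
    lexemes = {}
    position = 0
    for alias, token in tokenize(text):
        if alias not in INDEXED_TYPES:
            continue  # no dictionary for this token type: dropped entirely
        position += 1
        lexemes.setdefault(token.lower(), []).append(position)
    return " ".join(
        "'%s':%s" % (lexeme, ",".join(str(p) for p in positions))
        for lexeme, positions in sorted(lexemes.items())
    )


if __name__ == "__main__":
    sample = "aaa <div>bbb</div> ccc"
    for alias, token in tokenize(sample):
        print(alias, repr(token))
    print(to_tsvector_sketch(sample))
```

For the example string, `tokenize` yields the same seven token types, in the same order, as the ts_debug rows, and `to_tsvector_sketch` returns `'aaa':1 'bbb':2 'ccc':3`. The tags leave no trace, not even a gap in the positions, which is why a search for the tag text itself can never match.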
Changed in launchpad: |
---|---
importance: | Critical → Low
tags: | added: search
Marked as "critical" because this bug describes one detail of the rather generic bug 29713, which is itself critical.