Search tokenisation fails with documents like "... <div> ..." - cannot search for them

Bug #2753 reported by Matthew Paul Thomas
Affects | Status | Importance | Assigned to | Milestone
Launchpad itself | Triaged | High | Unassigned |

Bug Description

From https://launchpad.net/products/launchpad/+bugs, searching for "div" should (at the time of writing) find bug 1749, which is unfixed and has "div" in its summary. But it doesn't.
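The failure described above can be illustrated with a minimal sketch. It assumes an indexer that treats anything between angle brackets as markup and discards it. That is an assumption made for illustration, not Launchpad's actual search code, and the function names and the sample summary are hypothetical.

```python
import re

# Assumption: the indexer treats anything shaped like <...> as markup.
TAG_RE = re.compile(r"<[^>]*>")
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize_dropping_tags(text):
    """Hypothetical indexer: discard <...> spans, then index the words."""
    return [w.lower() for w in WORD_RE.findall(TAG_RE.sub(" ", text))]


def tokenize_keeping_tag_names(text):
    """Alternative: index every alphanumeric run, including tag names."""
    return [w.lower() for w in WORD_RE.findall(text)]


# Made-up summary, standing in for a bug whose summary mentions <div>.
summary = "Page layout breaks inside a <div> element"

print("div" in tokenize_dropping_tags(summary))      # False
print("div" in tokenize_keeping_tag_names(summary))  # True
```

Under this assumption the word "div" never reaches the index, so a search for it cannot match, no matter how the query itself is parsed.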

Brad Bollenbach (bradb)
Changed in malone:
assignee: nobody → bradb
status: New → Accepted
Brad Bollenbach (bradb)
Changed in malone:
assignee: bradb → nobody
Changed in malone:
assignee: nobody → stub
Stuart Bishop (stub)
Changed in launchpad-foundations:
status: Confirmed → Triaged
Robert Collins (lifeless) wrote : Re: Search tokenisation fails with documents like "... <div> ..." or " ... Foo.bar" - cannot search for them

This appears to be a root cause for much of the dissatisfaction with LP search. Escalating to high.

summary: - Searching for "div" fails to find bug 1749
+ Search tokenisation fails with documents like "... <div> ..." or " ...
+ Foo.bar" - cannot search for them
Changed in launchpad:
importance: Medium → High
Stuart Bishop (stub)
Changed in launchpad:
assignee: Stuart Bishop (stub) → nobody
Curtis Hovey (sinzui)
Changed in launchpad:
importance: High → Low
tags: added: search
Robert Collins (lifeless) wrote :

Still appears to be a root cause, and worth being in our 6-month list.

Changed in launchpad:
importance: Low → High
Robert Collins (lifeless) wrote :

I'm duping this on a slightly newer bug which covers the same case (token adjacent to punctuation) but has a clearer description etc.
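For illustration only, the "token adjacent to punctuation" case can be sketched as follows. The sketch assumes a tokeniser that splits on whitespace alone, so "Foo.bar" is indexed as a single token. Both functions and the sample text are hypothetical and do not reflect Launchpad's real implementation.

```python
import re


def split_on_whitespace(text):
    """Hypothetical tokeniser: punctuation stays glued to its neighbours."""
    return text.lower().split()


def split_on_punctuation(text):
    """Alternative: every run of letters or digits becomes its own token."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Made-up document text containing a dotted name.
doc = "Crash when calling Foo.bar"

print("bar" in split_on_whitespace(doc))   # False
print("bar" in split_on_punctuation(doc))  # True
```

With the first tokeniser only the exact string "foo.bar" is searchable. With the second, "foo" and "bar" each match on their own.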

Abel Deuring (adeuring) wrote :

Changed the "duplicate target" from 29713 to bug 1015519 because the current fix for bug 29713 does not address the bad parsing of HTML/XML tags.

summary: - Search tokenisation fails with documents like "... <div> ..." or " ...
- Foo.bar" - cannot search for them
+ Search tokenisation fails with documents like "... <div> ..." - cannot
+ search for them