search poorly prioritizes queries with common terms

Bug #104032 reported by Michal Suchanek
Affects            Status   Importance  Assigned to  Milestone
Launchpad itself   Triaged  Medium      Unassigned

Bug Description

If you run a general bug search for "launchpad plugin firefox":

https://bugs.launchpad.net/bugs/+bugs?field.searchtext=launchpad+plugin+firefox&search=Search+Bug+Reports&field.scope=all&field.scope.target=

It returns about a dozen results. The bug that is actually filed against the firefox-launchpad-plugin package is at the bottom. A search for something in a more mainstream package (e.g. "firefox crash") would be quite useless.

Looking at some of the top results reveals that some (all?) of the keywords appear in a comment that contains a dump of apt output. These spam-like comments have two properties:

- they are very long
- they match *many* keywords

Perhaps some heuristic could be applied that sorts them lower in the list.
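
One way to picture such a heuristic is to score comments by keyword density rather than by raw keyword count, so that a long apt dump that happens to contain every term scores low, while a match in the bug title scores high. The following is only a minimal sketch under stated assumptions: the names `tokenize`, `score`, `rank` and the `TITLE_WEIGHT` value are hypothetical and are not part of Launchpad's search code.

```python
import re

# Assumed weight: one query term in the title counts far more than
# any amount of matching text in a comment. The value is illustrative.
TITLE_WEIGHT = 3.0


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, title, comments):
    """Score a bug: title hits plus the best keyword density of any comment."""
    terms = set(tokenize(query))
    title_hits = len(terms & set(tokenize(title)))
    best_density = 0.0
    for comment in comments:
        tokens = tokenize(comment)
        if tokens:
            hits = sum(1 for tok in tokens if tok in terms)
            # Dividing by the comment length penalizes very long comments.
            best_density = max(best_density, hits / len(tokens))
    return TITLE_WEIGHT * title_hits + best_density


def rank(query, bugs):
    """Return bugs sorted from most to least relevant."""
    return sorted(
        bugs,
        key=lambda bug: score(query, bug["title"], bug["comments"]),
        reverse=True,
    )
```

With this scoring, a bug whose title contains the query terms outranks a bug whose only match is buried in a several-hundred-word log dump, which is the ordering asked for above.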

Tags: lp-bugs search
Matthew Paul Thomas (mpt) wrote :

Michal, please give the URL of a search you did, and an example of an irrelevant result from that search (i.e. a bug report that was returned and shouldn't have been). Without that, it's hard to tell what the problem is.

(I also don't know what you mean by "there is only one project". There are currently 2614 projects registered in Launchpad, of which 693 have bug reports.)

Changed in malone:
status: Unconfirmed → Needs Info
Michal Suchanek (hramrach) wrote :

A search that nicely demonstrates the problem is a general bug search for "launchpad plugin firefox".

https://bugs.launchpad.net/bugs/+bugs?field.searchtext=launchpad+plugin+firefox&search=Search+Bug+Reports&field.scope=all&field.scope.target=

It returns about a dozen results. The bug that is actually filed against firefox-launchpad-plugin is at the bottom. A search for something in a more mainstream package (e.g. "firefox crash") would be quite useless.

Looking at some of the top results reveals that some (all?) of the keywords appear in a comment that contains a dump of apt output.

These spam-like comments have two properties:

- they are very long
- they match *many* keywords

Perhaps some heuristic could be applied that sorts them lower in the list.

Thanks

Matthew Paul Thomas (mpt) wrote :

Thank you for that example.

Changed in malone:
importance: Undecided → Medium
status: Needs Info → Confirmed
Matthew Paul Thomas (mpt) wrote :

Actually the all-projects Bugs search, just like all the other Bugs searches, sorts results by Importance rather than by any text ranking (see bug 1022). Perhaps this search should be special-cased to use a full-text ranking.
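
The difference between the two orderings can be sketched as follows. This is an illustration only, not Launchpad's query code: the `rank` field stands in for whatever full-text relevance score the search backend would supply, the record layout is hypothetical, and the `IMPORTANCE_ORDER` list is an assumed ordering of importance values.

```python
# Assumed ordering of importance values, most important first.
IMPORTANCE_ORDER = ["Critical", "High", "Medium", "Low", "Wishlist", "Undecided"]


def sort_by_importance(results):
    """Order results by importance only, ignoring text relevance."""
    return sorted(results, key=lambda r: IMPORTANCE_ORDER.index(r["importance"]))


def sort_by_text_rank(results):
    """Order results by text relevance, using importance only to break ties."""
    return sorted(
        results,
        key=lambda r: (-r["rank"], IMPORTANCE_ORDER.index(r["importance"])),
    )
```

Under the first ordering, a barely relevant High bug is listed ahead of a highly relevant Low one; under the second, the relevant bug comes first.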

Christian Reis (kiko)
description: updated