Blueprints search is totally broken

Bug #1064996 reported by Ara Pulido
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
Launchpad itself  Invalid  Undecided   Unassigned

Bug Description

Right now searching for blueprints is completely broken. The results of a very simple search are not reliable at all.

Steps to reproduce:

 1. Go to https://blueprints.launchpad.net/ubuntu
 2. Order by blueprint name, so it is easier to spot the bug
 3. Look at arm-m (for example) blueprints. Currently there are 4 starting with "arm-m"
 4. Search by "arm-m" (or visit https://blueprints.launchpad.net/ubuntu?searchtext=arm-m)

Expected results:

 The user gets at least those 4 blueprints

Current results:

 No blueprints are returned

Curtis Hovey (sinzui)
tags: added: lp-blueprints specifications
Abel Deuring (adeuring) wrote :

Please search for "arm m" instead.

Your example shows a limit of the full text search features of Postgres:

Let's take the name "arm-m-xdeb-cross-compilation-environment". The full text index data for this name is:

to_tsvector('arm-m-xdeb-cross-compilation-environment');
                                           to_tsvector
--------------------------------------------------------------------------------------------------
 'arm':2 'arm-m-xdeb-cross-compilation-environ':1 'compil':6 'cross':5 'environ':7 'm':3 'xdeb':4

So the index contains every word separated by a '-' (some of them stemmed), plus the complete name (also stemmed).

If you enter the search term "arm-m", it is transformed into this search expression:

select ftq('arm-m');
          ftq
-----------------------
 'arm-m' & 'arm' & 'm'

meaning that all three words must appear in the index data, but "arm-m" is not present.

But the search term "arm m" is transformed into

select ftq('arm m');
     ftq
-------------
 'arm' & 'm'

so the rows returned are those whose full text index contains both "arm" and "m".
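
As a rough illustration of why one search fails and the other succeeds, here is a minimal Python sketch. It does not call Postgres or Launchpad's ftq(); it only models the rule described above (every AND-ed term must be present in the index data), using the lexemes and terms copied from the outputs quoted in this comment. The function name matches() is made up for the example.

```python
# Lexemes copied from the to_tsvector output quoted above.
indexed = {
    "arm",
    "arm-m-xdeb-cross-compilation-environ",
    "compil",
    "cross",
    "environ",
    "m",
    "xdeb",
}


def matches(query_terms, lexemes):
    """Hypothetical helper: True only if every AND-ed term is an indexed lexeme."""
    return all(term in lexemes for term in query_terms)


hyphenated = ["arm-m", "arm", "m"]  # terms from ftq('arm-m')
spaced = ["arm", "m"]               # terms from ftq('arm m')

print(matches(hyphenated, indexed))  # False: 'arm-m' is not in the index data
print(matches(spaced, indexed))      # True: both 'arm' and 'm' are indexed
```

The sketch treats the index as a plain set of lexemes and ignores positions and ranking, which is enough to show the mismatch.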

Another caveat: Most single characters are indexed, but some are not:

select to_tsvector('a b c d e f g h i j k l m n o p q r s t u v w x y z');
'b':2 'c':3 'd':4 'e':5 'f':6 'g':7 'h':8 'j':10 'k':11 'l':12 'm':13 'n':14 'o':15 'p':16 'q':17 'r':18 'u':21 'v':22 'w':23 'x':24 'y':25 'z':26

So a, i, s and t are missing. They are also omitted from search terms, meaning that a search for "arm s" will return all texts containing "arm", including "arm-m".
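
The effect on search terms can be sketched in Python as follows. This is only a model of the behaviour described above, not the actual Postgres or Launchpad code: it derives the missing letters from the to_tsvector output quoted in this comment and drops them from a search string. The helper effective_terms() is hypothetical, and the sketch covers only these single letters, not any other words the index may skip.

```python
import string

# Single letters that do appear in the to_tsvector output quoted above.
indexed_letters = set("bcdefghjklmnopqruvwxyz")

# Letters of the alphabet that are absent from that output.
dropped = sorted(set(string.ascii_lowercase) - indexed_letters)


def effective_terms(search_text):
    """Hypothetical helper: drop the single-letter terms that are not indexed."""
    return [term for term in search_text.lower().split() if term not in dropped]


print(dropped)                   # ['a', 'i', 's', 't']
print(effective_terms("arm s"))  # ['arm']
print(effective_terms("arm m"))  # ['arm', 'm']
```

With "s" dropped, "arm s" is effectively a search for "arm" alone, which is why it also returns the "arm-m" blueprints.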

You should use names like "arm-strange" (or whichever name will be used for the "s" series) instead of just "arm-s".

Changed in launchpad:
status: New → Invalid