Blueprints search is totally broken

Bug #1064996 reported by Ara Pulido on 2012-10-10
This bug affects 1 person
Affects: Launchpad itself
Importance: Undecided
Assigned to: Unassigned

Bug Description

Right now searching for blueprints is completely broken. The results of a very simple search are not reliable at all.

Steps to reproduce:

 1. Go to https://blueprints.launchpad.net/ubuntu
 2. Order by blueprint name, so it is easier to spot the bug
 3. Look at arm-m (for example) blueprints. Currently there are 4 starting with "arm-m"
 4. Search by "arm-m" (or visit https://blueprints.launchpad.net/ubuntu?searchtext=arm-m)

Expected results:

 The user gets at least those 4 blueprints

Current results:

 No blueprints are returned

Curtis Hovey (sinzui) on 2012-10-10
tags: added: lp-blueprints specifications
Abel Deuring (adeuring) wrote :

Please search for "arm m" instead.

Your example shows a limit of the full text search features of Postgres:

Let's take the name "arm-m-xdeb-cross-compilation-environment". The full text index data for this name is:

to_tsvector('arm-m-xdeb-cross-compilation-environment');
                                           to_tsvector
--------------------------------------------------------------------------------------------------
 'arm':2 'arm-m-xdeb-cross-compilation-environ':1 'compil':6 'cross':5 'environ':7 'm':3 'xdeb':4

So the index data contains every word separated by a '-' (some of them stemmed), plus the complete name (also stemmed).

If you enter the search term "arm-m", it is transformed into this search expression:

select ftq('arm-m');
          ftq
-----------------------
 'arm-m' & 'arm' & 'm'

meaning that all three words must appear in the index data, but "arm-m" is not present.

But the search term "arm m" is transformed into

select ftq('arm m');
     ftq
-------------
 'arm' & 'm'

So those rows are returned where the full text index contains the words "arm" and "m".
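
To make the difference between the two searches concrete, here is a rough Python sketch. It is not how Postgres or Launchpad implement the search: it just treats the index data as a set of lexemes (copied from the to_tsvector output above) and the search expression as an AND over its terms (copied from the two ftq outputs above). The function name `matches` is made up for this illustration.

```python
# Lexemes copied from the to_tsvector output for
# 'arm-m-xdeb-cross-compilation-environment' (positions omitted).
INDEXED_LEXEMES = {
    "arm",
    "arm-m-xdeb-cross-compilation-environ",
    "compil",
    "cross",
    "environ",
    "m",
    "xdeb",
}

# Terms copied from the ftq() outputs.
QUERY_ARM_DASH_M = ["arm-m", "arm", "m"]  # ftq('arm-m')
QUERY_ARM_SPACE_M = ["arm", "m"]          # ftq('arm m')


def matches(query_terms, lexemes):
    """Hypothetical helper: '&' means every term must be in the index data."""
    return all(term in lexemes for term in query_terms)


print(matches(QUERY_ARM_DASH_M, INDEXED_LEXEMES))   # 'arm-m' is not indexed
print(matches(QUERY_ARM_SPACE_M, INDEXED_LEXEMES))  # both terms are indexed
```

The first call fails only because of the extra 'arm-m' term; the other two terms are present in both cases.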

Another caveat: Most single characters are indexed, but some are not:

select to_tsvector('a b c d e f g h i j k l m n o p q r s t u v w x y z');
'b':2 'c':3 'd':4 'e':5 'f':6 'g':7 'h':8 'j':10 'k':11 'l':12 'm':13 'n':14 'o':15 'p':16 'q':17 'r':18 'u':21 'v':22 'w':23 'x':24 'y':25 'z':26

So a, i, s and t are missing. They are also omitted from search terms, meaning that a search for "arm s" will return all texts containing "arm", including "arm-m".
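
The effect of the dropped single letters can be sketched the same way. This is a simplified Python model, not the real ftq function: it only splits the search text on whitespace and removes the four letters that the to_tsvector output above does not index. Stemming and hyphen handling are left out, and the names `effective_terms` and `matches` are made up for this illustration.

```python
# Single letters missing from the to_tsvector output above.
DROPPED_SINGLE_LETTERS = {"a", "i", "s", "t"}

# Lexemes indexed for 'arm-m-xdeb-cross-compilation-environment'.
ARM_M_LEXEMES = {
    "arm",
    "arm-m-xdeb-cross-compilation-environ",
    "compil",
    "cross",
    "environ",
    "m",
    "xdeb",
}


def effective_terms(search_text):
    """Hypothetical helper: split on whitespace, drop the unindexed letters."""
    words = search_text.lower().split()
    return [word for word in words if word not in DROPPED_SINGLE_LETTERS]


def matches(query_terms, lexemes):
    """Hypothetical helper: every remaining term must be in the index data."""
    return all(term in lexemes for term in query_terms)


terms = effective_terms("arm s")
print(terms)                          # the 's' has been dropped
print(matches(terms, ARM_M_LEXEMES))  # so an arm-m blueprint still matches
```

In this model the search for "arm s" is reduced to a search for "arm" alone, which is why it cannot tell the "s" series apart from any other.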

You should use names like "arm-strange" (or whichever name will be used for the "s" series) instead of just "arm-s".

Changed in launchpad:
status: New → Invalid