Comment 5 for bug 1025357

Abel Deuring (adeuring) wrote:


Right: if you have a name like "servercloud-q-r-foo", it will be found by a search for "servercloud q" as well as by a search for "servercloud r", which is probably not what you would expect ;)
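A toy Python model makes this behaviour concrete. It is only a sketch under simplifying assumptions, not Postgres' actual parser: the text is split into words at every non-alphanumeric character, and a query matches when each of its words occurs somewhere in the text. The helper names `tokenize` and `matches` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words; every non-alphanumeric character
    (hyphen, space, apostrophe, ...) acts as a separator."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def matches(document: str, query: str) -> bool:
    """Bag-of-words AND search: every query word must occur somewhere
    in the document; order and distance between words are ignored."""
    query_words = tokenize(query)
    if not query_words:
        # An empty query matches nothing in this sketch.
        return False
    doc_words = set(tokenize(document))
    return all(word in doc_words for word in query_words)


name = "servercloud-q-r-foo"
print(matches(name, "servercloud q"))  # True
print(matches(name, "servercloud r"))  # True, although "r" is not next to "servercloud"
```

Because only the presence of each word is checked, "servercloud r" matches exactly as well as "servercloud q".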

The problem is that there is no way to search for adjacent words. The core search data provided by Postgres would allow this: for each word, its position in the given text is stored, so in theory it would be possible to restrict the search to "return texts having 'servercloud' at position N and 'q' at position N+1". But AFAIK the rest of Postgres' search infrastructure does not provide this.
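The positional restriction described above can be sketched in Python as follows. This is a hypothetical illustration of the idea with invented helper names, using a simplified tokenizer (split at non-alphanumeric characters): `positional_index` records the position of every word, loosely mirroring the positions Postgres stores per word, and `phrase_match` accepts a text only if the query words occur at consecutive positions N, N+1, and so on.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase words, split at every non-alphanumeric character."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def positional_index(text: str) -> dict[str, list[int]]:
    """Map each word to the 1-based positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(tokenize(text), start=1):
        index[word].append(pos)
    return dict(index)


def phrase_match(text: str, phrase: str) -> bool:
    """True if the phrase words occur at consecutive positions N, N+1, ..."""
    index = positional_index(text)
    words = tokenize(phrase)
    if not words:
        return False
    for start in index.get(words[0], []):
        # Word number k of the phrase must sit at position start + k.
        if all(start + offset in index.get(word, [])
               for offset, word in enumerate(words[1:], start=1)):
            return True
    return False


name = "servercloud-q-r-foo"
print(phrase_match(name, "servercloud q"))  # True: positions 1 and 2
print(phrase_match(name, "servercloud r"))  # False: "r" is at position 3
```

With this extra check, "servercloud r" no longer matches "servercloud-q-r-foo", which is the behaviour the plain word search lacks.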

The second possible problem is stop words, i.e., words that are not indexed. These are common English words like 'a', 'the', 'be', etc. The single-character words "a", "i", "s" and "t" are treated as stop words (meaning that they are not stored in the full-text index). The reason for "a" and "i" is obvious; "s" is probably not indexed because it is used as the genitive marker (as in "Antonio's bug 1025327"), and "t" is probably dropped because of words like "can't". (The single quotation mark is parsed as a word separator.)

Anyway, what I mean is: a search for "servercloud-s" would find texts containing "servercloud-b" or "servercloud-no-single-character-at-all" and so on, because the "s", being a stop word, is silently dropped from the query. That might be a reason to change the naming scheme for the "S" and "T" releases, to avoid another unpleasant surprise ;)
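A small Python toy model reproduces the effect, under simplifying assumptions: text is split at non-alphanumeric characters, stop words are removed from both the indexed text and the query, and a query matches when all remaining words occur in the text. The stop-word set below contains only the four single-character words mentioned in this comment (the real English stop-word list is much longer), and the helper names are hypothetical.

```python
import re

# Only the single-character stop words named in this comment; an actual
# English stop-word list contains many more entries.
STOP_WORDS = {"a", "i", "s", "t"}


def tokenize(text: str) -> list[str]:
    """Lowercase words split at non-alphanumeric characters, minus stop words."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower())
            if w and w not in STOP_WORDS]


def matches(document: str, query: str) -> bool:
    """AND search over the words that survive stop-word removal."""
    query_words = tokenize(query)
    if not query_words:
        # A query reduced to nothing matches nothing in this sketch.
        return False
    doc_words = set(tokenize(document))
    return all(word in doc_words for word in query_words)


print(tokenize("servercloud-s"))                                            # ['servercloud']
print(matches("servercloud-b", "servercloud-s"))                            # True
print(matches("servercloud-no-single-character-at-all", "servercloud-s"))  # True
```

Since "s" disappears during tokenization, the query effectively becomes just "servercloud" and matches every text containing that word.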