Near-term steps to improve relevance ranking
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
KARL3 | Fix Released | Medium | Tres Seaver |
Bug Description
OSI users have given a verdict: they miss the quality of the search results from Xapian. No big surprise, this was something we mentioned as a downside in the move away from Crapian.
We hope to schedule a call next week with one of the smart textindex folks to talk about more substantive R&D options for the ranking system.
This ticket is focused on what we can do immediately. At the time of the de-Xapian decision, we mentioned that we could make words in the title score higher than words in the body via brute force: repeat the title words by a factor, say 10, when extracting the searchable text.
We can do this, test it out on the staging server, and gauge the impact.
If there are other ideas, this ticket is a good place for them.
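A minimal sketch of the brute-force idea described above, assuming documents are plain title/body strings. The helper names, the sample documents, and the word-count scorer are hypothetical; the scorer is only a crude stand-in for the real text index, whose scoring may respond differently to repeated words.

```python
def weighted_searchable_text(title, body, title_weight=10):
    """Return text for indexing with the title repeated title_weight times."""
    parts = [title] * title_weight
    parts.append(body)
    return " ".join(parts)


def naive_score(query_word, text):
    """Count occurrences of query_word; NOT how a real text index ranks."""
    return text.lower().split().count(query_word.lower())


# Hypothetical sample content: (title, body) pairs.
docs = {
    "about-search": ("Search tips", "How to find things in the site."),
    "minutes": ("Meeting minutes", "We discussed search and search ranking."),
}

scores = {
    name: naive_score("search", weighted_searchable_text(title, body))
    for name, (title, body) in docs.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

With the default factor of 10, the document whose title contains the query word outranks the one that only mentions it twice in the body; with `title_weight=1` the order flips, which is the kind of difference to look for on the staging server.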
Launchpad Bug Tracker wrote:
> You have been subscribed to a public bug by Paul Everitt (paul-agendaless):
>
> OSI users have given a verdict: they miss the quality of the search
> results from Xapian. No big surprise, this was something we mentioned
> as a downside in the move away from Crapian.

...ahem...Xapian.

>
> We hope to schedule a call with one of the smart textindex folks to see
> what R&D next week to talk about more substantive options on the ranking
> system.
>
> This ticket is focused on what we can do immediately. At the time of
> the de-Xapian decision, we mentioned that we could make words in the
> title score higher than words in the body via brute force: repeat the
> title words by a factor, say 10, when extracting the searchable text.
>
> We can do this, test it out on the staging server, and gauge the impact.
>
> If there are other idea, this ticket is a good place for them.
>
> ** Affects: karl3
> Importance: Medium
> Assignee: Shane Hathaway (shane-hathawaymix)
> Status: New
>
Here is a sketch from a client project which does this (note that I
didn't write this, and now that I look at it, the
quality-of-implementation is pretty low: I'm gonna have to fix it :()::

    def SearchableText(self):
        """
        Override searchable text taking field search weights into
        account, as well as possible extra search tuning information.
        The goal is to change word occurrences in order to manipulate
        the relevance ranking of searches on the SearchableText full
        text index. This way we won't have to do sorting of search
        results -- they should already be in the right order.
        """
        text = Content.SearchableText(self)
        for field_id, weight in self._field_search_weights.items():
            field = getattr(self, field_id, '')
            if callable(field):
                field_text = field()
            else:
                field_text = field
            if field_text is not None:
                try:
                    # check that the value behaves like a string
                    foo = field_text + ''
                except:
                    # this is a problem, skip this field
                    continue
                # repeat the field's words 'weight' times
                field_text = (field_text + ' ') * weight
                text.extend(field_text.split())
        if not ISearchTunable.isImplementedBy(self):
            return text
        # do extra search tuning
        if self.very_relevant_terms:
            term_text = (self.very_relevant_terms + ' ') * \
                        self.VERY_IMPORTANT_TERM_FACTOR
            text.extend(term_text.split())
        if self.relevant_terms:
            term_text = (self.relevant_terms + ' ') * \
                        self.IMPORTANT_TERM_FACTOR
            text.extend(term_text.split())
        return text

I think the ITextIndexData adapter implementations could easily pick
this strategy up (in karl.content.models.adapters).
Perhaps we should consider applying this policy in the OSI package?
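
For illustration only, here is a plain-Python sketch of how an adapter could apply that policy. It does not use the real ITextIndexData interface, whose exact contract is not shown in this ticket; the class name, the `weights` table, the attribute names, and `FakeDocument` are all assumptions.

```python
class WeightedTextIndexData:
    """Hypothetical adapter: wraps a content object, returns weighted text."""

    # Assumed (attribute, repeat factor) pairs; names are illustrative only.
    weights = (("title", 10), ("description", 1), ("text", 1))

    def __init__(self, context):
        self.context = context

    def __call__(self):
        parts = []
        for attr, weight in self.weights:
            value = getattr(self.context, attr, None)
            if not isinstance(value, str) or not value:
                continue  # skip missing, empty, or non-text fields
            parts.extend([value] * weight)
        return " ".join(parts)


class FakeDocument:
    """Stand-in content object for the example; not a KARL model."""
    title = "Budget 2009"
    text = "Draft figures for review."


indexed = WeightedTextIndexData(FakeDocument())()
```

Keeping the factors in one table on the adapter means a package such as OSI could override the weights without touching the extraction loop.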
Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 <email address hidden>
Palladion Software "...