improve search results

Bug #122132 reported by Aaron Swartz
Affects: Open Library
Status: Fix Released
Importance: Medium
Assigned to: solrize

Bug Description

Search result ordering is poor.

Aaron Swartz (aaronsw)
Changed in openlibrary:
assignee: nobody → aaronsw
importance: Undecided → Critical
status: New → Confirmed
solrize (solrize) wrote :

I just tried it and it appears to work. What's the bug?

solrize (solrize) wrote :

Oh I see, the result ranking is pretty terrible. The keywords do occur in all the hits that I checked. But I will try adjusting the scoring weights to get better ranking.

Aaron Swartz (aaronsw) wrote :

actually, I'd fixed it and forgot to close the bug

Changed in openlibrary:
status: Confirmed → Fix Released
solrize (solrize) wrote :

How about keeping it open but downgrading the importance? I misunderstood what the bug was, but the ranking still needs attention. "tom sawyer adventures" (plural) gets a much better result set than "tom sawyer adventure" (singular), so I should turn on some stemming.
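
To illustrate why stemming would make the singular and plural queries behave alike, here is a minimal sketch. It uses a deliberately naive plural-stripping rule, not whatever stemmer the search engine actually uses, and `naive_stem` and `normalize_query` are hypothetical names made up for this example.

```python
def naive_stem(word: str) -> str:
    """Strip a couple of common plural suffixes. A toy rule, not a real stemmer."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


def normalize_query(query: str) -> list[str]:
    """Lowercase the query, split it on whitespace, and stem each term."""
    return [naive_stem(term) for term in query.split()]


if __name__ == "__main__":
    # Both spellings reduce to the same list of terms, so they would
    # match the same index entries if the index is stemmed the same way.
    print(normalize_query("tom sawyer adventures"))
    print(normalize_query("tom sawyer adventure"))
```

The same normalization has to be applied at index time and at query time; otherwise the stemmed query terms would not line up with the indexed terms.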

solrize (solrize) wrote :

The current index uses stemming and "tom sawyer adventure" gets a pile of tom sawyer editions. Better still would be to merge them somehow. I plan to give a score boost to OCA editions (which have fulltext).
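
A score boost of this kind could look roughly like the sketch below. This is only an illustration under assumptions: the multiplier, the `Hit` record, and the score values are all made up, and the real search engine may apply boosts differently.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    base_score: float   # relevance from the text match (made-up values)
    has_fulltext: bool  # True for editions that have scanned fulltext


FULLTEXT_BOOST = 1.5  # hypothetical multiplier, chosen only for illustration


def boosted_score(hit: Hit, boost: float = FULLTEXT_BOOST) -> float:
    """Multiply the base score by the boost when fulltext is available."""
    return hit.base_score * boost if hit.has_fulltext else hit.base_score


def rank(hits: list[Hit], boost: float = FULLTEXT_BOOST) -> list[Hit]:
    """Return hits sorted by boosted score, best first."""
    return sorted(hits, key=lambda h: boosted_score(h, boost), reverse=True)


if __name__ == "__main__":
    hits = [
        Hit("Edition without scan", 2.0, False),
        Hit("Edition with scan", 1.6, True),
    ]
    for hit in rank(hits):
        print(hit.title, boosted_score(hit))
```

A multiplicative boost keeps a weakly matching fulltext edition from outranking a strongly matching one unless the scores are already close; an additive boost would be another option.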

Anand Chitipothu (anandology) wrote : Re: [Bug 122132] Re: search engine broken

On 20-Jul-07, at 4:54 AM, solrize wrote:

> The current index uses stemming and "tom sawyer adventure" gets a pile
> of tom sawyer editions. Better still would be to merge them somehow. I
> plan to give a score boost to OCA editions (which have fulltext).

Is there any way to search only for the fulltext editions? Earlier,
fulltext:true used to work.

solrize (solrize) wrote : Re: search engine broken

I don't think fulltext:true ever worked (at least accurately), but has_fulltext:true should work now.

Anand Chitipothu (anandology) wrote : Re: [Bug 122132] Re: search engine broken

On 20-Jul-07, at 7:08 AM, solrize wrote:

> I don't think fulltext:true ever worked (at least accurately), but
> has_fulltext:true should work now.

Well, when I tried it, it returned only books with fulltext.
Don't you think there is a need for an advanced search where all these
options can be exposed?

solrize (solrize) wrote : Re: search engine broken

The "fulltext" field contains the book's entire text, so "fulltext:true" found books that contain the word "true" somewhere in their contents (that would be a lot of books, but not necessarily every book with fulltext). The "has_fulltext" field always contains exactly one word, "true" or "false", depending on whether fulltext is present. "fulltext:true" no longer works on the main Solr instance because the fulltext has been moved to a separate search engine (SE) instance for performance reasons, but you can still use it on that other instance.
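
The difference between the two fields can be shown with a toy field-aware inverted index. This is a sketch with made-up records, not how Solr stores or queries its index; `build_index` and `search` are hypothetical helpers.

```python
from collections import defaultdict

# Three made-up records; only the field names come from the discussion above.
BOOKS = {
    "book1": {"fulltext": "it is true that tom ran away", "has_fulltext": "true"},
    "book2": {"fulltext": "huck floated down the river", "has_fulltext": "true"},
    "book3": {"fulltext": "", "has_fulltext": "false"},
}


def build_index(books):
    """Map (field, word) to the set of book ids with that word in that field."""
    index = defaultdict(set)
    for book_id, fields in books.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[(field, word)].add(book_id)
    return index


def search(index, query):
    """Evaluate a single `field:word` query and return matching ids, sorted."""
    field, _, word = query.partition(":")
    return sorted(index.get((field, word.lower()), set()))


if __name__ == "__main__":
    index = build_index(BOOKS)
    # Matches only books whose text happens to contain the word "true".
    print(search(index, "fulltext:true"))
    # Matches every book that has fulltext at all.
    print(search(index, "has_fulltext:true"))
```

Here book2 has fulltext but never uses the word "true", so only the `has_fulltext:true` query finds it.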

Yes, our early UI mockups had an advanced search screen with such options and (as of the last phone call) we are planning to bring it back, but applying a score boost for fulltext books in basic search is a little bit different. I think we didn't implement advanced search for the soft launch because there were too many other things going on, and there is also the issue with the flipbook page numbering. However, I hope we do it soon.

Related topic: I spoke with Todd on the phone today about flipbook and he explained what he thinks is going on: we may need to run something to crunch the scandata.xml files to fix the page numbers, or add some more server-side flipbook code to handle it at query time. I'm going to see him on Monday and will discuss it with him some more then.

solrize (solrize) wrote :

Could the priority of this bug be downgraded from Critical? It was marked that way because of an outage that has since been fixed, but the bug remains open because there are still some issues to discuss.

webchick (webchick)
Changed in openlibrary:
importance: Critical → Medium
webchick (webchick) wrote :

link to early UI mockup:
http://invisible.net/openlibrary/advanced.png

Anand Chitipothu (anandology) wrote : Re: [Bug 122132] Re: search engine broken

On 20-Jul-07, at 9:46 AM, webchick wrote:

> link to early UI mockup:
> http://invisible.net/openlibrary/advanced.png

Maybe there should be a checkbox, "show only books with fulltext",
or something like that.
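
One way such a checkbox could map onto a query is sketched below. The endpoint URL is a placeholder, and using Solr's `fq` (filter query) parameter for this is an assumption for illustration, not a description of how Open Library wires its search form.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the real search handler URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(user_query: str, only_fulltext: bool = False) -> str:
    """Build a search URL, optionally restricted to books with fulltext."""
    params = [("q", user_query)]
    if only_fulltext:
        # Checkbox ticked: keep only records whose has_fulltext field is "true".
        params.append(("fq", "has_fulltext:true"))
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("tom sawyer"))
    print(build_search_url("tom sawyer", only_fulltext=True))
```

Keeping the restriction in a separate filter parameter, rather than appending it to the user's query text, means it narrows the result set without having to edit what the user typed.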

solrize (solrize) wrote : Re: search engine broken

Assigning to me since it's now mostly about SE functionality changes.

Changed in openlibrary:
assignee: aaronsw → solrize
Aaron Swartz (aaronsw)
description: updated
Changed in openlibrary:
status: Fix Released → In Progress
solrize (solrize) wrote :

Lots more work on search ranking is planned, but this bug was about a much simpler issue, so closing it in favor of keeping a few outstanding wishlist items open, instead of a bunch of loose items like this.

Changed in openlibrary:
status: In Progress → Fix Released