"Get books" search is too fuzzy

Bug #1200012 reported by Anders Skogheim Liane
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: John Schember

Bug Description

As described in the following thread, Get books returns too many books that don't even contain all the words from a title/author search: http://www.mobileread.com/forums/showthread.php?t=212422.

I kludged together a horrible hack which I was reluctant to even include here, as it's likely to be altogether the wrong way to achieve this, but it seems to work as I want (and it demonstrates what I mean) :)

For instance, in a search for title "the player of games" my release version of calibre has 96 hits, most of which have neither "player" nor "games" in the title (there are a few with "game", though). The exact number is of course dependent on my choice of stores. The only 8 hits in my "fixed" search results are the actual book "The Player of Games" by Iain M. Banks.

I propose that an option is provided to only include hits which contain an exact partial match (case-insensitive) of the searched title/author. This could be done either by having a checkbox enabling it, or by respecting double quotation marks as an exact phrase marker (together with a field description that indicates this).
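The proposed filter could be sketched roughly as follows. This is only an illustration of "exact partial match (case-insensitive)"; the function names are hypothetical and are not taken from calibre's source or from the attached patch.

```python
def exact_partial_match(query, title):
    """Return True if the whole query occurs in the title, ignoring case."""
    return query.casefold() in title.casefold()


def filter_results(query, titles):
    """Keep only the titles that contain the query as one contiguous phrase."""
    return [t for t in titles if exact_partial_match(query, t)]


hits = [
    "The Player of Games",
    "The Psychology of Baseball: Inside the Mental Game",
    "The Spy Wore Red",
]
print(filter_results("the player of games", hits))
```

With a filter like this, only titles containing the full phrase survive, which is the behaviour an "exact phrase" checkbox or quoted query would be expected to give.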

Anders Skogheim Liane (maneatingduck-gmail) wrote :
Kovid Goyal (kovid) wrote : Re: calibre bug 1200012

Changing the component for this bug.

 assignee user-none
 status triaged

Changed in calibre:
assignee: nobody → John Schember (user-none)
status: New → Triaged
John Schember (user-none) wrote :

There are a few aspects to this issue.

1) The results are returned by the stores. Some stores have poor search ability and return arbitrary results.

2) The search can be made more specific, but this excludes searching for discovery. For example, when searching for dragons, you don't necessarily want books with dragon in the title but books about dragons. Excluding by title would hobble this use case.

Both need to be balanced. We need to ensure books aren't excluded, but we also need to not include books that are clearly unwanted. Before I look into changing the searching, I think I have an idea that might alleviate some of this issue.

I think I'll add a new column which will give the order of results returned from the stores. Typically a store will return results that are a closer match to the query earlier. So you'll be able to sort by rank where a lower rank should be closer to what you searched for.
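As a rough sketch of that idea (the data layout and names here are assumptions for illustration, not calibre's actual store plugin API), each result could simply record its position in the list its store returned:

```python
def add_rank(results_by_store):
    """Attach each result's 1-based position within its own store's list."""
    ranked = []
    for store, titles in results_by_store.items():
        for position, title in enumerate(titles, start=1):
            ranked.append({"store": store, "title": title, "rank": position})
    # sorted() is stable, so results with equal rank keep their store order.
    return sorted(ranked, key=lambda r: r["rank"])


# Hypothetical store names and placeholder titles.
sample = {"Store A": ["x", "y"], "Store B": ["z"]}
for row in add_rank(sample):
    print(row["rank"], row["store"], row["title"])
```

Sorting on the rank column then brings each store's own best guesses to the top, without calibre having to judge relevance itself.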

Anders Skogheim Liane (maneatingduck-gmail) wrote :

1) Yes, I am aware of that; that is why calibre needs to filter the results it receives.

2) What makes Get books different from Google is that it's a lot better for the use case "I want to buy this particular book. Which store sells it cheapest?" That's what I use it for, and in that case completely unrelated books in the results are just noise (more than 90% in my example). I would be very surprised if this is not a reasonably common use case. The About presentation on first use also emphasises this feature. Worst case is that a first-time user of Get books dismisses it completely when he doesn't see a single relevant result in the initial view, never to return to it. That would be a shame, as Get books *is* very useful even as it is.

Furthermore, I would expect that the keyword field is more suitable for discovery than the title field. I would also expect that the specific fields "title" and "author" would mostly yield books that actually have some resemblance to the input in those fields. A partial match would also be useful to find books with "dragons" in the title.

TBH I don't think your proposed solution is very useful, as stores return fairly unpredictable results. At least it isn't good enough when a deterministic solution is possible to implement.

I don't think that one should change the behaviour completely, as existing users are used to the current one. That's why I propose to make exact partial search an option. Choice is good most of the time, and calibre is generally very generous in that respect; in other areas it caters to a lot of different user preferences. As it is now, there is no option to get only relevant results from Get books.

I can get this option anyway, as I can just keep applying my patch and running from source :)

Anders Skogheim Liane (maneatingduck-gmail) wrote :

Uh, it suddenly hit me that in my overly long comment I missed an improvement which would be a compromise. Sort the result as default in the following way:

Place exact whole-title/author matches first, followed by exact partial matches, and then the less relevant ones in whichever order.
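A minimal sketch of that three-tier ordering, assuming results are plain title strings (the function names are hypothetical, and the second sample title is made up for illustration):

```python
def relevance_tier(query, title):
    """0 = exact whole-title match, 1 = exact partial match, 2 = everything else."""
    q = query.casefold().strip()
    t = title.casefold().strip()
    if t == q:
        return 0
    if q in t:
        return 1
    return 2


def sort_by_relevance(query, titles):
    # sorted() is stable, so titles within a tier keep their original order.
    return sorted(titles, key=lambda t: relevance_tier(query, t))


titles = [
    "The Spy Wore Red",
    "The Player of Games: A Culture Novel",  # made-up title for illustration
    "The Player of Games",
]
print(sort_by_relevance("the player of games", titles))
```

Nothing is removed from the results here; the less relevant hits just sink to the bottom of the default view.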

I'm still inclined to think that giving the user control of this option would be better as it allows sorting by author in case of similar titles, but this would at least emphasise the most interesting results in the view.

Kovid Goyal (kovid) wrote :

@john: IIRC wasn't the introduction of the title and author fields meant to filter matches on the title and author metadata? I seem to recall vaguely that title2 and author2 were introduced into the search query parser to make that happen.

John Schember (user-none) wrote :

@kovid: Yes, that's exactly what happens. If you search using the title field, it will include all results that have any of the search words (common articles such as a, and, the... are excluded because those should be filtered out for matching). You had me explicitly change it to this type of matching, instead of exact matching on the title as input, because common misspellings were resulting in no matches.

When I do a search for "the player of games" in the title field, every result I see in Get Books has player or games in the title. This is exactly the intended and last requested behavior.

The attached patch changes it back to the original behavior of requiring the actual title as typed by the user to be present in the result title.
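For readers following along, the any-word matching described above might look roughly like this. It is a simplified sketch, not calibre's actual implementation: the `ARTICLES` list, the whole-word comparison, and the function names are all assumptions.

```python
import re

# Illustrative article list only; the real list used by calibre may differ.
ARTICLES = {"a", "an", "and", "the"}


def search_words(query):
    """Split the query into lowercase words and drop the common articles."""
    return [w for w in re.findall(r"\w+", query.casefold()) if w not in ARTICLES]


def any_word_match(query, title):
    """Return True if the title shares at least one search word with the query."""
    title_words = set(re.findall(r"\w+", title.casefold()))
    return any(w in title_words for w in search_words(query))


print(search_words("the player of games"))
print(any_word_match("the player of games", "Games People Play"))
```

Note that with this illustrative list, "of" is still treated as a search word, so any title containing "of" would match; how aggressively such short words are stripped has a large effect on how fuzzy the results feel.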

Basically, every change request is either that it's too strict and needs to allow more results, or that it's including too many results and needs to be made stricter.

Kovid Goyal (kovid) wrote :

Certainly to me displaying results with player or games in the title is the correct behavior. The question is why is the OP seeing something different? @Anders: I assume you are running from up-to-date source? (Note that calibre's source control recently moved from bzr to git). Also can you post an example search that displays the incorrect behavior (the exact search query and list of enabled plugins would help).

John Schember (user-none) wrote :

@kovid: Looking at the screenshot from the MobileRead thread, I see results with player and/or game in the title. Some of the results, like "the psychology of baseball...", have game in the title, but the title is longer than the column. Mousing over to see the full title in the tooltip shows game. So the behavior Anders is seeing is correct and desired.

Changed in calibre:
status: Triaged → Invalid
John Schember (user-none) wrote :

Oh. I almost forgot. It is possible to do exact title searching.

You can either use the advanced search builder (binoculars next to keyword) or you can type it directly. Put the following in the keyword field:

title:"the player of games"

This will have the same result as the patch submitted. So, partial match searching is already supported and there is no need for an option because it can be "enabled" by doing an advanced search.

Anders Skogheim Liane (maneatingduck-gmail) wrote :

title:"the player of games" in the keyword field indeed works as I want. It might not be very intuitive that you need to put it in that field instead of in the title field, but the functionality is there :)

Thanks to you both, this certainly works fine for me.

While I consider my issue resolved, re Kovid's reply #8, I still see numerous results which don't contain relevant words. The attached screenshot is from version 9.39 x64 on Win7 Home Premium, not source, but I see similar issues when running from up-to-date git source. Note "The Spy Wore Red" and "The Secret Life of Bees - Behind the Story". The results also include the whole Harry Potter series from Kobo, with a lot of garbage characters prefixed to the title. While the HP books obviously have invalid titles, they still shouldn't be in the results.

Instead of attaching more screenshots, I hope it's okay to just include a few of the links as well; you'll just have to trust me that they indeed were amongst the results. The links are copied from the browser after clicking them in Get books. I removed the buy suffix from the feedbooks ones.

"The Improving Annotator: From Beginner to Master" https://www.feedbooks.com/item/182627

"The Intelligent Guide to Texas Hold'em Poker" https://www.feedbooks.com/item/49151

"The Cocka Hola Company - Roman" http://www.kobobooks.com//ebook/The-Cocka-Hola-Company/book-3zGBy3PQC02CPMm7V_m2rg/page1.html?utm_source=linkshare_us&utm_medium=Affiliate&utm_campaign=linkshare_us&siteID=0dsO3kDu_AU-lVXPkJHDc086L9GpcOYEUg

"Hammerin' Hank, George Almighty and the Say Hey Kid" http://www.feedbooks.com/item/28629

"Killer Advantage: The Gambler's, Magician's and Mentalists Guide to Easily Memorize Discards and a Deck of 52 Cards in an Hour!" http://www.smashwords.com/books/view/56751?ref=usernone

"kill.switch - poradnik do gry (ebook)" http://www.empik.com/kill-switch-poradnik-do-gry-bienkowski-daniel-kami,p1045706229,ebooki-i-mp3-p

John Schember (user-none) wrote :

It's not filtering articles such as "the" that are part of the search terms.

Really, that's a minor issue which will need a bit more work to resolve. Currently the article stripping for searches is English only. This needs to be enhanced to include a larger list of words to ignore, one that encompasses more languages.
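One possible shape for such a multi-language ignore list is sketched below. The word lists are tiny illustrative samples and the names are hypothetical; this is not the fix that eventually landed in calibre.

```python
# Small illustrative samples per language; real lists would be much longer.
STOP_WORDS = {
    "en": {"a", "an", "and", "the"},
    "de": {"der", "die", "das", "ein", "eine", "und"},
    "fr": {"le", "la", "les", "un", "une", "et"},
}

# Pool every language, since the language of a query is not known up front.
ALL_STOP_WORDS = frozenset().union(*STOP_WORDS.values())


def strip_stop_words(query):
    """Return the lowercase query words that are not in any ignore list."""
    return [w for w in query.casefold().split() if w not in ALL_STOP_WORDS]


print(strip_stop_words("the player of games"))
print(strip_stop_words("Die Verwandlung"))
```

Pooling the lists is a trade-off: a word that is an article in one language can be a meaningful word in another (the German "die" versus the English verb), so a pooled list can discard terms the user actually meant to search for.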

Kovid Goyal (kovid) wrote : Fixed in master

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: Invalid → Fix Released