Bug #236947 “Release Search API enhancements” : Bugs : Open Library

Revision history for this message

solrize (solrize) wrote on 2008-06-02:

#1

Some questions for Stephanie:

Should 1) and 2) actually be combined? That is, does #2 also mean that you want to get the full record for each book?

3) How about if I add a search_params field to the JSON request, i.e.
search_fields : {"author": "mark twain", "title": "connecticut yankee" }
that would allow you to specify any fields you wanted. Default would be a basic (cross-field) search as we have now.

4) In general, do you want to see the OL search facets?

5) What do you want to have happen when there's a large result set? For example, "france" (one of our standard test searches) returns thousands of results.

solrize (solrize) on 2008-06-02

Changed in openlibrary:
assignee:	nobody → solrize
importance:	Undecided → High
status:	New → Confirmed
importance:	High → Wishlist

solrize (solrize) wrote on 2008-06-02:

#2

How does this look as an expanded search api:

q is a dictionary (JSON-formatted) that contains the following fields:

offset :: Integer -> result number to start at, default 0
rows :: Integer -> number of rows to return, default 20

EXACTLY ONE OF:
query :: String -> basic query string ("basic query" means cross-field)
advanced_query :: dictionary

The advanced query dictionary would be something like the above. We could make a syntax for Boolean combinations.

Another (simpler) approach would be to just use a string in Lucene syntax, but we may not want to always stay with that.
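
As a rough illustration of the request shape proposed in this comment, here is a minimal Python sketch. The field names (offset, rows, query, advanced_query) and the defaults come from the proposal above; the helper name build_search_request and the example.org base URL are hypothetical, and the real server may expect some of these as separate URL parameters instead.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://example.org/api/search"


def build_search_request(query=None, advanced_query=None, offset=0, rows=20):
    """Return a GET URL whose q parameter is the JSON dictionary proposed above."""
    # The proposal requires exactly one of query / advanced_query.
    if (query is None) == (advanced_query is None):
        raise ValueError("supply exactly one of query or advanced_query")
    q = {"offset": offset, "rows": rows}
    if query is not None:
        q["query"] = query  # basic cross-field query string
    else:
        q["advanced_query"] = advanced_query  # dictionary of field -> value
    # urlencode percent-escapes the JSON so it survives as one URL parameter.
    return BASE_URL + "?" + urlencode({"q": json.dumps(q)})
```

Raising an error when both or neither query form is given is one way to enforce the "exactly one of" rule on the client before the request is sent.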

Stephanie Collett (stephanie-collett) wrote on 2008-06-03:

#3

RE: 1+2) Yes, I think the fewer transactions the better for creating dynamic links on the client side.

RE 2) Multiple books in a single request is a tricky requirement because there needs to be a way to determine which book came from what submitted criteria. Google's API nests the results, but they only allow an identifier search at this time.

http://books.google.com/books?bibkeys=ISBN:9780689317804,LCCN:2007004478&jscmd=viewapi

RE 3) I like this approach. Would it then be possible to limit search results by scanned books?

RE 4) Sounds interesting, I'd have to think about it. But I don't have a use case for this right now.

RE 5) Offset and Row options would be a good way to handle large sets.

Stephanie Collett (stephanie-collett) wrote on 2008-06-03:

#4

I am most interested in the advanced query for my catalog integration, but I think a basic cross-field query could be useful in many other situations.

solrize (solrize) wrote on 2008-06-03:

#5

To select scanned books, just add the has_fulltext field with value 1.

For testing purposes I will add an advanced query using Lucene syntax, but we may later want to switch to something more dictionary based like in comment #1.

solrize (solrize) wrote on 2008-06-03:

#6

As a preliminary pass at this, please try:

http://apollonius.us.archive.org:9071/api/search?q={"query":"title:robot authors:asimov"}&prettyprint=true&format=expanded&rows=3

Changed in openlibrary:
status:	Confirmed → In Progress

solrize (solrize) wrote on 2008-06-03:

#7

Here is one limited to full text. Sorry that Launchpad is breaking up these long URLs. You'll have to paste them back together.

http://apollonius.us.archive.org:9071/api/search?q={"query":"authors:(felix klein) has_fulltext:1"}&prettyprint=true&format=expanded

Stephanie Collett (stephanie-collett) wrote on 2008-06-03:

#8

This is great! Being able to search by author, title, and fulltext is key for my application. And multiple format options enable both quick linking by the ID and more heavy weight applications working directly with the data.

Could there be a way for multiple queries to be passed within one request to the API? My particular use case can have up to 30 books to look up, and it would be nice to only make a single (or few) requests to query all those books.

solrize (solrize) wrote on 2008-06-04:

#9

Really 30 different search queries? Each one is typically an author-title combination where they are all basically unrelated? OK, I made it so you can supply a list as the query value. I'm just a little surprised that the application wants that.

Let me know how this looks and if there's too much crud in the results etc. Also I probably want to clean up the syntax some before releasing it, so this is just a test of concept.

http://apollonius.us.archive.org:9071/api/search?q={"query":["authors:(felix klein) has_fulltext:1","authors:hilbert title:(geometry imagination)"]}&prettyprint=true&format=expanded
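
For illustration, a client with a list of author/title pairs might assemble the batch form of the request roughly like this. The field names authors, title and has_fulltext and the format=expanded parameter are taken from the example URLs in this thread; the helper names are hypothetical, and the sketch does no escaping of Lucene special characters, so it only suits plain words.

```python
import json
from urllib.parse import urlencode


def author_title_query(author, title, fulltext_only=False):
    """Build one Lucene-style query string for an author/title lookup."""
    parts = [f"authors:({author})", f"title:({title})"]
    if fulltext_only:
        parts.append("has_fulltext:1")  # restrict to scanned books
    return " ".join(parts)


def batch_query_string(pairs):
    """Encode many (author, title) lookups as a single URL query string."""
    queries = [author_title_query(author, title) for author, title in pairs]
    # The list goes in as the value of "query", as in the example URL above.
    return urlencode({"q": json.dumps({"query": queries}), "format": "expanded"})
```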

Stephanie Collett (stephanie-collett) wrote on 2008-06-04:

#10

Yeah, I know that is a little crazy, but it is an upper bound. Searches on Melvyl have an average of 8-9 items. And many of the books will have identifiers (e.g. LCCN, ISBN, OCLC). However, some books will require at least an author and title search. So...theoretically there could be 30 author/title searches, but it is unlikely. We are excited to try linking to the OCA content directly in the search results.

I like the prototype, and think it will integrate well into Melvyl. My next step will be to build it into our Melvyl test instance alongside the Google Book Search code. I'll let you know how it goes. Thanks!

solrize (solrize) wrote on 2008-06-07:

#11

Per Stephanie's request I've added a callback parameter, e.g.

http://apollonius.us.archive.org:9071/api/search?q={"query":"authors:(felix klein) has_fulltext:1"}&prettyprint=true&format=expanded&callback=olresults

This wraps the json result string in a function call with the name given in the callback parameter, e.g.

olresults({"status": "ok", ... })

I think the idea is to insert the result directly into the dhtml sent to the user's browser so that client side javascript can process it.

Anand Chitipothu (anandology) wrote on 2008-06-07: Re: [Bug 236947] Re: search API improvements

#12

> This wraps the json result string in a function call with the name given
> in the callback parameter, e.g.
>
> olresults({"status": "ok", ... })

What are the available callbacks?

> I think the idea is to insert the result directly into the dhtml sent to
> the user's browser so that client side javascript can process it.

client side javascript can also process JSON. I don't think we should
be doing this.

Anand Chitipothu (anandology) wrote on 2008-06-07:

#13

On Sat, Jun 7, 2008 at 6:37 AM, Anand Chitipothu <email address hidden> wrote:
>> This wraps the json result string in a function call with the name given
>> in the callback parameter, e.g.
>>
>> olresults({"status": "ok", ... })
>

Sorry, I misunderstood this. I thought you were executing a Python function.
This is useful in general. We can add this support for all API functions.

solrize (solrize) wrote on 2008-06-07: Re: search API improvements

#14

Aaron explained it to me: the idea is that the client sends the OL request and the callback gets around restrictions on cross-site scripting. The callback function is just whatever is in the request URL. Out of general paranoia I check that it's an identifier-like token.

One issue is about the case where there's a bunch of different queries in one request: maybe that leads to an overlong URL and you want to POST instead of GET the request. I'm not sure if that works so well with this model. Also, it occurs to me that putting the search terms into the URL is a slight privacy hazard because of company firewalls that log outgoing URLs but don't log POST contents. We should at minimum put up an HTTPS server to help with such issues.

rejon (rejon) wrote on 2008-06-17:

#15

Cool, this is what David needs for Open Library + Wikipedia plugin. David, anything more needed?

Stephanie Collett (stephanie-collett) wrote on 2008-06-17:

#16

To enable JS libraries with namespaces, would it be possible to allow period characters in the callback method?

This would look like:

q={"query":"..."}&callback=JS.Library.Namespace.olresults

and return:

JS.Library.Namespace.olresults({"status": "ok", ... })

solrize (solrize) wrote on 2008-06-18:

#17

Does anyone see any exploits? I'm way out of the game with this stuff and not thinking very clearly. I'll put it in tomorrow unless someone sees a problem.

Aaron Swartz (aaronsw) wrote on 2008-06-18: Re: [Bug 236947] Re: search API improvements

#18

Del.icio.us doesn't seem to do any sort of filtering, so I suspect it's OK:

http://feeds.delicious.com/feeds/json/aaronsw/?callback=.-@*$()

solrize (solrize) wrote on 2008-06-19: Re: search API improvements

#19

I added periods to the set of allowed characters, except the first char still has to be alphabetic.
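
A check of this kind could be sketched as follows. This is only a guess at the rule: the thread says the name must be an identifier-like token, may contain periods, and must start with an alphabetic character, but the exact character set the server accepts is not stated, so the pattern below (letters, digits, underscores, periods) is an assumption, as is the function name.

```python
import re

# Assumed rule: first character alphabetic, then any mix of letters,
# digits, underscores, or periods. Not confirmed by the thread.
_CALLBACK_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def is_valid_callback(name):
    """Return True if name looks like a safe JSONP callback name."""
    # fullmatch ensures the whole string fits the pattern, not just a prefix.
    return _CALLBACK_RE.fullmatch(name) is not None
```

Under this assumed rule, JS.Library.Namespace.olresults is accepted, while a string containing parentheses or other punctuation is rejected.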

solrize (solrize) wrote on 2008-06-26:

#20

Jonathan Rochkind has asked for a search API to access fulltext search of scanned books. I will add something like that, so the API takes search terms and returns a list of OL book records and leaf numbers where the search terms occur. Jonathan also asks about launching flipbook with the search terms bookmarked and highlighted. See bug# 126611 for more about that.

Stephanie Collett (stephanie-collett) wrote on 2008-06-26:

#21

Would it be possible to wrap JSON error messages in callbacks? When an error occurs there is no way to recover in JavaScript since it requires a callback.

olresults({"status": "error"})

solrize (solrize) on 2008-06-26

description:

updated

solrize (solrize) wrote on 2008-06-26:

#22

I'll see if I can add the callback around the error message this evening.

solrize (solrize) wrote on 2008-06-27:

#23

Callback for error return is added.
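
To illustrate the behaviour described here, the sketch below wraps both successful and failed results in the callback, so that client-side code always receives a call it can handle. It is not the actual server code: the "status" key appears in the thread, but the "result" and "message" keys and the function name are hypothetical, and the callback name is assumed to have been validated already.

```python
import json


def jsonp_response(compute_result, callback=None):
    """Run compute_result() and return the response body as a string."""
    try:
        payload = {"status": "ok", "result": compute_result()}
    except Exception as exc:
        # Report the failure as data instead of letting it escape,
        # so the caller still gets a well-formed response.
        payload = {"status": "error", "message": str(exc)}
    body = json.dumps(payload)
    if callback:
        # JSONP: wrap the JSON text in a call to the named function.
        return f"{callback}({body})"
    return body
```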

jrochkind (rochkind-jhu) wrote on 2008-06-27:

#24

Thanks Paul. What I'm asking for is actually a bit different from, and simpler than, a full search API via Open Library. That would be useful, but it would actually be too much overhead on my end for what I want to do here, which I am already doing with Amazon and GBS. To explain:

I can already search using the IA XML search, and discover from those XML results if there is a flipbook available. If there is a flipbook available, I can determine its URL, such as:

http://www.archive.org/stream/europeananarchy00dickiala

And I can send the user there, and once there, they can enter a search query and see search results. But I'd like to let them enter a search query over in my interface, and send them to that page with their search already submitted and the results immediately shown. Both Amazon and Google Books let me do that with URLs analogous to:

http://www.archive.org/stream/europeananarchy00dickiala?query=london

(i.e., http://books.google.com/books?id=RWPrAFvARUQC&q=california#search
or: http://www.amazon.com/gp/reader/0520244141/ref=sib_dp_srch_pop?v=search-inside&keywords=liberal )

Much simpler than having to use a full-scale API and render the search results myself; I don't need that full power right now. I'm happy to just send the user into your rendered search results, like I can with Amazon and GBS.

So, whether it's through OpenLibrary or not, I really want something much simpler than a full API (which would maybe return hits in XML)---I just want a way to "link" into search results in the IA's website. Make sense? If it's through OpenLibrary, the problem is that many books that are accessible and have flipbooks via the IA website in general (and the XML search) are not necessarily in the OpenLibrary yet, and may not be for some time. So my specific suggestion is really just enhancing that flipbook interface to pay attention to a URL query parameter "&query=" or what have you, and automatically present search results for that query if present.

Make sense?

solrize (solrize) wrote on 2008-06-27: Re: [Bug 236947] Re: search API improvements

#25

> http://www.archive.org/stream/europeananarchy00dickiala?query=london

Yes, I understand, we have a rather old open request about this (see
above), and there's an implementation that does about what you're asking,
but it isn't reliable enough to release at the moment. I should really
give it more attention but nobody has been asking about it recently. It
requires hacking inside of Flipbook itself, which is a complex Javascript
application that we've been wanting to replace. I'm about to start doing
some more fulltext-related stuff for other reasons too, so I'll see if I
can fix the remaining problems.

solrize (solrize) wrote on 2008-07-02: Re: search API improvements

#26

The development search engine (used by apollonius in the links above) will be down for part of today for a RAM upgrade. I hope this doesn't interfere with anyone.

Aaron Swartz (aaronsw) wrote on 2008-07-17:

#27

It still seems to be down -- everything is returning:

IOError: [Errno 32] Broken pipe

solrize (solrize) wrote on 2008-07-17:

#28

Restarted apollonius:9071 server. I've seen it wedge like that a couple other times. I don't know why.

Jason Ronallo (jronallo) wrote on 2008-07-17:

#29

> 1) Search API should return full record for each book, instead of just ID

Has this been implemented yet? I'm still getting back just IDs. As the original requester commented, it would be nice to be able to make a single request of OL and be returned all metadata on an edition.

> http://apollonius.us.archive.org:9071/api/search?q={%22query%22:[%22authors:(felix%20klein)%20has_fulltext:1%22,%22authors:hilbert%20title:(geometry%20imagination)%22]}&prettyprint=true&format=expanded

This example link was given for how to pass multiple queries in a batch. The concern was that there would be too much crud in the response. I tried to take a look but got an error as Aaron did.

I can easily see having a batch of 30 queries to make at a time even though I'm looking for a single work. If I have an identifier like an ISBN, I may run it through a service that returns all related identifiers (OCLC, ISBN, LCCN), then query OL with all of them. This greatly increases the chances that I will find fulltext. Now if OL is FRBRized (all related editions grouped) and could return those related editions--especially fulltext ones--when passed a certain key (related:1) that'd be awesome.

When you start returning results for a batch of queries, it would be nice to have deduping already done. Some thought will need to be given to how to do this. Some folks will likely want the results for each query in a batch to be identified by the query, so this complicates what you might be able to do with deduping.

One API that you might like to look at for ideas is for MBooks/SDR. It does nice things like scoring results as well as deduping. It is still in development, so it doesn't do batches yet, but here's an example of what it can do with multiple identifiers for the same edition:
http://code.google.com/p/jquery-sdrsmd/wiki/Phase1

solrize (solrize) wrote on 2008-07-17:

#30

Jason, please go ahead and try that query again, the test server was down for some reason and I just restarted it.

MBooks/SDR looks interesting, though its scoring function (like ours) doesn't look that useful. I can put solr's scores into the API search results if they aren't already there and you want them, but search ranking for bibliographic records is quite problematic. Most libraries don't attempt it.

When you say deduping, do you mean merging the search results when more than one query hits a specific book? I can think of various ways to do that but maybe it's more sensible to let the client app do it.
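
If the merging is left to the client, it could look roughly like the sketch below. The input shape (a mapping from each query string to the list of book keys it returned) is an assumption made for illustration, since the thread does not specify the batch response format, and merge_batch_results is a hypothetical helper name.

```python
def merge_batch_results(results_by_query):
    """Collapse per-query hit lists into one entry per book.

    results_by_query: dict mapping a query string to a list of book keys.
    Returns a dict mapping each book key to the queries that matched it,
    in first-seen order, so the link back to the query is not lost.
    """
    merged = {}
    for query, keys in results_by_query.items():
        for key in keys:
            hits = merged.setdefault(key, [])
            if query not in hits:  # ignore repeats within one result list
                hits.append(query)
    return merged
```

Keeping the list of matching queries next to each book is one way to address the concern that results in a batch still need to be identified by the query that produced them.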

Solr implements Lucene-syntax queries with arbitrary boolean combinations, so you could do your ISBN/OCLC/LCCN query as a big OR instead of a bunch of separate queries, if that helps. I'm hesitant to "officially" support that syntax in the API but if it's an important type of query then I could come up with some json-ified version.
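
For illustration, a client could assemble such an OR query along these lines. The field names isbn and lccn used in the usage note are placeholders, since the thread does not say what the identifier fields are called in the index, and identifier_or_query is a hypothetical helper.

```python
def identifier_or_query(identifiers):
    """Join (field, value) pairs into one Lucene-style OR query string."""
    clauses = [f"{field}:{value}" for field, value in identifiers]
    if not clauses:
        raise ValueError("at least one identifier is required")
    return " OR ".join(clauses)
```

With the assumed field names, identifier_or_query([("isbn", "9780689317804"), ("lccn", "2007004478")]) produces a single query string that matches a record carrying either identifier.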

aaron r (arubinstein) wrote on 2008-07-22:

#31

A couple very minor points...

1. Is it possible to return the cover images with the expanded results?

2. Filters like "has_fulltext" are great. Is there one planned for language?

Thanks

solrize (solrize) wrote on 2008-07-22:

#32

1. You mean you want the url for the cover image jpeg? Hmm. Let me get back to you on that. I don't think it's in the index now but it might be possible to add it in the next rebuild.

2. Unfortunately most of our catalog records don't specify the book's language. Trying to filter on language will usually get an empty result set, even if you're just looking for books in English.

aaron r (arubinstein) wrote on 2008-07-23:

#33

1. I'm sorry for not being totally clear. It looks like user-added cover images are returned with a "/get" query using an OpenLibrary #.

Here's an example query that returns the coverimage:
http://openlibrary.org/api/get?key=/b/OL7168569M&prettyprint=true&text=true

When a title with a user-added image comes up in a query using the expanded format, the coverimage information is not included. After I posted this question, I noticed that the results in expanded format are also missing the "isbn_10" that is returned with a /get query for a specific title. Since those seem to be the identifiers used for the jpegs that were gathered from google books (I assume), I've been using them to generate urls for the cover art. I guess this is all moot if you're able to add the image urls directly into the index, which would be great.

2. Too bad... I'm working for an organization that's part of the OCA and will be contributing a large number of books in one particular language. When the books are submitted, we'll make sure that the language code is included in the metadata. It would be great if there was a way to filter on language for that reason. I hate to ask for special favors but it seems like something that in an ideal world could be very useful for others as well.

solrize (solrize) wrote on 2008-07-23:

#34

1. I'm not sure why the query
http://apollonius.us.archive.org:9071/api/search?q={%22query%22:[%22yiddish dictionary abelson%22]}&prettyprint=true&format=expanded
doesn't return the cover attribute. Anand, if you're reading this, any idea?

2. You can send in arbitrary Lucene queries, so you can say (for example) languages:eng (use the 3-letter MARC code for the language). The problem is that it returns few books because the language data is not there for the search to find. But if you add a bunch of records with language fields, the search engine can see those fields.

aaron r (arubinstein) wrote on 2008-07-23:

#35

I can get it to work for English books but this query returned no results:

http://apollonius.us.archive.org:9071/api/search?q={"query":"languages:yid"}&prettyprint=true&format=expanded

while this query returns items that clearly have "yid" in the language field:

http://apollonius.us.archive.org:9071/api/search?q={"query":"yid"}&prettyprint=true&format=expanded

Please feel free to email me (arubinstein at bikher dot org) if this is no longer relevant to the discussion.

Thanks!

solrize (solrize) wrote on 2008-07-23:

#36

Thanks, I think I see the problem and will fix it in the next index build, hopefully within a week or so.

solrize (solrize) wrote on 2008-07-23:

#37

I have opened a new tracker item, bug #251276, for the (nonexistent right now) fulltext search API. Main reason for having it in a separate item is that this one is getting rather long. The two can be seen as related. Please put any discussion or requests about the fulltext API in the other item rather than here. I expect to be doing some work on it pretty soon.

description:

updated

solrize (solrize) wrote on 2008-07-24:

#38

I just checked in the stuff described above to the main hg repo. Any reason not to pull it to staging?

Jonathan Narwold (jonathan-narwold) wrote on 2008-12-13:

#39

Any progress on #1 at the top (showing full book information instead of just IDs)? I tried some of the links in this post, and it looks like the test server is down at the moment. Any chance this could be moved to the main OpenLibrary search API?

solrize (solrize) wrote on 2008-12-13:

#40

Unless I'm mistaken we did push out that code to production a while back, but right now I see that trying the example strings above crashes the JSON encoder, so it looks like there has been some kind of regression. I'll look into it. Thanks for bringing this up, it has been on the back burner for a while.

Nate Irwin (nate-nateirwin) wrote on 2009-03-03:

#41

I'm interested in any progress on this, as well.

Trying something like this http://openlibrary.org/api/search?q={"query":"Felix Klein"}&format=expanded does not work.

Am I just doing something wrong?

solrize (solrize) wrote on 2009-03-03:

#42

That query crashes with a stack dump, which is automatically a server side bug. If something is wrong with the user query, the server should at worst send a reasonable error message.

Looking at the stack trace, the json serializer is crashing on an input that looks valid at first glance. I'm in the middle of something else right now but will try to fix this soon.

Nate Irwin (nate-nateirwin) wrote on 2009-03-12:

#43

Have you been able to take a look at this? I want to integrate Open Library with my application, but I have to be able to get more information back from the search query. Otherwise the search is just too inefficient to integrate. Thanks!

i30817 (i30817) wrote on 2009-04-25:

#44

If you don't mind, besides the list of results (that doesn't appear to work yet) I would like something like Amazon's "not" functionality, to filter out audiobooks and comics for example. If your database is built correctly this may help queries (Amazon has a book index for everything and thus must disambiguate in search).

What is going to be the return format for the List? I need a way to know what search corresponds to what result.

i30817 (i30817) wrote on 2009-04-25:

#45

Also, it would be good if the first result on a search was the one that had a book cover. I know this is not always possible even in the Amazon search, but that's what I'm using the search for - finding the OLIDs for a book filename.

solrize (solrize) wrote on 2009-04-25:

#46

Nate Irwin 3/12, sorry, I didn't notice your question when you posted it. If I don't get back to you within a few days, please remind me, either here or by email.

i30817, you can use "not" in Lucene queries--does that take care of what you're asking for? The one concern that I have is wanting to keep open the possibility of moving away from Lucene, so I don't want to promise long-term support for every weird feature of Lucene query syntax. But for now and the immediately foreseeable future, it should work, and any replacement should certainly support "not" in some form.

i30817 (i30817) wrote on 2009-04-27: Re: [Bug 236947] Re: search API improvements

#47

I see - it works. There is still one small problem:
besides multiple requests, I still have to check all the OLID images returned
to see if there is a viable image there. If the returned list could be
sorted according to having images or not, or if it returned a parameter saying
whether there is an image or not, this could be avoided - but if you don't want to
change this, it's not a great problem.

Anand Chitipothu (anandology) wrote on 2009-04-27:

#48

2009/4/27 i30817 <email address hidden>:
> I see - it works. There is still one small problem:
> besides multiple requests, I still have to check all the OLID images returned
> to see if there is a viable image there. If the returned list could be
> sorted according to having images or not, or if it returned a parameter saying
> whether there is an image or not, this could be avoided - but if you don't want to
> change this, it's not a great problem.

The search engine doesn't know about the availability of covers. You have
to use a separate query to find that.

i30817 (i30817) wrote on 2009-04-27: Re: search API improvements

#49

How about an author query parameter? I'm seeing results of authors who write about other authors and books.

James Joyce gets a bucketful of leeches for example.

solrize (solrize) wrote on 2009-04-27:

#50

Anand, Is there some reason the presence of covers isn't mentioned in the json records? If the info is there then I can index it.

i30817, you can use "authors:(james joyce)". Is that what you want?

Anand Chitipothu (anandology) wrote on 2009-04-27: Re: [Bug 236947] Re: search API improvements

#51

2009/4/27 solrize <email address hidden>:
> Anand, Is there some reason the presence of covers isn't mentioned in
> the json records? If the info is there then I can index it.

We don't have cover info in the JSON records; covers are stored separately.
Here is a related issue: https://bugs.launchpad.net/openlibrary/+bug/347897

i30817 (i30817) wrote on 2009-04-27: Re: search API improvements

#52

Does the "authors:(james joyce)" work with multiple authors? Do I need to put a comma or something between authors?

solrize (solrize) wrote on 2009-04-27:

#53

You can do arbitrary boolean queries using AND, OR, and NOT:

authors:(james joyce) OR authors:(anton chekhov)

i30817 (i30817) wrote on 2009-04-27:

#54

Thanks. Can you tell me why this search:
"title:(Holidays are Hell) authors:(Kim Harrison ) OR authors:( Lynsay Sands ) OR authors:( Vicki Pettersson ) OR authors:( Marjorie Liu)"

gives me other, unrelated matches by other authors?

solrize (solrize) wrote on 2009-04-27:

#55

It looks like "are" is a stopword, so it is ignored in "holidays are hell". That gets all books with "holidays" and "hell" in the title, including a bunch of editions of "Holidays in Hell" by P. J. O'Rourke, plus "Hell for the Holidays" by Chris Grabenstein etc.

I think I see what you are trying to do:

title:(Holidays are Hell) AND authors:(Kim Harrison)

or the more complete version:

title:(Holidays are Hell) AND (authors:(Kim Harrison ) OR authors:( Lynsay Sands ) OR authors:( Vicki Pettersson ) OR authors:( Marjorie Liu))

In general, if you want to use fancy Lucene syntax, you should look at the Lucene docs for how to do it.
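
A small sketch of building the grouped form programmatically, so that the parentheses around the author alternatives are never forgotten. The helper name is hypothetical, and it does no escaping of Lucene special characters, so it is only suitable for plain words.

```python
def title_and_any_author(title, authors):
    """Build title:(...) AND (authors:(...) OR authors:(...) ...)."""
    if not authors:
        raise ValueError("at least one author is required")
    # Each author gets its own clause; strip stray spaces inside the parens.
    author_clauses = " OR ".join(f"authors:({a.strip()})" for a in authors)
    # The outer parentheses make the OR group bind before the AND.
    return f"title:({title}) AND ({author_clauses})"
```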

i30817 (i30817) wrote on 2009-04-27: Re: [Bug 236947] Re: search API improvements

#56

Ah, ok. I think i'm golden now.

i30817 (i30817) wrote on 2009-04-30: Re: search API improvements

#57

Say, why does the search:
title:(Roadside Picnic) AND (authors:(Arkady) OR authors:(Boris Strugatsky))
give nothing while
title:(Roadside Picnic) AND authors:(Arkady) OR authors:(Boris Strugatsky)
gives the book, if the first search is more correct? (I'm not doubting it - I've seen it's more correct in other cases.)

solrize (solrize) wrote on 2009-04-30:

#58

It is a little bit odd that the stored author fields for some of the books relevant to that query are missing. I'm in the process of reindexing the search engine so will try again when the new index is done.

Let's try to keep this tracker item on the topic of API improvements. For other issues, please feel free to open a separate item. Thanks.

solrize (solrize) wrote on 2009-05-04:

#59

To not leave the question hanging, that query seems to work with the new index, which is now on the staging server. It should go on production in the next day or so.

George (george-archive) on 2009-05-04

summary:

- search API improvements
+ FR: search API improvements

Jonathan Narwold (jonathan-narwold) wrote on 2009-08-01: Re: FR: search API improvements

#60

This full-text search function STILL does not appear to work on the old JSON api, and I don't see an equivalent search function in the new Restful api. Could someone give me an update on what's available?

Jonathan Narwold (jonathan-narwold) wrote on 2009-08-01:

#61

Just for clarification - I didn't mean full-text in the sense that you're searching the text of a book. I'm referring to full book details (as opposed to just keys).

solrize (solrize) wrote on 2009-08-02:

#62

Jonathan, thanks for pinging this. format=expanded is definitely still broken. I'll see if I can fix it.

Are you willing to help write a spec for the restful api? One reason it's slid is that I'm not sure exactly what it should do.

Revision history for this message

Anand Chitipothu (anandology) wrote on 2009-08-02: Re: [Bug 236947] Re: FR: search API improvements

#63

2009/8/2 solrize <email address hidden>

> Jonathan, thanks for pinging this. format=expanded is definitely still
> broken. I'll see if I can fix it.
>
> Are you willing to help write a spec for the restful api? One reason
> it's slid is that I'm not sure exactly what it should do.

The search API must be exactly the same as the query API.

http://openlibrary.org/dev/docs/restful_api#query

Please let me know if you need any more info.

Revision history for this message

solrize (solrize) wrote on 2009-08-28: Re: FR: search API improvements

#64

It looks like format=expanded broke because the format of author objects changed. I've made a patch to fix this and will try to deploy it.

http://github.com/openlibrary/openlibrary/commit/474e0b179e816ef0508c1cef0b1ca4984dafabe7

Revision history for this message

George (george-archive) wrote on 2009-11-17:

#65

Anand - are you calling this resolved?

Changed in openlibrary:
assignee:	solrize (solrize) → Anand Chitipothu (anandology)

Revision history for this message

sancho (sancho) wrote on 2010-08-02:

#66

Hi,

Does the RESTful API support text searching yet?

What I'd like to be able to do is pass a string to the search and get back all matching books, e.g.

http://openlibrary.org/search.json?q=harry potter&*=&limit=10&offset=1

OR

http://openlibrary.org/search.json?q={"title":"harry potter" OR "authors":"harry potter"}&*=&limit=10&offset=1

This would return all details for matching books.

I'd even be happy having to do 2 queries e.g.

http://openlibrary.org/query.json?type=/type/edition&title="harry potter"&*=&limit=10&offset=1

http://openlibrary.org/query.json?type=/type/edition&authors="harry potter"&*=&limit=10&offset=1

As you can see from the links below, both should return books:

http://openlibrary.org/search?q=harry+potter (returns both title and author matches)

http://openlibrary.org/search?title=harry+potter

http://openlibrary.org/search?author=harry+potter

Thanks
Simon

Revision history for this message

Anand Chitipothu (anandology) wrote on 2010-08-02:

#67

Sorry Simon, there isn't a way yet to access search through the API.

George (george-archive) on 2010-08-02

Changed in openlibrary:
assignee:	Anand Chitipothu (anandology) → Edward Betts (edwardbetts)
assignee:	Edward Betts (edwardbetts) → Anand Chitipothu (anandology)

Revision history for this message

sancho (sancho) wrote on 2010-08-02:

#68

Thanks George,

Any idea when this functionality might be available?

At the moment, the multiple queries required to get data via the JSON API are quite expensive.

Could paging be added to the JSON search? It appears that at the moment it is only possible to get the first 20 results, e.g.

http://openlibrary.org/api/search?q={"query":"The%20Truth","limit":"10","offset":"1"}

Cheers
Simon
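
The paged request in the comment above puts a JSON dictionary in the q parameter. A minimal sketch of building such a URL with proper percent-encoding follows. The endpoint path and the field names ("query", "limit", "offset") are copied from that example URL; whether the server honors the paging fields is exactly the open question here, so treat them as assumptions:

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

# Assumption: endpoint path copied from the example URL in the comment above.
API_SEARCH = "http://openlibrary.org/api/search"

def build_api_search_url(query, **paging):
    """Build a URL whose q parameter is a percent-encoded JSON dictionary."""
    q = {"query": query}
    q.update(paging)  # e.g. limit=10, offset=1 (unconfirmed paging fields)
    return API_SEARCH + "?q=" + quote(json.dumps(q), safe="")

def decode_q(url):
    """Recover the JSON dictionary from a URL built above (round-trip check)."""
    return json.loads(parse_qs(urlsplit(url).query)["q"][0])

url = build_api_search_url("The Truth", limit=10, offset=1)
print(url)
print(decode_q(url))
```

Encoding the whole JSON string avoids the raw spaces, quotes and braces that appear in the hand-written URL.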

Anand Chitipothu (anandology) on 2010-08-11

Changed in openlibrary:
status:	In Progress → Confirmed

Revision history for this message

Anand Chitipothu (anandology) wrote on 2010-08-11:

#69

You can try using our experimental search API.

http://openlibrary.org/search.json?title=The+Truth&limit=10

Please keep in mind that this is experimental and may change at any time.
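
A minimal sketch of building the experimental search URL above from Python. Only the title and limit parameters appear in this comment; any other keyword passed to the helper (a hypothetical name, build_search_url) is an unconfirmed assumption about what the endpoint accepts:

```python
from urllib.parse import urlencode

# Endpoint copied from the comment above; experimental and subject to change.
SEARCH_JSON = "http://openlibrary.org/search.json"

def build_search_url(limit=10, **fields):
    """Return a search.json URL with form-encoded query parameters."""
    params = dict(fields)      # e.g. title="The Truth"
    params["limit"] = limit    # appended last so it ends the query string
    return SEARCH_JSON + "?" + urlencode(params)

print(build_search_url(title="The Truth"))
```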

Revision history for this message

sancho (sancho) wrote on 2010-08-11:

#70

Hi Anand,

Thanks, this is exactly the sort of thing I was after :-)

If you're open to comments, the only thing I'd tweak is the text section (example below), which could do with keys for the values.

Thanks again.
Simon

"text": [
    "OL8893758W",
    "Terry Pratchett",
    "Guilty Of Literature",
    "OL8684550M",
    "OL24298975M",
    "Pocket Essentials",
    "Old Earth Books",
    "188296831X",
    "9781882968312",
    "9781848396852",
    "OL3062820A",
    "Andrew M. Butler",
    "OverDrive",
    "Language Arts",
    "Reference",
    "Nonfiction"
   ],
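
Until the values carry keys, a client can only guess what each entry is. A rough sketch of such a guess, assuming the usual Open Library key suffixes (W for works, M for editions, A for authors) and classifying ISBNs purely by length. The labels are made up for illustration and are not field names from the API:

```python
import re

def guess_kind(value):
    """Heuristically label one entry of the "text" list."""
    if re.fullmatch(r"OL\d+W", value):
        return "work_key"
    if re.fullmatch(r"OL\d+M", value):
        return "edition_key"
    if re.fullmatch(r"OL\d+A", value):
        return "author_key"
    if re.fullmatch(r"\d{9}[\dX]", value):
        return "isbn_10"
    if re.fullmatch(r"\d{13}", value):
        return "isbn_13"
    return "other"  # names, publishers, subjects: not distinguishable by pattern

def label_text(values):
    """Group the flat list into {label: [values...]}, preserving order."""
    labeled = {}
    for value in values:
        labeled.setdefault(guess_kind(value), []).append(value)
    return labeled

sample = ["OL8893758W", "Terry Pratchett", "OL8684550M", "OL24298975M",
          "188296831X", "9781882968312", "OL3062820A", "Old Earth Books"]
print(label_text(sample))
```

Author names, publishers and subjects all fall into "other", which is why keys in the response itself would be the better fix.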

Revision history for this message

zombiepig (nyall-zombiepigs) wrote on 2010-08-18:

#71

Is there a similar interface to http://openlibrary.org/search.json?title=The+Truth&limit=10 but for an author search instead?

Revision history for this message

George (george-archive) wrote on 2010-09-01:

#72

Another feature request:

Comment: Hi Open Library,
>
> Thanks for providing such a great service to the world. We at the National
> Library of Australia are periodically downloading a full dump of your
> edition data, in JSON format. From this full dump, we would like to
> extract just the records where the full text is freely available online,
> for inclusion in our search engine Trove.
> http://trove.nla.gov.au/book/result?q=%22open+library%22&l-availability=y%2Ff&s=20
>
> We currently do this by checking for an ocaid, however we have noticed
> that this includes records where the only online content is a locked DAISY
> format, which is not available to users in Australia.
>
> DAISY format example: http://openlibrary.org/books/OL352392M/Spin_cycle
>
> We would like some way to filter out these records, however currently
> they don't seem to include enough information to allow us to do this. Is it
> possible for you to include more information in your records so that we
> can identify these records? For example, perhaps the formats the item is
> available in could be included.
>
> Looking forward to your reply,
> Joanna Meakins and Kent Fitch
> Trove Team
> National Library of Australia
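
The current approach described in the request above (keep a record if it has an ocaid) can be sketched as follows. This assumes one JSON object per input line, which may not match the real dump layout, and the sample records are made up. It also shows the limitation being reported: nothing in the record tells a locked DAISY item apart from a freely readable one:

```python
import json

def records_with_ocaid(lines):
    """Yield edition records that carry a non-empty ocaid field."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("ocaid"):
            yield record

# Made-up sample records, for illustration only.
sample = [
    '{"key": "/books/OL1M", "ocaid": "exampleitem00"}',
    '{"key": "/books/OL2M"}',
]
kept = list(records_with_ocaid(sample))
print([record["key"] for record in kept])
```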

Changed in openlibrary:
assignee:	Anand Chitipothu (anandology) → George (george-archive)
assignee:	George (george-archive) → Edward Betts (edwardbetts)
milestone:	none → search-september-release
importance:	Wishlist → High
summary:

- FR: search API improvements
+ Release Search API enhancements

Revision history for this message

George (george-archive) wrote on 2010-09-01:

#73

Edward - this bug is massive now - feel free to close and start a fresh one.

George (george-archive) on 2011-03-04

Changed in openlibrary:
milestone:	none → general-bucket
