FR: Create books.txt XML file from edit page

Bug #343744 reported by Matt Work on 2009-03-16
Affects: Open Library
Importance: High
Assigned to: Matt Work

Bug Description

When on a book's edit page, we would like to give the user the option to generate a books.txt XML-style file (definition to follow).

Example edit page: http://openlibrary.org/b/OL7277480M/Cryptonomicon?m=edit
The page could simply have another button or link that generates the file for the user.

An idea for books.txt (from www.lexcycle.com/developer):

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Online Catalog</title>
  <id>urn:uuid:09aeccc1-c633-aa48-22ab-000052cbd81c</id>
  <updated>2008-09-12T00:44:20+00:00</updated>
  <link rel="self" type="application/atom+xml" href="http://www.billybobsbooks.com/catalog/top.atom"/>
  <link rel="search" title="Search Billy Bob's Books" type="application/atom+xml" href="http://www.billybobsbooks.com/catalog/search.php?search={searchTerms}"/>
  <author>
    <name>Billy Bob</name>
    <uri>http://www.billybobsbooks.com</uri>
    <email>[email address hidden]</email>
  </author>
  <entry>
    <title>1984</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml"> Published: 1949 Subject: Novels Language: en</div>
    </content>
    <id>urn:billybobsbooks:1166</id>
    <author>
      <name>Orwell, George</name>
    </author>
    <updated>2008-09-12T00:44:20+00:00</updated>
    <link type="application/epub+zip" href="http://www.billybobsbooks.com/book/1984.epub"/>
    <link rel="x-stanza-cover-image-thumbnail" type="image/png" href="http://www.billybobsbooks.com/book/1984.png"/>
    <link rel="x-stanza-cover-image" type="image/png" href="http://www.billybobsbooks.com/book/1984.png"/>
  </entry>
  <entry>
    <title>The Art of War</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">Published: -500 Subject: Non-Fiction Language: en</div>
    </content>
    <id>urn:billybobsbooks:168</id>
    <author>
      <name>Sun Tzu</name>
    </author>
    <updated>2008-09-12T00:44:20+00:00</updated>
    <link type="application/epub+zip" href="http://www.billybobsbooks.com/book/artofwar.epub"/>
    <link rel="x-stanza-cover-image-thumbnail" type="image/png" href="http://www.billybobsbooks.com/book/artofwar.png"/>
    <link rel="x-stanza-cover-image" type="image/png" href="http://www.billybobsbooks.com/book/artofwar.png"/>
  </entry>
</feed>
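To make the proposal concrete, here is a minimal Python sketch of generating one entry with the same structure as the example above, using only the standard library's xml.etree.ElementTree. The input dictionary and its keys (title, summary, id, authors, updated, epub_url, cover_url) are hypothetical; the element names and rel values are copied from the Lexcycle example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Write Atom as the default namespace (xmlns="..."), as in the example feed.
ET.register_namespace("", ATOM_NS)


def atom(tag):
    """Qualify a tag name with the Atom namespace (ElementTree's {uri}tag form)."""
    return f"{{{ATOM_NS}}}{tag}"


def build_entry(book):
    """Build one Atom <entry> shaped like the Lexcycle example.

    'book' is a hypothetical dict; its keys are illustrative, not an
    existing Open Library schema.
    """
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("title")).text = book["title"]

    # Human-readable summary inside an XHTML <div>. ElementTree may write this
    # div with a namespace prefix instead of a default xmlns; XML parsers
    # treat both forms as the same element.
    content = ET.SubElement(entry, atom("content"), type="xhtml")
    ET.SubElement(content, f"{{{XHTML_NS}}}div").text = book["summary"]

    ET.SubElement(entry, atom("id")).text = book["id"]
    for name in book["authors"]:
        author = ET.SubElement(entry, atom("author"))
        ET.SubElement(author, atom("name")).text = name
    ET.SubElement(entry, atom("updated")).text = book["updated"]

    # The download link, then the two Stanza-specific cover links.
    ET.SubElement(entry, atom("link"),
                  type="application/epub+zip", href=book["epub_url"])
    for rel in ("x-stanza-cover-image-thumbnail", "x-stanza-cover-image"):
        ET.SubElement(entry, atom("link"),
                      rel=rel, type="image/png", href=book["cover_url"])
    return entry


# Usage: rebuild the "1984" entry from the example feed.
book = {
    "title": "1984",
    "summary": "Published: 1949 Subject: Novels Language: en",
    "id": "urn:billybobsbooks:1166",
    "authors": ["Orwell, George"],
    "updated": "2008-09-12T00:44:20+00:00",
    "epub_url": "http://www.billybobsbooks.com/book/1984.epub",
    "cover_url": "http://www.billybobsbooks.com/book/1984.png",
}
print(ET.tostring(build_entry(book), encoding="unicode"))
```

A full books.txt would wrap a list of these entries in a feed element carrying its own title, id, updated and author children, as in the example.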

Matt Work (mwork) on 2009-03-19
Changed in openlibrary:
milestone: none → may-release
raj (raj-archive) wrote :

Make every edition page have a <link rel="alternate" type="application/atom+xml"> element that links to an atom feed for that edition.
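A rough Python sketch of the element an edition page might emit. The "/b/<key>.atom" URL is an assumption made for illustration, modelled on the /b/OL7277480M.json URL mentioned in this thread; it is not an agreed Open Library endpoint. Escaping the href keeps the attribute valid if a key ever contains characters such as &.

```python
from html import escape

# Hypothetical base URL and ".atom" suffix, modelled on the existing
# /b/<edition key>.json URLs; not an agreed Open Library endpoint.
BASE_URL = "http://openlibrary.org"


def alternate_atom_link(edition_key):
    """Return the <link> element an edition page could place in its <head>."""
    href = escape(f"{BASE_URL}/b/{edition_key}.atom", quote=True)
    return f'<link rel="alternate" type="application/atom+xml" href="{href}"/>'


print(alternate_atom_link("OL7277480M"))
```

Feed readers and crawlers use this kind of element for autodiscovery: they look in the page head for rel="alternate" with an Atom media type and follow its href.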

The Atom feed should correspond to what Lexcycle has defined: http://www.lexcycle.com/developer

Talk to Peter to make sure we are doing this right.

Changed in openlibrary:
assignee: nobody → edward-debian
importance: Undecided → High
status: New → Confirmed
Edward Betts (edwardbetts) wrote :

What is the problem we are trying to solve?

Is the JSON at http://openlibrary.org/b/OL7277480M.json not good enough?

Edward Betts (edwardbetts) wrote :

Who is Peter?

solrize (solrize) wrote :

Peter Brantley is a publishing expert (among other things) who recently joined the Internet Archive (IA). The idea is to generate an Atom XML record for each book and put a link to it on the edition page, using the Lexcycle format that Peter is apparently familiar with.

I think I see how to do this: basically extend the edition plugin to emit the contents of the edition node in XML format, and add a query format that runs the extension. Raj assigned this to you because he didn't like my initial idea of doing it in the solr update daemon (which was a dumb idea, but was the first thing I thought of because I'm already converting OL data to XML that way). Maybe I should ask for it back.
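A minimal Python sketch of the approach described here, under stated assumptions: the edition is a plain dict (a hypothetical stand-in for the edition node), edition_to_xml is a generic dict-to-XML converter rather than the Lexcycle Atom mapping, and the SERIALIZERS table stands in for the plugin's query-format hook, whose real interface is not shown in this thread.

```python
import json
import xml.etree.ElementTree as ET


def edition_to_xml(edition, tag="edition"):
    """Convert a (hypothetical) edition dict into a generic XML element.

    Scalars become text, lists become repeated child elements, and nested
    dicts are converted recursively. Keys are assumed to be valid XML names.
    """
    elem = ET.Element(tag)
    for field, value in edition.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                elem.append(edition_to_xml(item, field))
            else:
                ET.SubElement(elem, field).text = str(item)
    return elem


# Hypothetical "query format" registry: each format name maps to a serializer.
SERIALIZERS = {
    "json": lambda edition: json.dumps(edition, sort_keys=True),
    "xml": lambda edition: ET.tostring(edition_to_xml(edition), encoding="unicode"),
}


def render_edition(edition, fmt):
    """Serialize an edition in the requested format, e.g. for a ?format=xml query."""
    try:
        serialize = SERIALIZERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return serialize(edition)


edition = {
    "key": "/b/OL7277480M",
    "title": "Cryptonomicon",
    "authors": [{"name": "Neal Stephenson"}],  # hypothetical field layout
}
print(render_edition(edition, "xml"))
```

Keeping the formats in one table means adding an Atom serializer later is a one-line registration rather than a new code path.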

Edward Betts (edwardbetts) wrote :

I'm just wondering what is going to consume books.txt.

Matt Work (mwork) wrote :

books.txt may not be the final name of the file, but the idea is that a search engine would consume the information.

Edward Betts (edwardbetts) wrote :

An existing search engine that reads this format, or a new search engine?

solrize (solrize) wrote :

The idea is to export a standard format that other book cataloguers could index in their own search engines. (So is this really about re-inventing MARC?)
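On the consuming side, a cataloguer's indexer would mostly need to pull a few fields out of each entry. The following Python sketch assumes feeds shaped like the Lexcycle example; build_index is a toy inverted index standing in for a real search engine, not the API of any particular product.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_entries(feed_xml):
    """Extract id, title, authors and EPUB links from an Atom feed string."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall("atom:entry", NS):
        records.append({
            "id": entry.findtext("atom:id", default="", namespaces=NS),
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "authors": [a.findtext("atom:name", default="", namespaces=NS)
                        for a in entry.findall("atom:author", NS)],
            "epub": [link.get("href") for link in entry.findall("atom:link", NS)
                     if link.get("type") == "application/epub+zip"],
        })
    return records


def build_index(records):
    """Build a toy inverted index: lowercase word -> set of entry ids."""
    index = defaultdict(set)
    for record in records:
        text = " ".join([record["title"], *record["authors"]])
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(record["id"])
    return index
```

Run against the two-entry example feed, index["orwell"] would contain only urn:billybobsbooks:1166 and index["war"] only urn:billybobsbooks:168.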

solrize (solrize) wrote :

Edward, I can take this bug unless you're eager to do it.

Let's keep Edward on it; he's distant enough from the insanity to push back on it until the request is sane.

raj (raj-archive) wrote :

Matt, I think the Atom feeds should be on archive.org details pages, not OpenLibrary pages.

OpenLibrary doesn't have any books, just links to books.

The books we've scanned are on archive.org, so that's where the books.txt thing should live.

OpenLibrary might be useful for indexing the books.txt files out on the net, but that would most likely be a different project, since OL is just fed by MARC records and doesn't do crawling.

Changed in openlibrary:
assignee: edward-debian → mwork
Matt Work (mwork) wrote :

I agree, Raj. Can we move this to Archive.org?

Changed in openlibrary:
status: Confirmed → Won't Fix
raj (raj-archive) wrote :

We did this as part of the BookServer project. It's amazing to see what we got done on BookServer in the eight months since this bug was filed!

raj (raj-archive) wrote :

Open Library now supports OPDS for edition records!
