FR: Create books.txt XML file from edit page

Bug #343744 reported by Matt Work
Affects: Open Library
Status: Won't Fix
Importance: High
Assigned to: Matt Work

Bug Description

When on a book's edit page, we would like to give the user the option to generate a books.txt (definition to follow) XML-style file.

example edit page: http://openlibrary.org/b/OL7277480M/Cryptonomicon?m=edit
This could simply be another button or link that generates the file for the user.

an idea for books.txt (from www.lexcycle.com/developer)

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Online Catalog</title>
  <id>urn:uuid:09aeccc1-c633-aa48-22ab-000052cbd81c</id>
  <updated>2008-09-12T00:44:20+00:00</updated>
  <link rel="self" type="application/atom+xml" href="http://www.billybobsbooks.com/catalog/top.atom"/>
  <link rel="search" title="Search Billy Bob's Books" type="application/atom+xml" href="http://www.billybobsbooks.com/catalog/search.php?search={searchTerms}"/>
  <author>
    <name>Billy Bob</name>
    <uri>http://www.billybobsbooks.com</uri>
    <email>[email address hidden]</email>
  </author>
  <entry>
    <title>1984</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml"> Published: 1949 Subject: Novels Language: en</div>
    </content>
    <id>urn:billybobsbooks:1166</id>
    <author>
      <name>Orwell, George</name>
    </author>
    <updated>2008-09-12T00:44:20+00:00</updated>
    <link type="application/epub+zip" href="http://www.billybobsbooks.com/book/1984.epub"/>
    <link rel="x-stanza-cover-image-thumbnail" type="image/png" href="http://www.billybobsbooks.com/book/1984.png"/>
    <link rel="x-stanza-cover-image" type="image/png" href="http://www.billybobsbooks.com/book/1984.png"/>
  </entry>
  <entry>
    <title>The Art of War</title>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">Published: -500 Subject: Non-Fiction Language: en</div>
    </content>
    <id>urn:billybobsbooks:168</id>
    <author>
      <name>Sun Tzu</name>
    </author>
    <updated>2008-09-12T00:44:20+00:00</updated>
    <link type="application/epub+zip" href="http://www.billybobsbooks.com/book/artofwar.epub"/>
    <link rel="x-stanza-cover-image-thumbnail" type="image/png" href="http://www.billybobsbooks.com/book/artofwar.png"/>
    <link rel="x-stanza-cover-image" type="image/png" href="http://www.billybobsbooks.com/book/artofwar.png"/>
  </entry>
</feed>
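
To make concrete what a consumer of such a catalog would do, here is a minimal sketch in Python that reads a feed in this shape and lists each book's title, author, and EPUB link. The function name `list_books` and the trimmed `SAMPLE` string are illustrative assumptions, not part of the Lexcycle documentation; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespace prefix in ElementTree's "{uri}tag" notation.
ATOM = "{http://www.w3.org/2005/Atom}"

# A trimmed copy of the sample feed (one entry, fewer links).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Online Catalog</title>
  <entry>
    <title>1984</title>
    <id>urn:billybobsbooks:1166</id>
    <author><name>Orwell, George</name></author>
    <link type="application/epub+zip" href="http://www.billybobsbooks.com/book/1984.epub"/>
    <link rel="x-stanza-cover-image" type="image/png" href="http://www.billybobsbooks.com/book/1984.png"/>
  </entry>
</feed>"""


def list_books(feed_xml):
    """Return a (title, author, epub_url) tuple for each entry in the feed."""
    root = ET.fromstring(feed_xml)
    books = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        author = entry.findtext(ATOM + "author/" + ATOM + "name")
        epub = None
        # The book itself is the link whose type is application/epub+zip.
        for link in entry.findall(ATOM + "link"):
            if link.get("type") == "application/epub+zip":
                epub = link.get("href")
                break
        books.append((title, author, epub))
    return books


if __name__ == "__main__":
    for book in list_books(SAMPLE):
        print(book)
```

Cover images are distinguished from the book file only by the `rel` and `type` attributes on `<link>`, so a consumer has to filter on those rather than on element names.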

Tags: books.txt
Matt Work (mwork)
Changed in openlibrary:
milestone: none → may-release
raj (raj-archive) wrote :

Make every edition page have a <link rel="alternate" type="application/atom+xml"> element that links to an atom feed for that edition.
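
As a rough sketch of that element, the Python helper below builds the tag for an edition key. The `.atom` URL suffix and the name `alternate_link` are assumptions made for illustration; the thread does not say where the feed would live.

```python
from xml.sax.saxutils import quoteattr


def alternate_link(edition_key, base_url="http://openlibrary.org"):
    """Build the <link rel="alternate"> tag for an edition page.

    Assumption: the Atom feed for /b/<id> is served at /b/<id>.atom.
    """
    href = f"{base_url}{edition_key}.atom"
    # quoteattr escapes the value and wraps it in quotes.
    return f'<link rel="alternate" type="application/atom+xml" href={quoteattr(href)}/>'


if __name__ == "__main__":
    print(alternate_link("/b/OL7277480M"))
```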

The atom feed should correspond to what Lexcycle has defined: http://www.lexcycle.com/developer

Talk to Peter to make sure we are doing this right.

Changed in openlibrary:
assignee: nobody → edward-debian
importance: Undecided → High
status: New → Confirmed
Edward Betts (edwardbetts) wrote :

What is the problem we are trying to solve?

Is the JSON at http://openlibrary.org/b/OL7277480M.json not good enough?

Edward Betts (edwardbetts) wrote :

Who is Peter?

solrize (solrize) wrote :

Peter Brantley is a publishing expert (among other things) who recently joined the IA. The idea is to generate an Atom XML record for each book and put a link on the edition page, using the Lexcycle format that Peter apparently is familiar with.

I think I see how to do this: basically extend the edition plugin to emit the contents of the edition node in XML format, and add a query format that runs the extension. Raj assigned this to you because he didn't like my initial idea of doing it in the solr update daemon (which was a dumb idea, but was the first thing I thought of because I'm already converting OL data to XML that way). Maybe I should ask for it back.
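
A minimal sketch of the "emit the edition in XML format" step, in Python. This is not the edition plugin's actual API: the function name `edition_to_atom_entry` and the dict fields (`key`, `title`, `authors`, `last_modified`) are hypothetical stand-ins for whatever the edition node really exposes.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def edition_to_atom_entry(edition, base_url="http://openlibrary.org"):
    """Serialize a dict describing an edition as an Atom <entry> string.

    Assumption: `edition` has the keys "key", "title", "authors" (a list of
    names) and "last_modified" (an RFC 3339 timestamp string).
    """
    # Emit Atom as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", ATOM_NS)

    def tag(name):
        return "{%s}%s" % (ATOM_NS, name)

    entry = ET.Element(tag("entry"))
    ET.SubElement(entry, tag("title")).text = edition["title"]
    # Use the edition's page URL as the entry's unique id.
    ET.SubElement(entry, tag("id")).text = base_url + edition["key"]
    for name in edition["authors"]:
        author = ET.SubElement(entry, tag("author"))
        ET.SubElement(author, tag("name")).text = name
    ET.SubElement(entry, tag("updated")).text = edition["last_modified"]
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(edition_to_atom_entry({
        "key": "/b/OL7277480M",
        "title": "Cryptonomicon",
        "authors": ["Neal Stephenson"],
        "last_modified": "2009-01-01T00:00:00+00:00",  # placeholder timestamp
    }))
```

Building the tree with ElementTree rather than string formatting means titles and author names containing `&` or `<` are escaped automatically.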

Edward Betts (edwardbetts) wrote :

I'm just wondering what is going to be consuming the books.txt

Matt Work (mwork) wrote :

books.txt may not be the final name of the file, but the idea is that a search engine would be consuming the information

Edward Betts (edwardbetts) wrote :

An existing search engine that reads this format, or a new search engine?

solrize (solrize) wrote :

The idea is to export a standard format that other book cataloguers could index in their own search engines. (So is this really about re-inventing MARC?)

solrize (solrize) wrote :

Edward, I can take this bug unless you're eager to do it.

Aaron Swartz (aaronsw) wrote : Re: [Bug 343744] Re: FR: Create books.txt XML file from edit page

Let's keep Edward on it; he's distant enough from the insanity to push
back on it until the request is sane.

raj (raj-archive) wrote :

Matt, I think the Atom feeds should be on archive.org details pages, not OpenLibrary pages.

OpenLibrary doesn't have any books, just links to books.

The books we've scanned are on archive.org, so that's where the books.txt thing should live.

OpenLibrary might be useful for indexing the books.txt files out on the net, but most likely that would be a different project, since OL is just fed by MARC records and doesn't do crawling.

Changed in openlibrary:
assignee: edward-debian → mwork
Matt Work (mwork) wrote :

I agree, Raj. Can we move this to Archive.org?

Changed in openlibrary:
status: Confirmed → Won't Fix
raj (raj-archive) wrote :

We did this as part of the BookServer project. It's amazing to see what we got done on BookServer in the eight months since this bug was filed!

raj (raj-archive) wrote :

Open Library now supports OPDS for edition records!
