Comment 8 for bug 128399

George (george-archive) wrote : Re: data dump export

2010-03-29
---------------

I'm planning to generate three types of dumps of OL data.

Open Library Dump:
     description: Latest revisions of all documents
     filename: ol_dump_${date}.txt.gz
     columns: key, type, revision, json
     frequency: monthly
     sort-order: unspecified

Open Library Complete Dump:
     description: All revisions of all documents
     filename: ol_cdump_${date}.txt.gz
     columns: key, type, revision, json
     frequency: monthly
     sort-order: unspecified

Open Library Incremental Dump:
     description: All revisions of all documents modified in a given day
     filename: ol_idump_${date}.txt.gz
     columns: key, type, revision, json
     frequency: daily
     sort-order: modification time
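
For anyone consuming these files, here is a rough sketch of how a dump could be read. The spec above only names the columns, so the tab delimiter, the UTF-8 encoding, and the integer revision are my assumptions, and read_dump / open_dump are made-up helper names, not existing OL code.

```python
import gzip
import io
import json


def read_dump(lines):
    """Yield (key, type, revision, doc) for each row of a dump.

    Assumes one record per line with four tab-separated columns
    (the delimiter is an assumption; the spec only lists the columns).
    """
    for line in lines:
        # maxsplit=3 keeps the JSON column intact as the fourth field.
        key, type_, revision, raw = line.rstrip("\n").split("\t", 3)
        yield key, type_, int(revision), json.loads(raw)


def open_dump(path):
    """Open a gzipped dump file as a text stream (UTF-8 assumed)."""
    return gzip.open(path, "rt", encoding="utf-8")


# Tiny in-memory example row; the key and document are invented.
sample = io.StringIO('/books/OL1M\t/type/edition\t3\t{"title": "A"}\n')
rows = list(read_dump(sample))
```

With a real file this would be used as `for key, type_, rev, doc in read_dump(open_dump(path))`, which streams the dump line by line instead of loading it into memory.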

Each of these dumps will be stored as an item in the Internet Archive
cluster.

URL Format:

Even though these files are stored in IA, there will be an
openlibrary.org/* URL for each file.

I'm considering the following two URL formats:

Option#1:
http://openlibrary.org/dumps/ol_dump_2010-03-31.txt.gz
http://openlibrary.org/dumps/ol_cdump_2010-03-31.txt.gz
http://openlibrary.org/dumps/ol_idump_2010-03-31.txt.gz

Option#2:
http://openlibrary.org/dumps/2010/03/ol_dump_2010-03-31.txt.gz
http://openlibrary.org/dumps/2010/03/ol_cdump_2010-03-31.txt.gz
http://openlibrary.org/dumps/2010/03/ol_idump_2010-03-31.txt.gz
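
To make the difference between the two options concrete, here is a small sketch that builds both URL forms from a dump type and a date. The function names and the "latest" / "complete" / "incremental" labels are invented for illustration; only the filename prefixes, the YYYY-MM-DD date, and the URL layouts come from the examples above.

```python
from datetime import date

BASE = "http://openlibrary.org/dumps"

# Filename prefixes from the dump specs; the dictionary keys are
# hypothetical labels chosen for this sketch.
PREFIXES = {
    "latest": "ol_dump",
    "complete": "ol_cdump",
    "incremental": "ol_idump",
}


def dump_filename(kind, day):
    """Return e.g. ol_dump_2010-03-31.txt.gz for the given dump type."""
    return f"{PREFIXES[kind]}_{day.isoformat()}.txt.gz"


def url_option1(kind, day):
    """Option#1: every file sits directly under /dumps/."""
    return f"{BASE}/{dump_filename(kind, day)}"


def url_option2(kind, day):
    """Option#2: files are grouped under /dumps/YYYY/MM/."""
    return f"{BASE}/{day:%Y/%m}/{dump_filename(kind, day)}"


day = date(2010, 3, 31)
option1 = url_option1("latest", day)
option2 = url_option2("incremental", day)
```

Either way the filename already carries the full date, so the year/month directories in Option#2 only add grouping. That starts to matter mostly for the daily incremental dumps, which would otherwise pile up in a single listing.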

Which one should we pick?

Any other suggestions/feedback?

Anand