2010-03-29
---------------
I'm planning to generate 3 types of dumps with OL data.
Open Library Dump:
description: Latest revisions of all documents
filename: ol_dump_${date}.txt.gz
columns: key, type, revision, json
frequency: monthly
sort-order: unspecified
Open Library Complete Dump:
description: All revisions of all documents
filename: ol_cdump_${date}.txt.gz
columns: key, type, revision, json
frequency: monthly
sort-order: unspecified
Open Library Incremental Dump:
description: All revisions of all documents modified in a given day
filename: ol_idump_${date}.txt.gz
columns: key, type, revision, json
frequency: daily
sort-order: modification time
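To make the four-column layout concrete, here is a rough sketch of how a consumer might read one of these dumps. It assumes the columns are tab-separated with one document revision per line, which is not stated above, and the names DumpRow, read_dump, read_dump_file and latest_revisions are made up for illustration. The last helper shows how the "latest revisions" view relates to the complete dump: keep the highest revision seen for each key.

```python
import gzip
import json
from typing import Dict, Iterable, Iterator, NamedTuple


class DumpRow(NamedTuple):
    key: str
    type: str
    revision: int
    data: dict


def read_dump(lines: Iterable[str]) -> Iterator[DumpRow]:
    """Yield one DumpRow per non-empty line of decompressed dump text."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumption: columns are tab-separated. Splitting at most three
        # times keeps the trailing JSON column in one piece.
        key, type_, revision, raw_json = line.split("\t", 3)
        yield DumpRow(key, type_, int(revision), json.loads(raw_json))


def read_dump_file(path: str) -> Iterator[DumpRow]:
    """Stream rows from a gzipped dump file without loading it all at once."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from read_dump(f)


def latest_revisions(rows: Iterable[DumpRow]) -> Dict[str, DumpRow]:
    """Reduce rows containing all revisions to the newest row per key."""
    latest: Dict[str, DumpRow] = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or row.revision > current.revision:
            latest[row.key] = row
    return latest
```

Something like latest_revisions(read_dump_file("ol_cdump_2010-03-31.txt.gz")) would then give the same documents as the plain ol_dump file, at the cost of holding one row per key in memory.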
Each of these dumps will be stored as an item in the Internet Archive
cluster.
URL Format:
Even though these files are stored in IA, there will be an
openlibrary.org/* URL for each file.
I'm considering the following two URL formats.
Option#1:
http://openlibrary.org/dumps/ol_dump_2010-03-31.txt.gz
http://openlibrary.org/dumps/ol_cdump_2010-03-31.txt.gz
http://openlibrary.org/dumps/ol_idump_2010-03-31.txt.gz
Option#2:
http://openlibrary.org/dumps/2010/03/ol_dump_2010-03-31.txt.gz
http://openlibrary.org/dumps/2010/03/ol_cdump_2010-03-31.txt.gz
http://openlibrary.org/dumps/2010/03/ol_idump_2010-03-31.txt.gz
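For comparison, the two layouts can be generated from the same inputs. This is only a sketch: the kind labels ("latest", "complete", "incremental") and the function names are invented here, and ${date} is assumed to be an ISO date (YYYY-MM-DD), as in the example URLs.

```python
from datetime import date

BASE_URL = "http://openlibrary.org/dumps"

# Hypothetical labels for the three dump types, mapped to filename prefixes.
PREFIXES = {
    "latest": "ol_dump",
    "complete": "ol_cdump",
    "incremental": "ol_idump",
}


def dump_filename(kind: str, day: date) -> str:
    """Build the filename, e.g. ol_dump_2010-03-31.txt.gz."""
    return f"{PREFIXES[kind]}_{day.isoformat()}.txt.gz"


def url_option1(kind: str, day: date) -> str:
    """Flat layout: every dump sits directly under /dumps/."""
    return f"{BASE_URL}/{dump_filename(kind, day)}"


def url_option2(kind: str, day: date) -> str:
    """Nested layout: dumps are grouped by year and month."""
    return f"{BASE_URL}/{day:%Y/%m}/{dump_filename(kind, day)}"
```

With daily incremental dumps, Option#1 puts a few hundred files per year in one listing, while Option#2 keeps each month's listing to about thirty files plus the monthly dumps.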
Which one should we pick?
Any other suggestions/feedback?
Anand