Crawl LC catalog for 2007 books
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Open Library | New | Undecided | Edward Betts | | |
Bug Description
The original LC file contained books through the end of 2006. The weekly subscription covers 2008. To obtain the 2007 LC records, we should be able to crawl the LC catalog.
BACKGROUND:
Every LC book has a Library of Congress Control Number (LCCN). For LCCNs assigned since 2001, the first four digits represent the year, so the records for 2007 will all begin with '2007'. The year is followed by a six-digit serial number. We can assume that these begin at '000001' and go forward. There will be about 350,000 books for the year.
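As a rough illustration of the numbering scheme described above, candidate 2007 LCCNs could be enumerated like this. The function name `candidate_lccns` and the upper bound of 999,999 are assumptions for the sketch; the real serial range for 2007 is not known in advance, and some numbers in the range may be unused.

```python
from typing import Iterator


def candidate_lccns(year: int = 2007, start: int = 1, stop: int = 999_999) -> Iterator[str]:
    """Yield candidate LCCNs: a four-digit year followed by a zero-padded six-digit serial.

    Hypothetical helper: `stop` is an assumed upper bound, not a known limit.
    """
    for serial in range(start, stop + 1):
        yield f"{year:04d}{serial:06d}"


# The first few candidates for 2007.
first_three = list(candidate_lccns(2007, 1, 3))
print(first_three)  # ['2007000001', '2007000002', '2007000003']
```

With roughly 350,000 books expected, a crawl would probably stop once it sees a long run of missing numbers rather than walking the full six-digit range.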
METHODS:
We can either use Z39.50, requesting the MARC output based on the record number, or we may be able to use the LC stable URL for each book: http://
Changed in openlibrary:
assignee: nobody → Edward Betts (edwardbetts)
Looks like we can get MARC XML. For example: http://lccn.loc.gov/2007000001/marcxml
This should be relatively easy to load.
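A minimal sketch of what such a crawl might look like, using only the Python standard library and the URL pattern from the example above. The names `marcxml_url`, `fetch_marcxml`, and `crawl` are hypothetical, and two things are assumed rather than confirmed: that an unused LCCN answers with HTTP 404, and that a one-second pause between requests is acceptable to the LC servers.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable, Iterator, Optional, Tuple

BASE_URL = "http://lccn.loc.gov"  # taken from the example URL above


def marcxml_url(lccn: str) -> str:
    """Build the MARC XML URL for one LCCN, following the example pattern."""
    return f"{BASE_URL}/{lccn}/marcxml"


def fetch_marcxml(lccn: str, opener: Callable = urllib.request.urlopen) -> Optional[bytes]:
    """Return the raw MARC XML bytes for an LCCN, or None if the record is missing.

    Assumption: a missing record is reported as HTTP 404. Other errors are re-raised.
    """
    try:
        with opener(marcxml_url(lccn)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def crawl(
    lccns: Iterable[str],
    delay_seconds: float = 1.0,  # assumed polite delay, not an LC requirement
    opener: Callable = urllib.request.urlopen,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (lccn, marcxml_bytes) for every LCCN that resolves to a record."""
    for lccn in lccns:
        data = fetch_marcxml(lccn, opener)
        if data is not None:
            yield lccn, data
        time.sleep(delay_seconds)
```

The `opener` parameter is there only so the fetch logic can be exercised without touching the network; a real run would leave it at the default and write each record to disk as it arrives.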