Comment 3 for bug 277512

Jeff Suttor (jeff-suttor) wrote :

MARCXML is available using LCCN Permalinks, http://lccn.loc.gov/#n9

to test, a simple script, lc_crawl.py, was used to fetch the first 1k records for 2007. results:

* pymarc used to parse MARCXML and convert to MARC21

* 88 of the requests returned an error instead of a record, either:
   * <error xmlns:marc="http://www.loc.gov/MARC21/slim">record not found</error>
   * <error xmlns:marc="http://www.loc.gov/MARC21/slim">Temporarily Unavailable.<a href="http://lcweb2.loc.gov/lccn/2007######">Retry</a></error>

* the crawl rate needs to be throttled to 1 request every 2 seconds, otherwise the lccn server returns 500 Server Error responses for the next several requests
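
the three results above can be sketched roughly as follows. this is not the actual lc_crawl.py, only a minimal illustration: the names classify_response, Throttle and marcxml_to_marc21 are hypothetical, and the status labels are made up for the example. the pymarc calls (parse_xml_to_array, Record.as_marc) are imported lazily so the rest runs with the standard library alone.

```python
import io
import time
import xml.etree.ElementTree as ET


def classify_response(body: bytes) -> str:
    """Label a permalink response as 'record', 'not_found', 'retry',
    'error' or 'unparseable' (labels are illustrative)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "unparseable"
    # Drop any "{namespace}" prefix so only the local tag name is compared.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "error":
        return "record"
    message = (root.text or "").strip().lower()
    if message.startswith("temporarily unavailable"):
        return "retry"
    if "record not found" in message:
        return "not_found"
    return "error"


class Throttle:
    """Enforce a minimum interval between requests (2 seconds per the post)."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def marcxml_to_marc21(body: bytes) -> list:
    """Convert MARCXML bytes to a list of MARC21 transmission-format records.
    Assumes the third-party pymarc package is installed."""
    import pymarc  # lazy import: only needed for the conversion step

    records = pymarc.parse_xml_to_array(io.BytesIO(body))
    return [record.as_marc() for record in records]
```

calling Throttle().wait() before each request keeps the crawl at or below 1 req/2 sec, and classify_response lets the crawler count the two error bodies separately instead of passing them to pymarc.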

if this is of value, the script can be enhanced:

  * better HTTP error recovery, e.g. retry
  * better logging
  * explicit User-Agent: for transparent crawl, e.g. URI to this bug

and a larger test run could follow.
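
a rough sketch of what the retry and User-Agent enhancements might look like, using only the standard library. everything here is an assumption rather than existing code: the function names, the User-Agent string, the retry count and the doubling backoff are all illustrative choices.

```python
import time
import urllib.error
import urllib.request

# Illustrative value only: the idea from the list above is to identify the
# crawl openly, e.g. by pointing at this bug.
USER_AGENT = "lc_crawl.py (test crawl; see bug 277512)"


def build_request(url):
    """Attach the explicit User-Agent header to every request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3,
                     backoff=2.0, sleep=time.sleep):
    """Fetch url, retrying on 5xx responses and connection errors.

    Waits backoff, 2*backoff, 4*backoff, ... seconds between attempts.
    Client errors (4xx) are raised immediately since a retry will not help.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            with opener(build_request(url)) as response:
                return response.read()
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                raise
            last_error = exc
        except urllib.error.URLError as exc:
            last_error = exc
        if attempt < retries:
            sleep(backoff * (2 ** attempt))
    raise last_error
```

the opener and sleep parameters exist only so the retry logic can be exercised without touching the lccn server. better logging could be added by recording each failed attempt with the standard logging module.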

the lccn permalink faq, http://lccn.loc.gov/#5 (#5-7), indicates that non-Roman data, authority records and some misc records are not currently available with lccn permalinks. is that relevant?