Comment 3 for bug 240780

Edward Betts (edwardbetts) wrote:

To get some numbers, I built a list of unique author names from the Amazon data, skipping any that contained commas.

Unique Amazon authors = 3073755 (this includes corporate authors)
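A minimal sketch of that dedup step, assuming the author strings have already been pulled out of the Amazon records (how that was done isn't described here). Trimming whitespace and dropping empty strings are my own assumptions. Nothing distinguishes people from companies at this stage, which is why corporate authors end up in the count.

```python
def unique_authors(names):
    """Return the set of distinct author names, skipping any that contain a comma.

    `names` is any iterable of author strings. Stripping whitespace and
    ignoring empty strings are assumptions added for this sketch.
    """
    unique = set()
    for name in names:
        name = name.strip()
        # Skip blanks and anything with a comma, as described above.
        if not name or "," in name:
            continue
        unique.add(name)
    return unique


authors = unique_authors(["Edward Betts", "Edward Betts", "Smith, John", "O'Reilly Media"])
print(len(authors))  # 2: the duplicate and the comma name are dropped; the company stays
```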

I wrote some code to read the 100 and 700 fields from MARC records and ran it over most of the MARC data we have on archive.org, at least 20 million records.

Unique MARC authors = 4425724 (this shouldn't contain corporate authors)
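For reference, here is a rough sketch of reading those fields straight from binary MARC 21 (ISO 2709) using only the standard library. It is not the code I actually ran. In MARC 21, 100 is the personal-name main entry and 700 is the personal-name added entry. Corporate names go in 110/710, which is why this set should be free of them. The function names are hypothetical, only subfield $a is kept, and the data is assumed to be UTF-8 (a lot of older MARC is MARC-8, so undecodable bytes are just replaced).

```python
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"


def iter_records(data: bytes):
    """Split concatenated ISO 2709 records using the 5-digit length in each leader."""
    pos = 0
    while pos + 24 <= len(data):
        length = int(data[pos:pos + 5])
        yield data[pos:pos + length]
        pos += length


def personal_names(record: bytes):
    """Yield subfield $a of every 100 and 700 field in one MARC 21 record."""
    base = int(record[12:17])        # leader 12-16: base address of data
    directory = record[24:base - 1]  # 12-byte entries, then a field terminator
    for i in range(0, len(directory) - 11, 12):
        entry = directory[i:i + 12]
        if entry[:3] not in (b"100", b"700"):
            continue
        length = int(entry[3:7])     # field length, includes its terminator
        start = int(entry[7:12])     # offset from the base address
        field = record[base + start:base + start + length].rstrip(FIELD_TERMINATOR)
        # First two bytes are indicators; each subfield is \x1f + code + data.
        for sub in field[2:].split(SUBFIELD_DELIMITER)[1:]:
            if sub[:1] == b"a":
                # Assumes UTF-8; MARC-8 records would need a proper decoder.
                yield sub[1:].decode("utf-8", errors="replace").strip()


def unique_marc_authors(data: bytes) -> set:
    """Collect the distinct 100/700 $a headings from a blob of records."""
    return {name for rec in iter_records(data) for name in personal_names(rec)}
```

The headings come back in catalogue form, e.g. "Betts, Edward," with the trailing ISBD punctuation still attached. That has to be dealt with before comparing against Amazon.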

Then I compared the two data sets and got these results:

Match in western order = 1448220
Match in eastern order = 6053
Match in both orders = 17806
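One way the comparison could be done, as a sketch under my own assumptions rather than a record of how the numbers above were produced. Each inverted MARC heading "Surname, Forenames" is turned into a western form ("Forenames Surname") and an eastern form ("Surname Forenames"), and both are looked up in the Amazon set. The three categories are treated as mutually exclusive here. That is also an assumption. Note that stripping a trailing "." also drops the period after a final initial, which is a known simplification. Eastern lookups will hit unrelated people who happen to share the reversed name, which is where false positives come from.

```python
import re


def normalize(name: str) -> str:
    """Collapse whitespace and drop trailing ISBD punctuation such as ',' or '.'."""
    return re.sub(r"\s+", " ", name).strip().rstrip(",.").strip()


def name_orders(marc_name: str):
    """Return (western, eastern) forms of an inverted heading, or None if not inverted."""
    name = normalize(marc_name)
    if "," not in name:
        return None  # single names like "Homer" are left out of this sketch
    surname, forenames = (part.strip() for part in name.split(",", 1))
    if not surname or not forenames:
        return None
    return f"{forenames} {surname}", f"{surname} {forenames}"


def compare(marc_names, amazon_names):
    """Count MARC headings matching Amazon in western order only, eastern only, or both."""
    counts = {"western": 0, "eastern": 0, "both": 0}
    for marc_name in marc_names:
        orders = name_orders(marc_name)
        if orders is None:
            continue
        western, eastern = orders
        in_western = western in amazon_names
        in_eastern = eastern in amazon_names
        if in_western and in_eastern:
            counts["both"] += 1
        elif in_western:
            counts["western"] += 1
        elif in_eastern:
            counts["eastern"] += 1
    return counts


amazon = {"Edward Betts", "Mao Zedong", "David Thomas", "Thomas David"}
marc = ["Betts, Edward,", "Mao, Zedong.", "Thomas, David", "Smith, John", "Homer"]
print(compare(marc, amazon))  # {'western': 1, 'eastern': 1, 'both': 1}
```

Names like "Thomas, David" illustrate the "both" case: either ordering is a plausible name and may be a different person in the Amazon data.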

I'll upload the data soon. Note that the eastern-order matches include false positives.