inconsistent author names: JK Rowling vs. Rowling, JK

Bug #240780 reported by Aaron Swartz
Affects: Open Library
Status: Confirmed
Importance: Undecided
Assigned to: Edward Betts
Milestone: (none listed)

Bug Description

Edward, I thought we were going to reverse names with commas in them and merge them with the noncomma names. What happened with that?

Aaron Swartz (aaronsw)
Changed in openlibrary:
assignee: nobody → edward-debian
status: New → Confirmed
Revision history for this message
Edward Betts (edwardbetts) wrote :

I was going to do that, but Karen pointed out that it only works for names written in Western name order. Mao Zedong is stored as "Mao, Zedong" and "Mao Zedong". If I reverse the name with a comma it won't match the noncomma name.

I can try looking for a match without reversing the name: if one is found, assume Eastern name order; if not, assume Western name order.
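
A rough sketch of that check, assuming the noncomma names are available as an in-memory set. The function `guess_name_order` and the sample set are made up for illustration and are not the Open Library code.

```python
def guess_name_order(comma_name, noncomma_names):
    """Guess the display form of a "Family, Given" name.

    Returns (display_name, order), where order is "eastern", "western",
    or "unknown". `noncomma_names` is a hypothetical set of author names
    that were stored without a comma.
    """
    family, sep, given = comma_name.partition(",")
    family, given = family.strip(), given.strip()
    if not sep or not given:
        # No comma, or nothing after it: there is nothing to reorder.
        return comma_name.strip(), "unknown"

    unreversed = f"{family} {given}"  # e.g. "Mao Zedong"
    if unreversed in noncomma_names:
        # The family-first form already exists, so assume Eastern order.
        return unreversed, "eastern"
    # Otherwise assume Western order and put the given name first.
    return f"{given} {family}", "western"


known = {"Mao Zedong", "JK Rowling"}
print(guess_name_order("Mao, Zedong", known))   # ('Mao Zedong', 'eastern')
print(guess_name_order("Rowling, JK", known))   # ('JK Rowling', 'western')
```

Note that a comma name with no noncomma counterpart at all also falls through to the Western branch, which is the "if not found" default described above.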

Aaron Swartz (aaronsw) wrote : Re: [Bug 240780] Re: inconsistent author names: JK Rowling vs. Rowling, JK

> I can try looking for a match without reversing the name: if one is found,
> assume Eastern name order; if not, assume Western name order.

That sounds reasonable. And even if we do just reverse eastern names,
we'll still be right much more of the time. (Even "Mao, Zedong" isn't
exactly correct.)

Edward Betts (edwardbetts) wrote :

To get some numbers, I built a list of unique author names from the Amazon data, skipping any that contained commas.

Unique Amazon authors = 3073755 (this includes corporate authors)

I wrote some code to read the 100 and 700 fields from MARC records, and fed it most of the MARC data we have on archive.org, at least 20 million records.

Unique MARC authors = 4425724 (this shouldn't contain corporate authors)
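
The extraction script itself was not posted. As a minimal sketch of the idea, assume each record has already been parsed into (tag, subfields) pairs; that representation, the function name `personal_names`, and the sample record are all hypothetical. Tags 100 and 700 are the personal-name entries, while corporate names live in other fields such as 110, so filtering on the tag keeps corporate authors out.

```python
PERSONAL_NAME_TAGS = {"100", "700"}  # main and added personal-name entries


def personal_names(fields):
    """Yield the $a subfield of every 100/700 field in one record.

    `fields` is a hypothetical pre-parsed record: an iterable of
    (tag, subfields) pairs, where subfields maps a code like "a" to text.
    A trailing comma is trimmed on the assumption that cataloguing
    punctuation may be left on the name; trailing periods are kept so
    that initials are not damaged.
    """
    for tag, subfields in fields:
        if tag not in PERSONAL_NAME_TAGS:
            continue
        name = subfields.get("a", "").strip().rstrip(",").strip()
        if name:
            yield name


# Illustrative record only, not real catalogue data.
record = [
    ("100", {"a": "Mao, Zedong,"}),
    ("110", {"a": "Example Corporation"}),  # corporate name: skipped
    ("245", {"a": "An example title"}),
    ("700", {"a": "Aalto, Madeleine"}),
]
unique_authors = set(personal_names(record))
print(sorted(unique_authors))  # ['Aalto, Madeleine', 'Mao, Zedong']
```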

Then I compared the two data sets and got these results:

Match in western order = 1448220
Match in eastern order = 6053
Match in both orders = 17806

I'll upload the data soon. The Eastern-order matches include false positives.
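
One plausible way to produce counts like these, sketched under the assumption that both name lists fit in memory as sets. The function `compare_orders` and the sample data are illustrative only; the actual comparison script was not posted.

```python
from collections import Counter


def compare_orders(marc_names, amazon_names):
    """Count MARC "Family, Given" names that match an Amazon noncomma name.

    A name counts as "western" if "Given Family" is in the Amazon set,
    "eastern" if "Family Given" is, and "both" if both forms are present.
    """
    counts = Counter()
    for name in marc_names:
        family, sep, given = name.partition(",")
        family, given = family.strip(), given.strip()
        if not sep or not given:
            continue  # no comma to work with
        western = f"{given} {family}" in amazon_names
        eastern = f"{family} {given}" in amazon_names
        if western and eastern:
            counts["both"] += 1
        elif western:
            counts["western"] += 1
        elif eastern:
            counts["eastern"] += 1
    return counts


# Illustrative sample, not the real data sets.
marc = {"Rowling, JK", "Mao, Zedong", "Aalto, Madeleine", "Nobody, Unmatched"}
amazon = {"JK Rowling", "Mao Zedong", "Aalto Madeleine", "Madeleine Aalto"}
counts = compare_orders(marc, amazon)
print(dict(counts))
```

With the figures above, the three buckets sum to 1,472,079 matched names, so Eastern-only matches are roughly 0.4% of all matches.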

Edward Betts (edwardbetts) wrote :

Looking at the first two names that Amazon has in Eastern order:

MARC name: Aalto, Madeleine
Amazon name: Aalto Madeleine
Amazon page: http://amazon.com/dp/0810835800
Book cover: http://images.amazon.com/images/P/0810835800.01._SCLZZZZZZZ_.jpg
If you look at the book cover it clearly says Madeleine Aalto

MARC name: Aay, Henk
Amazon name: Aay Henk
Amazon page: http://amazon.com/dp/0761810439
Book cover: http://images.amazon.com/images/P/0761810439.01._SCLZZZZZZZ_.jpg
The book cover says Henk Aay

Aaron Swartz (aaronsw) wrote :

Just from having run into this problem myself, I suspect Amazon is
going to be wrong more than the libraries. Let's just reverse all of
them; I'm happy to live with a handful of Zedong Mao's.

David Strauss (davidstrauss) wrote :

----- "Aaron Swartz" <email address hidden> wrote:

> > I can try looking for a match without reversing the name: if one is found,
> > assume Eastern name order; if not, assume Western name order.
>
> That sounds reasonable. And even if we do just reverse eastern names,
> we'll still be right much more of the time. (Even "Mao, Zedong" isn't
> exactly correct.)

That's what I'm doing within my Wikipedia citation tool for Open Library.
