Comment 2 for bug 286995

George (george-archive) wrote :

(From Tom's post to ol-discuss, 4/29)

Rather than just complain about the data quality, here's a small
contribution to help improve it. I put together a little application
which shows all authors who have multiple Open Library author records,
as identified by the Freebase community.

You can find it at http://ol-dupes.freebaseapps.com/authors

The list is sorted from most to fewest duplicates, and each entry is
linked to all of its OL records as well as to the Freebase record.
Freebase uses a slightly different schema, so the authors are linked
to Books ("works" in FRBR lingo), and those are linked to Book Editions,
which equate to the Open Library book records.

I also included all the known names for the authors. Most of these
will have come from the merger of multiple records. I haven't looked
in detail, but it wouldn't surprise me if some of the bad names are
from munging on the Freebase side of things. You can see the name
associated with each OL record by clicking on its ID link.

The app is better suited to browsing than to actual data cleanup, but
I'd be happy to show someone how to extract the data in a form that
could be used in the OL processes (or do it for you). The app is
BSD-licensed, so anyone's free to hack on it as well.

Tom

(Thanks, Tom!!)