Comment 4 for bug 286995

George (george-archive) wrote :

On Sat, May 1, 2010 at 1:14 AM, Michael Engel <email address hidden> wrote:
> > Freebase id: /m/05wk45p
> > Author name: Don Dinkmeyer
> > Aliases:
> > Don Dinkmeyer Jr.,Don Dinkmeyer Sr.,Don Sr Dinkmeyer,
> > Open Library records:
> > OL2624799A,OL302305A,OL2757673A,OL2757574A,OL2686700A,
> >
> > Looks like the Junior and the Senior are two different authors, see one example:

Good catch. I certainly didn't mean to imply that I think Freebase is
error-free. I think it's generally higher quality than what's in Open
Library, but not in this case. I think it also provides a nice
combination of machine-powered and human-powered reconciliation
processes. At a minimum though, the listing can be used to identify
areas that need cleanup.

There were actually two Freebase records and six Open Library records
for what are, most likely, two authors:

Freebase name: Don Dinkmeyer http://www.freebase.com/view/m/05wk45p
  Don Dinkmeyer http://openlibrary.org/a/OL2624799A
  Dinkmeyer, Don C. http://openlibrary.org/a/OL302305A
  Don Dinkmeyer Jr. http://openlibrary.org/a/OL2757673A (0 books)
  Don Dinkmeyer Sr. http://openlibrary.org/a/OL2757574A
  Don Sr Dinkeyer http://openlibrary.org/a/OL2686700A

Freebase name: Don C Dinkmeyer http://www.freebase.com/view/m/05wyhcb
  Don C Dinkmeyer http://openlibrary.org/a/OL3821345A

The Don Dinkmeyer Jr author record on Open Library has no books
associated with it, so I'm not even sure why it got created. Some of
the other OL records (e.g. Don Sr Dinkeyer) were obviously munged at
some stage in the processing pipe before getting to Freebase (perhaps
before getting to Open Library too).
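
As a rough sketch of how a listing like the one above could be used to
flag cleanup candidates automatically, here is one possible check: any
Freebase topic whose linked Open Library names carry both a "Jr" and a
"Sr" suffix is probably two people. The names and IDs are copied from
the listing; the function names and the suffix heuristic are my own
assumptions, not anything Freebase or Open Library actually runs.

```python
import re

# Names and Open Library IDs copied from the listing above,
# keyed by Freebase id.
LISTING = {
    "/m/05wk45p": [
        ("Don Dinkmeyer", "OL2624799A"),
        ("Dinkmeyer, Don C.", "OL302305A"),
        ("Don Dinkmeyer Jr.", "OL2757673A"),
        ("Don Dinkmeyer Sr.", "OL2757574A"),
        ("Don Sr Dinkeyer", "OL2686700A"),
    ],
    "/m/05wyhcb": [
        ("Don C Dinkmeyer", "OL3821345A"),
    ],
}

# A generational suffix as a standalone token, with or without a period.
SUFFIX = re.compile(r"\b(jr|sr)\b", re.IGNORECASE)


def generational_suffixes(name):
    """Return the set of suffixes ('jr', 'sr') found in one name."""
    return {match.lower() for match in SUFFIX.findall(name)}


def possibly_conflated(listing):
    """Return Freebase ids whose linked names mix 'Jr' and 'Sr'."""
    flagged = []
    for topic, records in listing.items():
        seen = set()
        for name, _olid in records:
            seen |= generational_suffixes(name)
        if {"jr", "sr"} <= seen:
            flagged.append(topic)
    return flagged


print(possibly_conflated(LISTING))
```

This only catches the easy case where the suffix survived in the name
string. It would not help with records that differ only by birth year,
so it is a way to build a worklist for humans, not a replacement for
them.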

It doesn't look like anyone in the Freebase community edited the
conflated record, so that's all apparently the result of overly
aggressive machine-based merging. I flagged the two separate records
for merger, which has since been voted on and completed, but now comes
the hard part - teasing apart the two authors.

I looked at the LoC and WorldCat and they do not appear to use Jr. and
Sr. at all. They use "Don Dinkmeyer" for the father, presumably
because he was the first and only at the time, and "Don Dinkmeyer,
1952-" for the son. This is apparently a variation on the bizarre
cataloging practices that librarians use, discussed a while back by
Karen. (Why not birth years for both? Why not Sr./Jr.? Why not
...?)

Here are the LoC authority records:

Dinkmeyer, Don C.
[They know the birth date and the fact that he's Sr., but don't
include it in the main heading]
http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?AuthRecID=2362233&v1=1&HC=1&SEQ=20100501103657&PID=W1x3SwNKrlJizonRsJ0SQ7NKGR91

Dinkmeyer, Don C., 1952-
http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?AuthRecID=946372&v1=1&HC=1&SEQ=20100501103549&PID=DvTGLGauLzedNKB8tuqgiZXm6K7Y

There's more strangeness in the Open Library records for one of the
books co-authored with Gary McKay, STET
(http://openlibrary.org/books/OL11407090M/Stet). The database lists
the wrong Gary McKay (combat author
http://openlibrary.org/authors/OL370554A/Gary_McKay) on the book, but
if you click through to the author page, the book isn't listed, so the
database is internally inconsistent.
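
This kind of inconsistency is mechanical enough to check for in bulk.
A minimal sketch, assuming you have the book-to-author links and the
author-to-book links available as two plain mappings (the two dicts
below are a hand-entered snapshot of what I described above, not
something fetched from Open Library, and the function name is
hypothetical):

```python
def find_inconsistencies(book_authors, author_books):
    """Return sorted (book, author) pairs where the book names the
    author but the author's own list of books omits that book."""
    missing = []
    for book, authors in book_authors.items():
        for author in authors:
            if book not in author_books.get(author, set()):
                missing.append((book, author))
    return sorted(missing)


# Hand-entered snapshot: STET lists this Gary McKay as an author ...
book_authors = {"OL11407090M": ["OL370554A"]}
# ... but his author page does not list STET (other books omitted).
author_books = {"OL370554A": set()}

print(find_inconsistencies(book_authors, author_books))
```

It only finds links that disagree in one direction; it says nothing
about which side is right, which in this case is neither, since it's
the wrong Gary McKay to begin with.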

I'm sure if you continued to browse around you could find other
problems, but I'm less concerned about how bad the data is than with
a) how it got that way and, more importantly, b) how it can be cleaned
up. Unfortunately, there doesn't appear to be much forthcoming in the
way of concrete plans.

Tom

p.s. Only tangentially related, but one of the cool things about
Freebase is that it's not limited to books and authors, so you can now
see father and son linked to each other and to the other son, James S.
Dinkmeyer, and the book series article from Wikipedia is linked in as
well. Over time this mesh of data should get denser. It's most
interesting for people for whom writing isn't their primary
profession - naval architects who mainly design fast sailboats, but
also write about how to do it, etc.