Comment 5 for bug 128394

Karen Coyle (kcoyle) wrote : Re: [Bug 128394] Re: FRBRizing and deduping

Deduping and FRBR-izing are two different things:

1) Deduping: bringing together records for copies of the same edition of
the same book. We do this when new sources (e.g. new libraries) are
added to the database.

2) FRBR-izing: bringing together records for different editions of the
same book.

Note that many books are issued in only one edition; FRBR-izing affects
a small but very visible part of the bibliographic universe (about 5% is
the estimate). It covers popular works like those of Shakespeare and
Mark Twain; it should also bring together reprintings and translations
with the original work. Think of it as a cluster of books with
approximately the same text, though published at different times by
different publishers.
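
To make the distinction concrete, here is a rough sketch in Python. It
is not the algorithm used on the catalog; the record fields, the sample
data, and the helper names (norm, edition_key, work_key, cluster) are
all made up for illustration. The only difference between the two
operations here is how much of the record goes into the grouping key.

```python
from collections import defaultdict

# Hypothetical, simplified records; real catalog data is far messier.
records = [
    {"id": 1, "source": "library_a", "author": "Mark Twain",
     "title": "Adventures of Huckleberry Finn",
     "publisher": "Publisher A", "year": 1912},
    {"id": 2, "source": "library_b", "author": "Mark  Twain",
     "title": "adventures of huckleberry finn",
     "publisher": "Publisher A", "year": 1912},
    {"id": 3, "source": "library_a", "author": "Mark Twain",
     "title": "Adventures of Huckleberry Finn",
     "publisher": "Publisher B", "year": 1985},
    {"id": 4, "source": "library_b", "author": "Herman Melville",
     "title": "Moby Dick",
     "publisher": "Publisher A", "year": 1930},
]


def norm(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def edition_key(record):
    """Deduping key (assumed): same author, title, publisher, and year."""
    return (norm(record["author"]), norm(record["title"]),
            norm(record["publisher"]), record["year"])


def work_key(record):
    """FRBR-izing key (assumed): same author and title, any edition."""
    return (norm(record["author"]), norm(record["title"]))


def cluster(records, key):
    """Group record ids by the given key function."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record["id"])
    return sorted(groups.values())


print(cluster(records, edition_key))  # deduping: copies of one edition
print(cluster(records, work_key))     # FRBR-izing: editions of one work
```

Exact matching on author and title would not catch translations or
retitled reprints, so a real work key would need something fuzzier.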

kc

solrize wrote:
> I thought Edward had coded the algorithms and that we had done
> significant de-duping in the current catalog, but that there was more to
> do. I'd like to help with this if I can. As Alexis says, it is a big
> messy task, but the methods involved are also of interest for the search
> stuff I'm doing.
>
> We had a meeting quite a long time back where we discussed this in
> detail and I thought I understood it then, so maybe I'm way behind the
> times now.
>

--
-----------------------------------
Karen Coyle / Digital Library Consultant
<email address hidden> http://www.kcoyle.net
ph.: 510-540-7596 skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------