Deduping and FRBR-izing are two different things:
1) Deduping: bringing together records for copies of the same edition of
the same book. We do this when new sources (e.g. new libraries) are
added to the database.
2) FRBR-izing: bringing together records for different editions of the
same book.
Note that many books are only issued in one edition; FRBR-izing affects
a small but very visible part of the bibliographic universe (the estimate
is about 5%). It covers popular works like those of Shakespeare and Mark
Twain; it should also bring together reprintings and translations with the
original work. Think of it as a cluster of books with approximately the
same text, though published at different times by
different publishers.
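
To make the difference concrete, here is a rough sketch (not the code Edward wrote, and not how the catalog actually does it). It assumes each record is a plain dict with made-up fields "id", "title", "author", "publisher" and "year", and that exact matching on normalized strings is good enough. The sample records and the helper names (normalize, edition_key, work_key, cluster) are all hypothetical. The only difference between the two steps is the grouping key: the edition key includes publisher and year, the work key does not.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and drop punctuation so trivial variations compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def edition_key(rec):
    # Deduping: same title, author, publisher and year = same edition (assumption).
    return (normalize(rec["title"]), normalize(rec["author"]),
            normalize(rec["publisher"]), rec["year"])


def work_key(rec):
    # FRBR-izing: same title and author = same work, whatever the edition (assumption).
    return (normalize(rec["title"]), normalize(rec["author"]))


def cluster(records, key_func):
    """Group record ids by the key that key_func computes for each record."""
    groups = defaultdict(list)
    for rec in records:
        groups[key_func(rec)].append(rec["id"])
    return dict(groups)


# Hypothetical sample records, for illustration only.
records = [
    {"id": "lib_a:1", "title": "Adventures of Huckleberry Finn",
     "author": "Twain, Mark", "publisher": "Harper", "year": 1912},
    {"id": "lib_b:7", "title": "Adventures of Huckleberry Finn.",
     "author": "Twain, Mark", "publisher": "Harper", "year": 1912},
    {"id": "lib_a:2", "title": "Adventures of Huckleberry Finn",
     "author": "Twain, Mark", "publisher": "Penguin", "year": 1985},
    {"id": "lib_a:3", "title": "Hamlet",
     "author": "Shakespeare, William", "publisher": "Oxford", "year": 1987},
]

editions = cluster(records, edition_key)  # copies of the same edition together
works = cluster(records, work_key)        # all editions of a work together
```

A key this simple would miss translations and retitled reprints, since their titles differ; bringing those into the cluster needs fuzzier matching than this sketch attempts.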
kc
solrize wrote:
> I thought Edward had coded the algorithms and that we had done
> significant de-duping in the current catalog, but that there was more to
> do. I'd like to help with this if I can. As Alexis says, it is a big
> messy task, but the methods involved are also of interest for the search
> stuff I'm doing.
>
> We had a meeting quite a long time back where we discussed this in
> detail and I thought I understood it then, so maybe I'm way behind the
> times now.
>
------- www.kcoyle.net -------
Karen Coyle / Digital Library Consultant
<email address hidden> http://
ph.: 510-540-7596 skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
-------