Scanned books marked as serials aren't included in Open Library

Bug #127793 reported by brewster
Affects: Open Library
Status: Confirmed
Importance: Medium
Assigned to: Edward Betts
Milestone: (none listed)

Bug Description

Some catalogue records are missing for Douglas Lurton. Editing/reopening this bug as it is an important regression test.

This book is missing from Open Library, because the MARC record says it is a serial:

http://www.archive.org/details/yourlifepopularg21lurtrich

When we include serials in Open Library this item will be included.
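For context: a MARC 21 record stores its bibliographic level in leader position 07, where `s` means serial and `m` means monograph. The following is a minimal sketch of the kind of check that would exclude this item, assuming the leader is available as a 24-character string. The function name and the sample leaders are hypothetical and are not taken from the Open Library import code.

```python
def is_serial(leader: str) -> bool:
    """Return True if a MARC 21 leader marks the record as a serial.

    Leader position 07 (0-indexed) is the bibliographic level:
    's' = serial, 'm' = monograph.
    """
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters")
    return leader[7] == "s"


# Hypothetical leaders, for illustration only.
monograph_leader = "00000cam a2200000 a 4500"
serial_leader = "00000cas a2200000 a 4500"

for name, leader in [("monograph", monograph_leader), ("serial", serial_leader)]:
    print(name, is_serial(leader))
```

An importer that skips every record where this check is true would drop a scanned book whose record is catalogued as a serial, which matches the behaviour described above.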

Aaron Swartz (aaronsw) wrote :

This may be the same + quoting problem.

Changed in openlibrary:
assignee: nobody → solrize
importance: Undecided → Medium
status: New → Confirmed
solrize (solrize) wrote :

Let's see what happens after the reindex, to get the author text names instead of the tdb names. See also the second item in https://bugs.launchpad.net/openlibrary/+bug/126816, which describes assigning unique tokens to facet labels, to prevent terms in the facet labels from garbling the nested searches. Alexis and I just discovered that newegg.com seems to do something similar for its faceting (using numbers instead of alphabetic strings), so that sort of validates the idea.
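A rough sketch of the facet-token idea, under stated assumptions: each facet label gets an opaque token, the index stores and searches the token, and the label is only looked up for display. The class name and the token format (`f1`, `f2`, ...) are hypothetical; the linked bug describes the actual proposal.

```python
class FacetTokens:
    """Map human-readable facet labels to opaque tokens and back."""

    def __init__(self):
        self._token_by_label = {}
        self._label_by_token = {}

    def token_for(self, label: str) -> str:
        """Return the token for a label, assigning a new one if needed."""
        if label not in self._token_by_label:
            # Hypothetical token format: "f" plus a running number.
            token = "f%d" % (len(self._token_by_label) + 1)
            self._token_by_label[label] = token
            self._label_by_token[token] = label
        return self._token_by_label[label]

    def label_for(self, token: str) -> str:
        """Return the display label for a previously assigned token."""
        return self._label_by_token[token]


tokens = FacetTokens()
author_token = tokens.token_for("Lurton, Douglas")
print(author_token, "->", tokens.label_for(author_token))
```

Because the token contains none of the words in the label, a nested search for "douglas lurton" cannot accidentally match on facet label text.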

solrize (solrize) wrote :

With the new index, the (douglas lurton) search finds 4 books.

Changed in openlibrary:
status: Confirmed → In Progress
solrize (solrize)
Changed in openlibrary:
status: In Progress → Fix Committed
solrize (solrize)
Changed in openlibrary:
status: Fix Committed → Fix Released
solrize (solrize) wrote :

Reopen: I think at least two of these books have been scanned, but they no longer show up as having scans available.

solrize (solrize) wrote :

BK brought this up at the books meeting today. I had thought the scanned books were showing up more recently than April but now I'm not absolutely sure. Anyway, archive.org search shows exactly one Douglas Lurton book:

http://www.archive.org/search.php?query=douglas+lurton

finds

http://www.archive.org/details/thepowerofposit00lurtmiss

A similar openlibrary search for "douglas lurton" finds (among others) this book as

http://openlibrary.org/b/OL6071081M

Going to the history page for that book and viewing the MARC record shows that it's not the same record as the one in the IA search engine. Examination of the most recent OL JSON dump shows that this is the only OL record for the book. Looking at the update log after the JSON dump doesn't show any new records for the book, but there is some chance that the log is incomplete, so I'm hoping to get a new JSON dump soon. The current update log still has some malformed records discussed in another bug.

My guess is that we have gotten MARC records for this book from two or more different sources, and deduplication has thrown out all but one of them, which is not the IA one. Any thoughts?

solrize (solrize) wrote :
solrize (solrize)
description: updated
Changed in openlibrary:
status: Fix Released → Confirmed
solrize (solrize) wrote : Re: Missing "Douglas Lurton" records

I spoke with Edward online last night about this and he agrees about the deduplication theory.

We should make sure during import/deduplication that scan availability doesn't get separated from catalogue records of scanned books, even if we end up using catalogue data from some other source for the general bibliographic info about that book.
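A minimal sketch of that requirement, assuming each candidate record is a plain dict and that scan availability is carried in a list field. The field names (`source`, `title`, `scan_ids`) and the function name are hypothetical and do not reflect the actual Open Library import schema.

```python
def merge_duplicates(records, preferred_index=0):
    """Merge duplicate catalogue records for one book.

    Bibliographic fields come from the preferred record, but scan
    identifiers are collected from every duplicate so that scan
    availability survives deduplication.
    """
    merged = dict(records[preferred_index])
    scan_ids = []
    for record in records:
        for scan_id in record.get("scan_ids", []):
            if scan_id not in scan_ids:
                scan_ids.append(scan_id)
    merged["scan_ids"] = scan_ids
    return merged


# Hypothetical duplicates: one from a library catalogue, one from IA.
duplicates = [
    {"source": "library_catalogue", "title": "Example title"},
    {"source": "ia", "title": "Example title",
     "scan_ids": ["thepowerofposit00lurtmiss"]},
]
merged = merge_duplicates(duplicates)
print(merged["source"], merged["scan_ids"])
```

Here the library catalogue record wins for the general bibliographic info, while the merged record still points at the scan.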

solrize (solrize) wrote :

Note: of the above Archive identifiers, thepowerofposit00lurtmiss is the only one found in the IA Solr index.

solrize (solrize) wrote :

Turning this over to Edward since it's a data import issue.

Changed in openlibrary:
assignee: solrize → edward-debian
Edward Betts (edwardbetts) wrote :
Changed in openlibrary:
status: Confirmed → Fix Released
status: Fix Released → Confirmed
description: updated
summary: - Missing "Douglas Lurton" records
+ Scanned books marked as serials aren't included in Open Library