Investigate why we only have 400k IA full text works results

Bug #551354 reported by George
This bug affects 1 person
Affects: Open Library
Status: In Progress
Importance: High
Assigned to: Edward Betts
Milestone: (none listed)

Bug Description

As per: http://upstream.openlibrary.org/search?q=*%3A*&has_fulltext=true - 394,258 hits

Seems like there should be more... like, maybe double?

George (george-archive) wrote :

Suggestion...

Take 100 random IA IDs and send them to openlibrary.org/ia/ID and see if/when we 404.
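
A minimal sketch of the spot check suggested above, assuming a list of IA identifiers is already available locally (the thread does not say where it would come from). The names `sample_ids`, `http_status`, and `find_missing` are hypothetical helpers made up for illustration, and the `https://openlibrary.org/ia/` prefix is taken from the URL pattern in the suggestion.

```python
import random
import urllib.error
import urllib.parse
import urllib.request

# Assumption: URL pattern from the suggestion, openlibrary.org/ia/ID
BASE_URL = "https://openlibrary.org/ia/"


def sample_ids(all_ids, k=100, seed=None):
    """Pick up to k identifiers at random, without repeats."""
    ids = list(all_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(k, len(ids)))


def http_status(ia_id, timeout=10):
    """Return the HTTP status code for one identifier's /ia/ page."""
    url = BASE_URL + urllib.parse.quote(ia_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the code is what we want here
        return err.code


def find_missing(ids, status_of=http_status):
    """Return the identifiers whose lookup comes back as 404."""
    return [ia_id for ia_id in ids if status_of(ia_id) == 404]
```

Usage would look something like `find_missing(sample_ids(known_ia_ids, k=100))`, where `known_ia_ids` stands in for whatever list of IA identifiers is on hand. The `status_of` parameter is there so the check can be dry-run against a stub before hitting the live site.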

Perhaps we should relax the "ANDs" on when to throw stuff out?

Changed in openlibrary:
milestone: none → upstream-to-www
assignee: nobody → Edward Betts (edwardbetts)
importance: Undecided → Critical
Edward Betts (edwardbetts) wrote :

What do you mean by relax the "ANDs"?

George (george-archive) wrote :

Good question. It came in a flurry of suggestions from Brewster about the ingest process. Something about being a little too strict on what docs to include and what to reject. Feel free to ignore it, or ping him for clarification :)

George (george-archive)
Changed in openlibrary:
status: New → In Progress
Changed in openlibrary:
milestone: upstream-to-www → general-bucket
importance: Critical → Medium
importance: Medium → High
George (george-archive) wrote :

646,502 hits
