import new fulltext into SE
Bug #134164 reported by solrize
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Open Library | In Progress | High | solrize |
Bug Description
All of the current searchable fulltext comes from an OCA snapshot taken in April, and quite a few more books have been added to OCA since then. At minimum, the new books should be imported into the search engine (SE), and this should be redone periodically. Best would be a way to make this happen automatically, whether in real time, nightly, weekly, or on some other schedule. But a repeatable manual process would not be so bad.
Right now there's no natural API for detecting new OCA content, and the current snapshot was made with a set of hand-operated spidering scripts starting from an archive.org Solr search. Some improvements on the OCA side may be possible.
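As a rough illustration of what a repeatable version of that Solr-based starting point might look like, here is a minimal sketch that builds an archive.org advanced-search query for text items added within a date range and filters the results against identifiers the SE already holds. It assumes the public `advancedsearch.php` endpoint with JSON output, and assumes the `mediatype` and `addeddate` fields are the right ones to filter on; neither is confirmed by this bug, and restricting to OCA items specifically would need an extra clause not shown here.

```python
import json
from urllib.parse import urlencode

# Assumption: archive.org's advanced search endpoint, which returns Solr-style JSON.
SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_new_items_query(since, until, rows=1000, page=1):
    """Build a search URL for text items added between `since` and `until` (YYYY-MM-DD).

    The field names `mediatype` and `addeddate` are assumptions about the
    archive.org schema; check them against a live search before relying on this.
    """
    q = f"mediatype:texts AND addeddate:[{since} TO {until}]"
    params = [
        ("q", q),
        ("fl[]", "identifier"),      # only the fields the importer needs
        ("fl[]", "addeddate"),
        ("sort[]", "addeddate asc"), # oldest first, so a run can resume by date
        ("rows", str(rows)),
        ("page", str(page)),
        ("output", "json"),
    ]
    return SEARCH_URL + "?" + urlencode(params)


def new_identifiers(response_text, already_indexed):
    """Return identifiers from a search response that are not yet in the SE."""
    docs = json.loads(response_text)["response"]["docs"]
    return [d["identifier"] for d in docs if d["identifier"] not in already_indexed]
```

A periodic job could record the latest `addeddate` it imported and use that as `since` on the next run, paging until a response comes back with no docs.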
Changed in openlibrary:
assignee: nobody → solrize
importance: Undecided → Medium
Changed in openlibrary:
status: New → Confirmed
Per discussion with Siznax:
petabox/www/common/WorkBase.inc calls a function, updateSearchEngine(), when a new book appears (this updates the www.archive.org search engine, not the Open Library engine). updateSearchEngine() lives in petabox/www/common/SearchEngine.inc, and the relevant function is update(). Siznax suggests subclassing SearchEngine but isn't sure this is feasible (we might have to write a new class instead). Any changes to this code will also have to be discussed with Tracey.
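The subclassing idea can be sketched roughly as follows. The real code is PHP in petabox and its interface isn't shown here, so every class and method below is a hypothetical stand-in written in Python purely to illustrate the pattern: the subclass keeps the existing archive.org behaviour by calling the parent `update()`, then hands the new identifier to a queue that a separate Open Library fulltext importer could drain.

```python
class SearchEngine:
    """Hypothetical stand-in for petabox's SearchEngine (the real one is PHP)."""

    def __init__(self):
        self.indexed = []  # identifiers sent to the archive.org search engine

    def update(self, identifier):
        self.indexed.append(identifier)


class OpenLibrarySearchEngine(SearchEngine):
    """Hypothetical subclass that also forwards new books to Open Library."""

    def __init__(self, ol_queue):
        super().__init__()
        self.ol_queue = ol_queue  # consumed later by an OL fulltext importer

    def update(self, identifier):
        super().update(identifier)        # existing archive.org indexing, unchanged
        self.ol_queue.append(identifier)  # enqueue for the Open Library SE


def update_search_engine(engine, identifier):
    """Plays the role of updateSearchEngine(): called when a new book appears."""
    engine.update(identifier)
```

Queueing rather than indexing inline is one option: it leaves the archive.org path essentially unchanged and lets the Open Library side batch its imports. If subclassing turns out not to be feasible, the same two calls could live in a new class that wraps SearchEngine instead of extending it.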