import infobase logs to solr

Bug #244359 reported by solrize
This bug report is a duplicate of: Bug #267853: Real time search update crashed again.
Affects: Open Library
Status: In Progress
Importance: High
Assigned to: solrize
Milestone: (none)

Bug Description

This is a rather old scheme that was implemented in tdb but more or less abandoned for performance reasons then. It's in infobase now and should hopefully work better than before. So, revive or reimplement the code that propagates database changes to Solr. This should include updating the fulltext index when books with ocaids are added (which wasn't implemented before).

solrize (solrize)
Changed in openlibrary:
assignee: nobody → solrize
importance: Undecided → High
status: New → In Progress
Revision history for this message
Anand Chitipothu (anandology) wrote : Re: [Bug 244359] [NEW] import infobase logs to solr

> This is a rather old scheme that was implemented in tdb but more or less
> abandoned for performance reasons then. It's in infobase now and should
> hopefully work better than before.

There were performance problems because the log parsing involved a lot
of database queries. Now the log is a JSON string, which doesn't
require any db access for parsing.
So there are absolutely no performance issues.
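
To illustrate the point about parsing without db access, here is a minimal sketch of turning JSON log lines into Solr-style documents. The thread does not show the actual infobase log format, so the record layout (`action`, `timestamp`, `data`, `key`, `title`) and the function names are assumptions made for illustration only; `ocaid` is the one field named in the bug description.

```python
import json


def log_line_to_solr_doc(line):
    """Turn one JSON log line into a Solr-style dict, or None if it has no key.

    The record layout is hypothetical; the real infobase log may differ.
    """
    record = json.loads(line)        # pure string parsing, no db query
    data = record.get("data", {})    # assumed: saved object lives under "data"
    key = data.get("key")
    if key is None:
        return None
    doc = {"id": key, "title": data.get("title", "")}
    if "ocaid" in data:
        # Books with an ocaid would also need a fulltext index update.
        doc["ocaid"] = data["ocaid"]
    return doc


def docs_from_log(lines):
    """Yield one document per usable log line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = log_line_to_solr_doc(line)
        if doc is not None:
            yield doc


# Made-up sample records, not real infobase output.
sample_log = [
    '{"action": "save", "timestamp": "2008-07-08T04:24:00", '
    '"data": {"key": "/b/OL1M", "title": "Example", "ocaid": "examplebook00"}}',
    "",
    '{"action": "save", "timestamp": "2008-07-08T04:25:00", '
    '"data": {"title": "record without a key"}}',
]
docs = list(docs_from_log(sample_log))
```

Posting the resulting documents to Solr is left out on purpose; the sketch only covers the step that no longer needs the database.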

solrize (solrize) wrote :

That is a good point: reading the logs doesn't take any db queries, so it's feasible to load the search engine completely from log records. Is there a way to generate a log for the entire infobase contents? That would let me stop dealing with bulk imports, which are currently a messy semi-manual process.

Also, is there a big performance cost of generating those logs during large infobase imports?

Anand Chitipothu (anandology) wrote : Re: [Bug 244359] Re: import infobase logs to solr

> Also, is there a big performance cost of generating those logs during
> large infobase imports?

The plan is to generate these logs even for large imports.

Many operations will run on the logs:

* database backup
* solr import
* generating feeds for books modified per day/week/month etc.
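
As a rough sketch of the last item, the same logs could be grouped by day to list which records changed. This assumes each log line is a JSON object with an ISO 8601 `timestamp` and the object key under `data.key`; both field names and the function name are hypothetical, since the real log layout is not shown in this thread.

```python
import json
from collections import defaultdict


def keys_modified_per_day(lines):
    """Map 'YYYY-MM-DD' to the sorted list of keys modified on that day.

    Assumes ISO 8601 timestamps, so the first 10 characters are the date.
    """
    per_day = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        day = record["timestamp"][:10]
        per_day[day].add(record["data"]["key"])  # a set ignores repeat edits
    return {day: sorted(keys) for day, keys in per_day.items()}


# Made-up sample records, not real infobase output.
feed_sample = [
    '{"timestamp": "2008-07-08T04:24:00", "data": {"key": "/b/OL1M"}}',
    '{"timestamp": "2008-07-08T09:00:00", "data": {"key": "/b/OL2M"}}',
    '{"timestamp": "2008-07-08T10:00:00", "data": {"key": "/b/OL1M"}}',
    '{"timestamp": "2008-07-12T05:42:00", "data": {"key": "/a/OL1A"}}',
]
modified = keys_modified_per_day(feed_sample)
```

Weekly or monthly feeds would follow the same pattern with a coarser grouping key.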

solrize (solrize) wrote :

My question about generating a log is whether there's a way to generate one for the existing infobase contents, not just new imports that haven't been done yet. Any advice?

Anand Chitipothu (anandology) wrote :

On Tue, Jul 8, 2008 at 4:24 AM, solrize <email address hidden> wrote:
> My question about generating a log is whether there's a way to generate
> one for the existing infobase contents, not just new imports that
> haven't been done yet. Any advice?

I am working on generating a JSON dump of the entire database. You
should be able to use that.

solrize (solrize) wrote :

Is a dump like this being made and/or available on ia311530 now?

Anand Chitipothu (anandology) wrote :

On Sat, Jul 12, 2008 at 5:42 AM, solrize <email address hidden> wrote:
> Is a dump like this being made and/or available on ia311530 now?

Not yet.

solrize (solrize) wrote :

Any news about making a dump like this? Is it a high-overhead operation? I'm going to want it pretty soon to reindex the catalog and fulltext.
