setup solr updates

Bug #520387 reported by Anand Chitipothu
This bug affects 1 person
Affects: Open Library
Status: Fix Released
Importance: Critical
Assigned to: Edward Betts

Bug Description

Set up a program that listens for changes from Infobase and updates Solr.

Infobase has an API for reading its log.

$ curl 'http://ia331526:7000/openlibrary.org/log/2010-02-10:0?limit=1'
{"data": [
{"action": "save", "timestamp": "2010-02-10T00:00:01.754451", "data": {"comment": null, "author": "/user/StatsBot", "ip": "207.241.231.243", "result": {"key": "/admin/stats/2010-02-10", "revision": 1}, "key": "/admin/stats/2010-02-10", "query": {"key": "/admin/stats/2010-02-10", "created": "2010-02-10T00:00:01.754451", "edits": {"total": 0, "bot": 0, "human": 0}, "last_modified": "2010-02-10T00:00:01.754451", "latest_revision": 1, "members": 0, "type": "/type/object", "id": 49159433, "revision": 1}, "machine_comment": null}, "site": "openlibrary.org"}],
"offset": "2010-02-10:558"
}

It returns a JSON response. Each item in data describes one executed query: its action, timestamp, comment, user details and the result of the query. The result contains the modified keys, which can be used to find the keys to update in Solr.
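A minimal Python 3 sketch of pulling the modified keys out of one log response, assuming the structure shown in the sample above: each entry's data.result is a single {"key": ..., "revision": ...} object, as for "save". Treating a list of such objects as possible (for example for "save_many") is an unverified assumption, and the function name modified_keys is hypothetical.

```python
import json

def modified_keys(response):
    """Return the keys modified by the entries in one Infobase log response.

    Assumes entry["data"]["result"] is a dict with a "key" (as in the sample),
    or possibly a list of such dicts (assumption, not confirmed).
    """
    keys = []
    for entry in response.get("data", []):
        result = (entry.get("data") or {}).get("result")
        # Normalise a single result dict into a one-element list.
        if isinstance(result, dict):
            result = [result]
        for item in result or []:
            if isinstance(item, dict) and "key" in item:
                keys.append(item["key"])
    # A key edited several times only needs one Solr update; keep first-seen order.
    return list(dict.fromkeys(keys))

sample = json.loads(
    '{"data": [{"action": "save", "data": {"result": '
    '{"key": "/admin/stats/2010-02-10", "revision": 1}}}], '
    '"offset": "2010-02-10:558"}'
)
print(modified_keys(sample))  # ['/admin/stats/2010-02-10']
```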

The offset in the response is a position in the log, which can be used to construct the request that reads the next entries.
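Following the offsets can be sketched as a Python 3 generator that requests a page, yields its entries together with the offset to resume from, and repeats. The fetch parameter (a callable taking a URL and returning decoded JSON) is a hypothetical seam so the loop can be exercised without the server, and treating an empty data list at an unchanged offset as "caught up" is an assumption, not documented Infobase behaviour.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Log endpoint from the curl example (an internal host at the time).
BASE = "http://ia331526:7000/openlibrary.org/log/"

def log_url(base, offset, limit=None):
    """Build the URL that reads the log starting at `offset`."""
    url = base + offset
    if limit is not None:
        url += "?" + urlencode({"limit": limit})
    return url

def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

def walk_log(offset, fetch=fetch_json, base=BASE, limit=100, poll_interval=None):
    """Yield (entries, next_offset) for each non-empty page of the log.

    Saving next_offset only after a page has been processed lets a
    restarted updater resume where it left off.
    """
    while True:
        page = fetch(log_url(base, offset, limit))
        entries = page.get("data", [])
        new_offset = page.get("offset", offset)
        if entries:
            yield entries, new_offset
        elif new_offset == offset:
            # Assumed to mean "nothing new yet".
            if poll_interval is None:
                return  # stop instead of polling
            time.sleep(poll_interval)  # wait for new edits
        offset = new_offset
```

For example, a loop over walk_log("2010-02-10:0") would hand each page to the Solr update step and then record next_offset as its checkpoint.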

Changed in openlibrary:
milestone: none → upstream
assignee: nobody → Edward Betts (edwardbetts)
importance: Undecided → Critical
Edward Betts (edwardbetts) wrote :

How do I know when I've reached the end of a day?

Anand Chitipothu (anandology) wrote : Re: [Bug 520387] Re: setup solr updates

On 03-Mar-10, at 12:59 AM, Edward Betts wrote:

> How do I know when I've reached the end of a day?

You don't have to. It will scroll to the next day automatically.
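Per the reply above, the offset rolls over to the next day by itself, so a client does not need to detect day boundaries to keep reading. If an updater still wants to log or checkpoint per day, the date can be read from the offset. A minimal Python 3 sketch, assuming offsets always have the YYYY-MM-DD:N shape seen in this thread (what N counts is not specified here); parse_offset and crossed_day are hypothetical helper names.

```python
from datetime import datetime

def parse_offset(offset):
    """Split a log offset such as '2010-02-10:558' into (date, position).

    Assumes the 'YYYY-MM-DD:N' shape seen in this thread.
    """
    day, _, position = offset.partition(":")
    return datetime.strptime(day, "%Y-%m-%d").date(), int(position)

def crossed_day(old_offset, new_offset):
    """True if new_offset is on a later day than old_offset."""
    return parse_offset(new_offset)[0] > parse_offset(old_offset)[0]

print(parse_offset("2010-02-10:558"))  # (datetime.date(2010, 2, 10), 558)
print(crossed_day("2010-02-10:14742076", "2010-02-11:0"))  # True
```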

Edward Betts (edwardbetts) wrote :

ValueError: No JSON object could be decoded

on this URL:

http://ia331526:7000/openlibrary.org/log/2010-01-20:131022769

Edward Betts (edwardbetts) wrote :

This is what the invalid JSON looks like:

"machine_comment": nu{"action": "save_many", "timestamp": "2010-01-20T20:47:28.414898",

Maybe a line was truncated?
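When a page fails to decode, it helps to know which offset produced it and what the text around the failure looks like, as in the excerpt above. A minimal Python 3 sketch; decode_log_page is a hypothetical helper, and it relies on json.JSONDecodeError (Python 3.5+), which records the character position of the failure in its pos attribute.

```python
import json

def decode_log_page(url, body, context=40):
    """Decode a log page, re-raising failures with the URL and nearby text."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as e:
        # Show up to `context` characters on each side of the failure point.
        start = max(e.pos - context, 0)
        snippet = body[start:e.pos + context]
        raise ValueError(
            "bad JSON from %s at char %d: %r" % (url, e.pos, snippet)
        ) from e
```

In a fetch loop, the body would be read first and passed to decode_log_page together with the URL, so the error names the offending offset.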

Edward Betts (edwardbetts) wrote :

Scrolling to the next day isn't working for me. Below are my code and sample output. Is there a bug in my code?

from urllib2 import urlopen
import simplejson

base = 'http://ia331526.us.archive.org:7000/openlibrary.org/log/'

offset = '2010-02-10:0'
while True:
    url = base + offset
    print offset
    offset = simplejson.load(urlopen(url))['offset']

Output:

2010-02-10:0
2010-02-10:1903050
2010-02-10:3226449
2010-02-10:4875475
2010-02-10:6178590
2010-02-10:7389640
2010-02-10:8611084
2010-02-10:9746424
2010-02-10:11075543
2010-02-10:12147195
2010-02-10:13158055
2010-02-10:14742076
Traceback (most recent call last):
  File "walk_update.py", line 10, in <module>
    offset = simplejson.load(urlopen(url))['offset']
  File "/usr/local/lib/python2.6/dist-packages/simplejson-2.0.9-py2.6-linux-x86_64.egg/simplejson/__init__.py", line 267, in load
    parse_constant=parse_constant, **kw)
  File "/usr/local/lib/python2.6/dist-packages/simplejson-2.0.9-py2.6-linux-x86_64.egg/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python2.6/dist-packages/simplejson-2.0.9-py2.6-linux-x86_64.egg/simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python2.6/dist-packages/simplejson-2.0.9-py2.6-linux-x86_64.egg/simplejson/decoder.py", line 353, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Anand Chitipothu (anandology) wrote :

Fixed.

Edward Betts (edwardbetts) wrote :

I ran my example code once and got this output:

2010-02-10:0
2010-02-10:1903050
2010-02-10:3226449
2010-02-10:4875475
2010-02-10:6178590
2010-02-10:7389640
2010-02-10:8611084
2010-02-10:9746424
2010-02-10:11075543
2010-02-10:12147195
2010-02-10:13158055
2010-02-10:14742076
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/__init__.py", line 267, in load
    parse_constant=parse_constant, **kw)
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/decoder.py", line 353, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

I ran it again and got this output:

2010-02-10:0
2010-02-10:1903050
2010-02-10:3226449
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/__init__.py", line 267, in load
    parse_constant=parse_constant, **kw)
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/decoder.py", line 351, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 399 column 1270 (char 533898)

Third attempt:

2010-02-10:0
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/lib/python2.5/site-packages/simplejson-2.0.9-py2.5-linux-x86_64.egg/simplejson/__init__.py", line 264, in load
    return loads(fp.read(),
  File "/usr/lib/python2.5/socket.py", line 304, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.5/httplib.py", line 509, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.5/httplib.py", line 548, in _read_chunked
    chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: ''
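The three runs failed at different points, and the last one failed while reading the chunked HTTP response itself, which suggests responses being cut off in transit rather than one fixed bad spot in the log. An updater could retry such failures a few times before giving up. A minimal Python 3 sketch; fetch_with_retry is hypothetical, and the choice of which exceptions count as transient (ValueError for undecodable JSON, http.client.HTTPException, which includes IncompleteRead, and OSError for socket and URL errors) is an assumption.

```python
import http.client
import time

# Assumption: these failures are worth retrying.
TRANSIENT_ERRORS = (ValueError, http.client.HTTPException, OSError)

def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url), retrying transient failures with a growing delay."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except TRANSIENT_ERRORS:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            time.sleep(delay * attempt)  # back off a little more each time
```

In a polling loop, each page request would go through fetch_with_retry, so a single cut-off response does not stop the updater.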

Changed in openlibrary:
status: New → Fix Released