Intermittent inconsistency in pgtextindex
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
KARL3 | Fix Released | Medium | Shane Hathaway | |
Bug Description
A while back I was investigating an issue where, under some circumstances, the catalog would return a docid for a document that was no longer in the system and therefore couldn't be loaded. This usually surfaced as a nonsensical-looking error like "None has no attribute, 'title'".
I wrote a test script, attached, which checks the catalog for consistency, and set it up via cron to run every four hours and email me the results. The vast majority of the time, the catalog is consistent. Occasionally, though, the pgtextindex contains a docid that is not in the system. This usually clears up by the next run, so it doesn't look like the database is getting corrupted; it looks more like an inconsistent read creeping in.
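One way to tell a transient inconsistent read from real corruption is to re-read both sides a few times and see which orphaned docids persist. The following is only a minimal sketch of that idea, not part of the attached script: `find_orphans`, `classify_orphans`, and the two reader callables are hypothetical names, and nothing here uses the KARL or repoze.catalog APIs.

```python
import time


def find_orphans(index_docids, known_docids):
    """Return docids that are in an index but not in the document map."""
    return set(index_docids) - set(known_docids)


def classify_orphans(read_index_docids, read_known_docids, retries=3, delay=0.0):
    """Split orphaned docids into (transient, persistent).

    Both arguments are hypothetical zero-argument callables that return a
    fresh list of docids each time they are called.
    """
    first = find_orphans(read_index_docids(), read_known_docids())
    persistent = set(first)
    for _ in range(retries):
        if not persistent:
            break
        time.sleep(delay)
        # Keep only the docids that are still orphaned on a fresh read.
        persistent &= find_orphans(read_index_docids(), read_known_docids())
    return first - persistent, persistent
```

Orphans that end up in the transient set would match the "clears up by the next run" behaviour described above; anything left in the persistent set would point to actual stale index entries.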
Changed in karl3:
milestone: m89 → m90
Changed in karl3:
milestone: m90 → m89
Changed in karl3:
status: New → In Progress
from repoze.catalog.indexes.field import CatalogFieldIndex
from repoze.catalog.indexes.keyword import CatalogKeywordIndex
from repoze.catalog.indexes.path2 import CatalogPathIndex2
from repoze.catalog.indexes.text import CatalogTextIndex
from karl.tagging.index import TagIndex
from karl.utils import find_catalog
from karlserve.textindex import KarlPGTextIndex
from pyramid.traversal import find_resource
catalog = find_catalog(root)
dm = catalog.document_map
for docid in dm.docid_to_address.keys():
    assert dm.address_for_docid(docid), docid
for address in dm.address_to_docid.keys():
    assert find_resource(root, address), address
for name, index in catalog.items():
    if isinstance(index, (CatalogFieldIndex, CatalogKeywordIndex)):
        print "Checking", name
        for docids in index._fwd_index.values():
            for docid in docids:
                path = dm.address_for_docid(docid)
                assert path, docid
        for docid in index._rev_index.keys():
            path = dm.address_for_docid(docid)
            assert path, docid
    elif isinstance(index, CatalogTextIndex):
        print "Checking", name
        for docid in index.index._docwords.keys():
            path = dm.address_for_docid(docid)
            assert path, docid
    elif isinstance(index, KarlPGTextIndex):
        print "Checking", name
        for docid in index.docids():
            path = dm.address_for_docid(docid)
            assert path, docid
    elif isinstance(index, CatalogPathIndex2):
        print "Checking", name
        for docid in index.docid_to_path.keys():
            path = dm.address_for_docid(docid)
            assert path, docid
    elif isinstance(index, TagIndex):
        print "Checking", name
        engine = index.site.tags
        for doc in engine._tagid_to_obj.values():
            path = dm.address_for_docid(doc.item)
            assert path, doc.item
    else:
        print "Skipping", name