Blowing Xapian max term length corrupts index

Reported by Mikkel Kamstrup Erlandsen on 2011-09-07
This bug affects 1 person
Affects                        Importance  Assigned to
Zeitgeist Extensions           High        Mikkel Kamstrup Erlandsen
zeitgeist-extensions (Ubuntu)  Undecided   Unassigned

Bug Description

Xapian has a (not very well documented) max term length of 245 bytes. See, for example, http://xapian.org/docs/omega/termprefixes.html. For some reason this is not always gracefully handled inside Xapian and busting that limit may occasionally corrupt the index.

This is reproducible by indexing long URLs (at least 245 bytes long). We already had a cap at 2000 characters, but that was apparently not good enough.
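
As an illustration of the kind of cap described above, here is a minimal sketch that truncates a term so its UTF-8 encoding (including any prefix) stays within 245 bytes. This is not the committed fix, which is not shown in this report; the helper name `cap_term`, the constant `MAX_TERM_BYTES`, and the use of "U" as a URL prefix are assumptions made for the example.

```python
# Hypothetical helper, not taken from zeitgeist-extensions.
MAX_TERM_BYTES = 245  # limit cited in the bug description


def cap_term(term, prefix="", max_bytes=MAX_TERM_BYTES):
    """Return prefix + term, truncated so its UTF-8 form fits in max_bytes."""
    full = prefix + term
    encoded = full.encode("utf-8")
    if len(encoded) <= max_bytes:
        return full
    # Cut at the byte limit; errors="ignore" drops a trailing partial
    # multi-byte character left behind by the cut.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    long_url = "http://example.com/" + "x" * 2000
    capped = cap_term(long_url, prefix="U")
    print(len(capped.encode("utf-8")))
```

The limit is counted in bytes rather than characters, so a character-based cap (such as the earlier 2000-character one) does not guarantee that a term fits.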

Changed in zeitgeist-extensions:
assignee: nobody → Mikkel Kamstrup Erlandsen (kamstrup)
importance: Undecided → High
status: New → Triaged
Changed in zeitgeist-extensions:
status: Triaged → Fix Committed
Changed in zeitgeist-extensions:
milestone: none → fts-0.0.12
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package zeitgeist-extensions - 0.0.12-0ubuntu1

---------------
zeitgeist-extensions (0.0.12-0ubuntu1) oneiric; urgency=low

  * New upstream release:
    - fts can SIGSEGV ZG during reindex (LP: #617309)
    - zeitgeist-daemon crashed with RuntimeError in _check_index():
      basic_string::assign (LP: #839740)
    - Blowing Xapian max term length corrupts index (LP: #843668)
    - Can't recover from FTS index corruption (LP: #705944)
 -- Didier Roche <email address hidden> Thu, 08 Sep 2011 11:25:16 +0200

Changed in zeitgeist-extensions (Ubuntu):
status: New → Fix Released

Note from a Xapian developer; the report here says: "For some reason this is not always gracefully handled inside Xapian and busting that limit may occasionally corrupt the index." We're not aware of any situation in which adding a term longer than the limit can result in a corrupted index, and I don't recall any such report. If you have a way to reproduce such a corruption, we'd be interested in it, so that we can fix it.

Richard: Sure - I never personally could reproduce this issue, but one user seemed to get it very reliably. I can check with him to see if we can narrow it down.
