Catalog indexes XML element names

Bug #101756 reported by Samuel Schluep
Affects   Status         Importance   Assigned to   Milestone
Silva     Fix Released   Medium       Unassigned    1.6

Bug Description

The Silva catalog's full-text index contains XML element names, such as
'doc', 'p', 'path', 'image', 'link' and 'em'. In my opinion this is a bug: the
full-text index should contain only the XML content, not the names of XML
elements and attributes.

Tags: silva-1.6
Martijn Faassen (faassen) wrote:

This is indeed a design flaw in the way full-text indexing works right now.
It would not be very hard to flatten this XML and leave out the tags, though
this needs to be done carefully, with automated tests to make sure we don't
accidentally leave something out.
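A minimal sketch of the flattening described above, assuming the stored document is well-formed XML. It uses Python's standard `xml.etree.ElementTree`, which is not necessarily what Silva itself uses; the function name `flatten_xml` and the sample document are hypothetical. `itertext()` yields only text and tail content, so element names and attributes (including attribute values such as an image path) never reach the index.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_source: str) -> str:
    """Return the text content of an XML document without tag or attribute names.

    Hypothetical helper for illustration; not Silva's actual code.
    """
    root = ET.fromstring(xml_source)
    # itertext() walks the tree in document order, yielding each element's
    # text and tail strings, but never tag names or attributes.
    pieces = root.itertext()
    # Join with spaces so text from adjacent elements does not run together,
    # then collapse any runs of whitespace into single spaces.
    return " ".join(" ".join(pieces).split())


doc = ('<doc><p>Hello <em>world</em></p><image path="a.png"/>'
       '<p>See <link url="x">here</link>.</p></doc>')
print(flatten_xml(doc))
```

Joining with spaces is a deliberate trade-off: it keeps words in separate paragraphs apart, but a word split by inline markup (for example `un<em>believ</em>able`) would be indexed as three fragments. A test like the one below is the kind of automated check the comment asks for.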

Daniel Nouri (daniel.nouri) wrote:

Fixed in r23270 by using a regular expression to strip out tags (test in r23269).
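The report does not show the regular expression committed in r23270. The following is a hypothetical sketch of the general regex approach in Python; the pattern `TAG_RE` and the function `strip_tags` are assumptions for illustration. Each tag is replaced by a space, and character entities are decoded only afterwards, so an escaped `&lt;` in the content is not mistaken for a tag.

```python
import html
import re

# Hypothetical pattern: "<", then any run of characters other than ">", then ">".
# This also matches the XML declaration and self-closing tags such as <image .../>.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(xml_source: str) -> str:
    """Remove markup from an XML string, keeping only its text (sketch only)."""
    # Replace each tag with a space so words in adjacent elements stay separate.
    text = TAG_RE.sub(" ", xml_source)
    # Decode entities such as &amp; and &lt; after the tags are gone.
    text = html.unescape(text)
    # Collapse whitespace runs into single spaces.
    return " ".join(text.split())


sample = '<doc><p>Fish &amp; chips</p><image path="a.png"/><p>1 &lt; 2</p></doc>'
print(strip_tags(sample))
```

A regex is simpler and does not require the input to parse, but it is less robust than a real parser: a comment or CDATA section containing `>` would be split incorrectly.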

Changed in silva:
milestone: none → 1.6