Comment 8 for bug 101440

Martijn Faassen (faassen) wrote :

I do want to review this before it's checked in. While the XML tag removal never
worked, fulltext indexing has been around for a while.

I'm a bit scared to see the DOM code; I know quite a bit about DOM and ParsedXML
but I'm trying to avoid them. :) The NodeFilter code and such worry me --
ParsedXML does have some implementation of this, but I remember it never really
got a lot of review so I'm worried about it failing in obscure cases. This is
why I'd prefer the simpler DOM tree walking approach.
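
For illustration, a minimal sketch of what such a plain tree walk could look
like. It assumes the standard library's xml.dom.minidom as a stand-in for
ParsedXML's DOM, and extract_text is a hypothetical name, not code from the
patch under review:

```python
from xml.dom import Node, minidom


def extract_text(node):
    """Collect the character data below `node` with a plain depth-first walk.

    No NodeFilter or TreeWalker is involved; only nodeType, data and
    childNodes are used, which any DOM Level 1 implementation provides.
    """
    parts = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(current.data)
        else:
            # Push children in reverse so they are popped in document order.
            # Comments and processing instructions have no children and
            # contribute nothing.
            stack.extend(reversed(current.childNodes))
    # Join with spaces so words in adjacent elements do not run together,
    # then collapse runs of whitespace.
    return " ".join(" ".join(parts).split())


if __name__ == "__main__":
    doc = minidom.parseString(
        "<doc><title>Hello</title><p>big <b>world</b><!-- note --></p></doc>"
    )
    print(extract_text(doc))
```

The walk is iterative rather than recursive so that deeply nested documents
cannot hit Python's recursion limit.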

I'm also slightly worried about the performance impact of this. A simple version
should be as fast as the XML generating form. I don't think that'll be too hard
to accomplish -- the XML generation in ParsedXML isn't particularly fancy
either, but some simple measurements extracting this information from large
documents would comfort me.
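
A rough sketch of the kind of measurement meant here, again assuming
xml.dom.minidom in place of ParsedXML and a synthetic document. The helper
names (extract_text, build_document, compare) are made up for this example,
and the numbers it prints say nothing about ParsedXML itself:

```python
import timeit
from xml.dom import Node, minidom


def extract_text(node):
    """Plain depth-first walk that collects text and CDATA content."""
    parts = []
    stack = [node]
    while stack:
        current = stack.pop()
        if current.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(current.data)
        else:
            stack.extend(reversed(current.childNodes))
    return " ".join(" ".join(parts).split())


def build_document(paragraphs):
    """Build a synthetic document with the given number of paragraphs."""
    body = "".join(
        "<p>paragraph %d with <b>some</b> text</p>" % i
        for i in range(paragraphs)
    )
    return minidom.parseString("<doc>%s</doc>" % body)


def compare(paragraphs=2000, repeat=5):
    """Return the best-of-`repeat` times (seconds) for walk and serialization."""
    doc = build_document(paragraphs)
    walk = min(timeit.repeat(lambda: extract_text(doc), number=1, repeat=repeat))
    serialize = min(timeit.repeat(lambda: doc.toxml(), number=1, repeat=repeat))
    return walk, serialize


if __name__ == "__main__":
    walk, serialize = compare()
    print("tree walk: %.4fs  toxml: %.4fs" % (walk, serialize))
```

Running the same comparison against real large documents, with ParsedXML's own
nodes and XML generation, would be the measurement that actually matters.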