Silva

Bug #101440 (silva-1325)
Comment #22

sacco (timothy-heap) wrote on 2006-01-25:

> I don't see a reason to support the textStrip extension. Either we always do
> it or never, but no need to configure this when calling and complicated the
> code. I don't think we need to use any stripping.

I think we certainly can't do it always!

Just as the original version, which returned a list
consisting of the joined contents of each top-level element,
was simply intended as an example of how to do something,
so was the textStrip parameter. I chose stripping
whitespace as the example because I
frequently see XML documents which are over 50%
whitespace, particularly those generated in Python
(Python programmers don't tend to use tabs ;?> ).
Stripping can't simply be applied throughout, though.
I tend not to examine the XML internal to
Silva if at all possible, so you will know more than
me about whether things are better here.

An example of what?
Sometimes it may become necessary to treat a
node differently depending upon where it occurs in
the document tree, e.g. whether or not it occurs inside
another particular type of node. Unless this depends
only upon strictly "local" information (e.g. the difference
is that the node in question is a direct child of an
'li' element, in which case it may be possible to add
a suitable clause to the part of the function that processes
element nodes), there are essentially two ways to deal with
this:
1) passing some (limited) information down the stack
(in this example via the textStrip parameter);
2) using an auxiliary function.
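
To make option 1 concrete, here is a rough sketch, not the code under discussion. It assumes a DOM-style tree (xml.dom.minidom), a hypothetical function name extract_text, and a made-up rule that stripping is switched off inside a 'pre' element. Only the idea of a textStrip-like flag travelling down the recursion comes from the discussion above.

```python
from xml.dom import minidom, Node


def extract_text(node, text_strip=False):
    """Return the concatenated text under `node`.

    `text_strip` is the (limited) information passed down the stack.
    The 'pre' rule below is a hypothetical example of context changing
    how the inner nodes are treated.
    """
    if node.nodeType == Node.TEXT_NODE:
        return node.data.strip() if text_strip else node.data
    if node.nodeType == Node.ELEMENT_NODE and node.tagName == "pre":
        # Everything below this element keeps its whitespace.
        text_strip = False
    return "".join(extract_text(child, text_strip) for child in node.childNodes)


doc = minidom.parseString(
    "<doc>\n  <p>  hello  </p>\n  <pre>  a  b  </pre>\n</doc>"
)
result = extract_text(doc.documentElement, text_strip=True)
```

The caller decides the default and the recursion adjusts it on the way down, so no node needs to look back up the tree to find out where it is.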

But even when an auxiliary function is used,
unless the situation is *really* complicated
(and it really shouldn't be in this case)
it is far neater and more maintainable to
make it mutually recursive (i.e. to call back
into the main function for the inner recursions);
in this case it is usually necessary to put
some info on the stack as well to alter the
behaviour of the inner recursions.
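
A rough sketch of the mutually recursive arrangement, again assuming xml.dom.minidom. The names render, render_list and in_list, and the 'list' and 'p' elements, are hypothetical and not taken from the Silva document model.

```python
from xml.dom import minidom, Node


def render(node, in_list=False):
    """Main recursive function; `in_list` is the info carried on the stack."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data.strip()
    if node.nodeType != Node.ELEMENT_NODE:
        return ""
    if node.tagName == "list":
        # Hand the special case to the auxiliary function.
        return render_list(node)
    body = "".join(render(child, in_list) for child in node.childNodes)
    if node.tagName == "p" and not in_list:
        # Paragraphs get a line break only outside list items.
        return body + "\n"
    return body


def render_list(node):
    """Auxiliary function: formats the items, then calls back into render()."""
    items = [child for child in node.childNodes
             if child.nodeType == Node.ELEMENT_NODE and child.tagName == "li"]
    return "".join("* " + render(item, in_list=True) + "\n" for item in items)


doc = minidom.parseString(
    "<doc><p>Intro</p><list><li><p>one</p></li><li>two</li></list></doc>"
)
output = render(doc.documentElement)
```

The auxiliary function handles only what is special about the list; everything inside each item goes back through render(), with in_list altering the behaviour of the inner recursions.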

Summary: if you don't think we ever need to
strip then let's omit the parameter; however,
I wouldn't yet rule out using something similar
to tune the algorithm to the Silva document
model.
