Comment 6 for bug 1285625

scoder (scoder) wrote : Re: remove_blank_text has no effect on html.HTMLParser

I agree that the behaviour is not "perfect". However, it's libxml2 that does this, not lxml, both when parsing and when serialising. And I'm not going to reimplement libxml2's parser or serialiser in lxml just to improve the situation.

If you want to write up some generally usable functions that a) remove all ignorable whitespace from a (parsed) in-memory HTML tree and b) inject indentation and/or c) normalise the whitespace wherever that improves the pretty-printed output when the tree gets serialised, then please do. I'll happily add them as a new feature to lxml.html.
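
To make the request concrete, here is a rough sketch of what a) and b) might look like. It is not an existing lxml.html API: the function names and the BLOCK_TAGS / PRESERVE_TAGS sets are hypothetical and deliberately incomplete. It only relies on the .tag, .text and .tail attributes of the ElementTree API, which lxml elements also provide, and it uses the standard library's xml.etree.ElementTree for the demo so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, incomplete tag sets; a real implementation would need the full lists.
BLOCK_TAGS = {"html", "head", "body", "div", "ul", "ol", "li", "p",
              "table", "tr", "td", "th", "title"}
PRESERVE_TAGS = {"pre", "textarea", "script", "style"}


def _has_only_block_children(element):
    """True if the element contains block-level children and no real text."""
    children = list(element)
    if not children:
        return False
    if not all(isinstance(c.tag, str) and c.tag in BLOCK_TAGS for c in children):
        return False
    if element.text and element.text.strip():
        return False
    return not any(c.tail and c.tail.strip() for c in children)


def strip_ignorable_whitespace(element):
    """a) Drop whitespace-only text that sits between block-level elements."""
    # Comments and processing instructions have a non-string tag; skip them.
    if not isinstance(element.tag, str) or element.tag in PRESERVE_TAGS:
        return
    if _has_only_block_children(element):
        element.text = None
        for child in element:
            child.tail = None
    for child in element:
        strip_ignorable_whitespace(child)


def indent_blocks(element, level=0, step="  "):
    """b) Inject indentation, but only where it cannot alter inline content."""
    if not isinstance(element.tag, str) or element.tag in PRESERVE_TAGS:
        return
    children = list(element)
    if _has_only_block_children(element):
        inner = "\n" + step * (level + 1)
        element.text = inner
        for child in children[:-1]:
            child.tail = inner
        # The last child's tail puts the closing tag back at the parent's level.
        children[-1].tail = "\n" + step * level
    for child in children:
        indent_blocks(child, level + 1, step)


root = ET.fromstring(
    "<div>\n   <ul> <li>one <b>two</b> three</li>\n<li>four</li>   </ul>\n</div>"
)
strip_ignorable_whitespace(root)
compact = ET.tostring(root, encoding="unicode")
indent_blocks(root)
pretty = ET.tostring(root, encoding="unicode")
print(compact)
print(pretty)
```

Elements with mixed or inline content (the first li above) are left untouched in this sketch, since changing whitespace there could change how the page renders. Part c) is not attempted here.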