I agree that the behaviour is not "perfect". However, it is libxml2 doing this, not lxml, and in both directions: when parsing the input and when serialising the output. I'm not going to reimplement libxml2's parser or serialiser in lxml just to improve the situation.
If you want to write up some generally usable functions that a) remove all ignorable whitespace from a (parsed) in-memory HTML tree and b) inject indentation and/or c) normalise the whitespace in all possible places where it improves the pretty printing experience when the tree gets serialised, then please do. I'll happily add them as a new feature to lxml.html.
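To make the request concrete, here is a rough sketch of what (a) and (b) could look like; (c) is left out. None of this is existing lxml API: the function names and the two tag sets are made up for illustration, and the tag sets are deliberately incomplete. The sketch only touches .tag, .text and .tail, so it uses the standard library's ElementTree as a stand-in to stay runnable without lxml; I am assuming lxml.html elements behave the same way here, since lxml follows the ElementTree API.

```python
import xml.etree.ElementTree as ET  # stand-in for lxml; only .tag/.text/.tail are used

# Assumption: hand-picked, incomplete tag sets chosen for this example only.
INLINE_TAGS = frozenset({"a", "abbr", "b", "code", "em", "i", "span", "strong"})
PRESERVE_TAGS = frozenset({"pre", "textarea", "script", "style"})


def _is_block(el):
    # Comments and processing instructions have a non-string tag; they are
    # not treated as block-level, so their parents are left alone.
    return isinstance(el.tag, str) and el.tag not in INLINE_TAGS


def strip_ignorable_whitespace(el):
    """(a) Drop whitespace-only text around block-level children."""
    if not isinstance(el.tag, str) or el.tag in PRESERVE_TAGS:
        return
    children = list(el)
    if children and all(_is_block(c) for c in children):
        if el.text is not None and not el.text.strip():
            el.text = None
        for child in children:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in children:
        strip_ignorable_whitespace(child)


def indent_tree(el, level=0, space="  "):
    """(b) Inject indentation where an element holds only block children."""
    if not isinstance(el.tag, str) or el.tag in PRESERVE_TAGS:
        return
    children = list(el)
    if not children:
        return
    only_blocks = all(_is_block(c) for c in children)
    no_text = not (el.text or "").strip() and all(
        not (c.tail or "").strip() for c in children
    )
    if only_blocks and no_text:
        # Children sit one level deeper; the closing tag goes back to `level`.
        el.text = "\n" + space * (level + 1)
        for child in children[:-1]:
            child.tail = "\n" + space * (level + 1)
        children[-1].tail = "\n" + space * level
    for child in children:
        indent_tree(child, level + 1, space)


root = ET.fromstring(
    "<div>\n\n   <p>Hello <b>big</b> world</p>\n <ul> <li>x</li>\n</ul>   </div>"
)
strip_ignorable_whitespace(root)
indent_tree(root)
print(ET.tostring(root, encoding="unicode"))
```

Elements with mixed content (text next to inline children, as in the p above) are left untouched on purpose, because any whitespace change there could alter how the text renders. That is the conservative choice; a real implementation would need a complete list of inline and whitespace-preserving elements.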