Comment 3 for bug 1285625

scoder (scoder) wrote : Re: remove_blank_text has no effect on html.HTMLParser

> surely sending `remove_blank_text` to HTMLParser should be at least a warning if not an error, and pretty printing through html.tostring should be much the same?

No, why? It's perfectly reasonable to use the HTMLParser with "remove_blank_text=True" (e.g. to save memory), since it requests the removal of whitespace-only text that does not contribute to the content of the document. Similarly, pretty printing should not alter a document's content, so it only adds whitespace text where that does not break anything.

If your use case needs a specific way of formatting a document, you are free to implement that yourself. That's so easy that it's not worth making lxml cater to everyone's needs. The FAQ also has a couple of notes on this topic. I guess that section could use some comments regarding HTML specifically.
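
For illustration, here is a minimal sketch of what such a custom formatter might look like. It is not lxml's own pretty printer. The function name `indent` and its parameters are made up for this example, and it uses the standard library's `xml.etree.ElementTree` so that it runs without lxml installed; it relies only on the `.text` and `.tail` attributes, which lxml elements also provide. It skips any element that already contains non-whitespace text next to its children, since adding whitespace there would change the content.

```python
import xml.etree.ElementTree as ET


def _has_text(s):
    """True if the string contains anything other than whitespace."""
    return bool(s and s.strip())


def indent(elem, level=0, step="  "):
    """Hypothetical helper: add indentation whitespace in place.

    Only elements whose children are separated by nothing but
    whitespace are touched, so existing text content is preserved.
    """
    children = list(elem)
    if not children:
        return
    if _has_text(elem.text) or any(_has_text(c.tail) for c in children):
        # Mixed content: inserting whitespace here would alter the text.
        return
    pad = "\n" + step * level
    elem.text = pad + step            # newline + indent before first child
    for child in children:
        indent(child, level + 1, step)
        child.tail = pad + step       # newline + indent before next sibling
    children[-1].tail = pad           # last child: dedent for the closing tag


root = ET.fromstring("<div><p>Hello</p><ul><li>a</li><li>b</li></ul></div>")
indent(root)
print(ET.tostring(root, encoding="unicode"))
```

Note that this sketch knows nothing about HTML semantics: whitespace between inline elements or inside `<pre>` can still affect rendering, so a formatter meant for HTML would need its own rules about which tags are safe to indent.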