Comment 1 for bug 2069088

Revision history for this message
Michael (arbosh) wrote :

Just a small addition: I'm aware of the W3C recommendation for a parser to replace all whitespace with a space. I also tested

>>> etree.fromstring('<img src="&#10;end"/>', parser=parser).attrib
>>> {'src': '\nend'}

But than I have no possibility to print the \n. And with &#10; the line length will be too long again and nothing is won.

So, if SMTP forces us to remain below 1000 letters in a line, if base64 supports to be split in several lines, if browsers and email programs accept newlines in attributes, because its allowed and valid ... the only thing missing is the option in a parser to do less - not to strip newlines here. Maybe this makes my report a feature request, sorry for this.

The point is, how to explain to clients "this is impossible because ... yea, W3C ... standards ... stuttering", if they want their logo to be shown in emails.

Is there any chance that such an option can be added to lxml?

As far as I tried and tested, custom parsers until now cannot help out.