Comment 2 for bug 1713329

scoder (scoder) wrote :

I can reproduce this with xmllint, which means that the behaviour comes from libxml2, not from lxml.

$ python3 -c 'print("<a>1</a> <a>2\0 2+</a> <a>3</a> <a>\0 4</a> <a>5</a>")' > h.html
$ xmllint --memory --html h.html
h.html:1: HTML parser error : Char 0x0 out of allowed range
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<a>1</a> <a>2 2+</a> <a>3</a> <a></a>
</body></html>
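
In the output above, everything after the second NUL character is lost. A possible workaround, assuming that simply dropping NUL characters is acceptable for your data, is to remove them before the input reaches the parser. The following is only a minimal sketch: `strip_nul` is a hypothetical helper name, not part of lxml or libxml2.

```python
def strip_nul(markup):
    """Return markup with all NUL characters removed.

    Hypothetical helper (not an lxml or libxml2 API). Accepts either
    str or bytes and returns the same type.
    """
    if isinstance(markup, bytes):
        return markup.replace(b"\x00", b"")
    return markup.replace("\x00", "")


# Same content as h.html in the reproduction above.
doc = "<a>1</a> <a>2\0 2+</a> <a>3</a> <a>\0 4</a> <a>5</a>"
cleaned = strip_nul(doc)
print(cleaned)  # <a>1</a> <a>2 2+</a> <a>3</a> <a> 4</a> <a>5</a>
```

The cleaned string can then be handed to the HTML parser as usual (for example `lxml.html.fromstring(cleaned)`). With no NUL characters left in the input, the "Char 0x0 out of allowed range" condition should no longer be triggered.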