Comment 1 for bug 1903325

scoder (scoder) wrote :

This might be due to the safety limits that libxml2's default parser applies in order to defeat DoS attacks with large document content. You could try creating your own "lxml.html.HTMLParser" with the "huge_tree=True" option set, and use that parser for parsing the document.

Obviously, disabling the parser limitations opens up your code to DoS attacks, but it's worth a try to see if that's the issue here.
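
A minimal sketch of that suggestion, assuming the document is parsed from a string via "lxml.html.fromstring". The helper name "parse_html_without_limits" and the sample markup are made up for illustration and are not taken from the bug report:

```python
import lxml.html

# huge_tree=True turns off libxml2's safety limits (tree depth, text size).
# Only use this for input you trust, since it removes the DoS protection.
unlimited_parser = lxml.html.HTMLParser(huge_tree=True)


def parse_html_without_limits(data):
    """Parse an HTML string with the limit-free parser (hypothetical helper)."""
    return lxml.html.fromstring(data, parser=unlimited_parser)


# Small stand-in document; replace it with the content that fails to parse.
doc = parse_html_without_limits("<html><body><p>hello</p></body></html>")
print(doc.findtext(".//p"))
```

If the document parses correctly this way but not with the default parser, the limits are the likely cause.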