lxml

Bug #1903325
Comment #1

scoder (scoder) wrote on 2020-11-06:

This might be due to the safety limits that libxml2's default parser applies to defeat DoS attacks that use very large document content. You could try creating your own "lxml.html.HTMLParser" instance with the "huge_tree=True" option set, and using that parser to parse the document.

Obviously, disabling the parser limitations opens up your code to DoS attacks, but it's worth a try to see if that's the issue here.
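A minimal sketch of that suggestion, assuming the input comes from a trusted source. The helper name `parse_trusted_html` is hypothetical; the `huge_tree` keyword and the `parser` argument of `lxml.html.fromstring()` are lxml's own.

```python
import lxml.html


def parse_trusted_html(data):
    """Parse an HTML string or bytes with libxml2's safety limits lifted.

    Hypothetical helper. huge_tree=True disables libxml2's protection
    against oversized input, so only use this on trusted documents.
    """
    # lxml.html.HTMLParser forwards keyword options to etree.HTMLParser.
    parser = lxml.html.HTMLParser(huge_tree=True)
    # Pass the self-configured parser instead of lxml's default one.
    return lxml.html.fromstring(data, parser=parser)


if __name__ == "__main__":
    doc = parse_trusted_html("<html><body><p>hello</p></body></html>")
    print(doc.findtext(".//p"))  # prints: hello
```

The same parser object can also be passed to `lxml.html.parse(filename, parser=parser)` when reading from a file. If the problem goes away with this parser, the default limits were the likely cause.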