This might be due to the safety limits that libxml2's default parser applies to guard against DoS attacks using very large document content. You could try creating your own "lxml.html.HTMLParser" with the "huge_tree=True" option set and using that to parse the document.
Obviously, disabling the parser limits opens your code up to DoS attacks, but it's worth a try to see if that's the issue here.
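Something along these lines should do it. This is only a minimal sketch: the "sample" string is a made-up stand-in for your real document, and I'm assuming you parse from a string rather than a file.

```python
from lxml import html

# huge_tree=True turns off libxml2's built-in safety limits
# (for example on tree depth and text node size).
# Only use this for input you trust.
parser = html.HTMLParser(huge_tree=True)

# Hypothetical stand-in for the real (large) document.
sample = "<html><body><p>Hello</p></body></html>"

# Pass the custom parser explicitly instead of relying on the default one.
root = html.fromstring(sample, parser=parser)
print(root.findtext(".//p"))
```

If you read from a file instead, "html.parse(path, parser=parser)" accepts the same parser object.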