Fails parsing 28MB+ files
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Invalid | Undecided | Unassigned |
Bug Description
Save these files locally (from a browser):
https:/
https:/
Try to parse them:
import lxml.html
from lxml import etree as ET
tree = ET.parse(
Gives:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src\lxml\
File "src\lxml\
File "src\lxml\
File "src\lxml\
File "src\lxml\
File "src\lxml\
File "src\lxml\
File "src\lxml\
File "file:/
lxml.etree.
and
File "file:/
lxml.etree.
Parsing works without problems with the built-in stdlib etree (we could fall back to that if there is no other solution).
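The fallback mentioned above could look roughly like the sketch below. It tries lxml first and retries with the standard library's `xml.etree.ElementTree` when lxml raises `XMLSyntaxError`. The helper name `parse_with_fallback` is hypothetical, not part of either library. Note that the two branches return different tree types, so callers should stick to the API both share (e.g. `getroot()`, `find()`, `iter()`).

```python
import os
import tempfile
import xml.etree.ElementTree as StdET

try:
    from lxml import etree as LxmlET
except ImportError:  # lxml not installed: only the stdlib parser is used
    LxmlET = None


def parse_with_fallback(path):
    """Hypothetical helper: parse with lxml first, retry with the stdlib on a parse error."""
    if LxmlET is not None:
        try:
            return LxmlET.parse(path)
        except LxmlET.XMLSyntaxError:
            pass  # e.g. libxml2 rejecting a very large file; retry below
    return StdET.parse(path)


# Usage: parse a small temporary file (a stand-in for the large downloads).
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write("<root><item>data</item></root>")
    sample_path = f.name

result = parse_with_fallback(sample_path)
os.remove(sample_path)
print(result.getroot().tag)
```

With this approach, lxml-only features such as `xpath()` are lost whenever the stdlib branch is taken.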
The machine has 8 GB of RAM.
Python : sys.version_
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)
See the "huge_tree" option: /lxml.de/parsing.html#parser-options
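A minimal sketch of the suggested fix: pass a parser created with `huge_tree=True` to `etree.parse()`. According to the lxml docs, this option disables libxml2's security restrictions and supports very deep trees and very long text content, which are the likely cause of the errors above (the exact messages are truncated in the report). The helper name `parse_huge` and the sample file are illustrative assumptions.

```python
import os
import tempfile

from lxml import etree


def parse_huge(path):
    """Hypothetical helper: parse a file with libxml2's size/depth limits lifted."""
    # huge_tree=True turns off the safety limits that make libxml2 reject
    # very large text nodes or very deep trees.
    parser = etree.XMLParser(huge_tree=True)
    return etree.parse(path, parser)


# Usage: a small temporary file stands in for the 28MB+ downloads.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write("<root><item>data</item></root>")
    sample_path = f.name

tree = parse_huge(sample_path)
os.remove(sample_path)
print(tree.getroot().tag)
```

Because `huge_tree` removes protections against maliciously crafted input, it is best reserved for trusted files such as these downloads.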
https:/