lxml.etree.XMLSyntaxError: Memory allocation failed - but no memory used
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
lxml | New | Undecided | Unassigned | |
Bug Description
Python : sys.version_
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)
I am parsing a very large (500 GB) XML file using lxml.etree.
The individual records are not very large. The parse runs for about an hour, with
memory usage never getting close to 100 MB. The machine has hundreds of gigabytes of
memory, most of which stays unused during the run.
with open(largexmlfi
for _, elem in lxml.etree.
xmlstream, events=("end",), remove_
):
# Don't actually try to do anything
assert elem
elem.clear()
while elem.getprevious() is not None:
del elem.getparent()[0]
After about 4931000 records have been processed, I get this:
...
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "/mnt/docker/
lxml.etree.
It is very reproducible and seems to fail at exactly the same place each time.
I should clarify: this error message has also been observed while processing other, unrelated large XML sources,
so an issue with the XML source itself is unlikely.
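
For anyone trying to reproduce this, here is a minimal self-contained sketch of the streaming pattern described in this report. It is not the original script: the `count_records` helper, the `record` tag name, and the tiny in-memory input are assumptions made for illustration. It only shows the clear-and-delete loop and how the number of records processed before a failure could be reported.

```python
import io

import lxml.etree


def count_records(xmlstream, tag="record"):
    """Stream-parse xmlstream and free each element once it has been seen.

    `tag` is a hypothetical record element name, not taken from the report.
    """
    count = 0
    try:
        for _, elem in lxml.etree.iterparse(xmlstream, events=("end",), tag=tag):
            count += 1
            # Drop the element's own children and text.
            elem.clear()
            # Remove already-processed siblings so the root does not keep growing.
            while elem.getprevious() is not None:
                del elem.getparent()[0]
    except lxml.etree.XMLSyntaxError as exc:
        # Report how far the parse got before libxml2 gave up.
        raise RuntimeError(f"parse failed after {count} records: {exc}") from exc
    return count


# Tiny in-memory stand-in for the large file (illustrative only).
sample = b"<root>" + b"<record><v>1</v></record>" * 1000 + b"</root>"
total = count_records(io.BytesIO(sample))
print(total)
```

On a small input like this the loop completes normally; the failure in this report only shows up after several million records of the real data.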