As you can see, the RSS memory was only 31M, so it was not accumulating memory.
On Wed, 20 Mar 2024 at 08:56, Hans-Henrik Stærfeldt <email address hidden> wrote:
> Tried out del elem[:], same result;
>
> [hzys@ip-10-79-219-18 build](entrez-xml-parser-fail)$ python memorybug.py
> Testing lxml.etree.iterparse with cleanup
> count: 4931000/4931000 eta:0:00:00
> mem:31.52M
> /mnt/docker/All_Mammalia.xml:-1924384678:38:FATAL:PARSER:ERR_NO_MEMORY:
> Memory allocation failed
> Traceback (most recent call last):
> File "/develop/hzys/bdas/etl/entrez/mammals/elastic/build/memorybug.py",
> line 101, in <module>
> raise excpt
> File "/develop/hzys/bdas/etl/entrez/mammals/elastic/build/memorybug.py",
> line 98, in <module>
> test_lxml_iterate(argparser().parse_args())
> File "/develop/hzys/bdas/etl/entrez/mammals/elastic/build/memorybug.py",
> line 49, in test_lxml_iterate
> for _, elem in lxml.etree.iterparse(  # pylint: disable=c-extension-no-member
> File "src/lxml/iterparse.pxi", line 208, in lxml.etree.iterparse.__next__
> File "src/lxml/iterparse.pxi", line 193, in lxml.etree.iterparse.__next__
> File "src/lxml/iterparse.pxi", line 228, in
> lxml.etree.iterparse._read_more_events
> File "src/lxml/parser.pxi", line 1451, in lxml.etree._FeedParser.feed
> File "src/lxml/parser.pxi", line 624, in
> lxml.etree._ParserContext._handleParseResult
> File "src/lxml/parser.pxi", line 633, in
> lxml.etree._ParserContext._handleParseResultDoc
> File "src/lxml/parser.pxi", line 743, in lxml.etree._handleParseResult
> File "src/lxml/parser.pxi", line 672, in lxml.etree._raiseParseError
> File "/mnt/docker/All_Mammalia.xml", line -1924384678
> lxml.etree.XMLSyntaxError: Memory allocation failed
>
> On Fri, 15 Mar 2024 at 18:35, scoder <email address hidden> wrote:
>
>> Thanks. Does it change anything if you replace the "elem.clear()" in
>> your code with "del elem[:]" ?
>>
>> --
>> You received this bug notification because you are subscribed to the bug
>> report.
>> https://bugs.launchpad.net/bugs/2057780
>>
>> Title:
>> lxml.etree.XMLSyntaxError: Memory allocation failed - but no memory
>> used
>>
>> Status in lxml:
>> New
>>
>> Bug description:
>> Python : sys.version_info(major=3, minor=10, micro=8,
>> releaselevel='final', serial=0)
>> lxml.etree : (5, 1, 0, 0)
>> libxml used : (2, 12, 3)
>> libxml compiled : (2, 12, 3)
>> libxslt used : (1, 1, 39)
>> libxslt compiled : (1, 1, 39)
>>
>>
>> I am parsing a very large (500G) XML file using lxml.etree.iterparse as
>> shown below. The individual records are not very large. The run takes
>> about an hour, with memory never getting close to 100M. The machine has
>> hundreds of gigabytes of memory, and it is mostly unutilised while this
>> runs.
>>
>>
>> with open(largexmlfilepath, "rb") as xmlstream:
>>     for _, elem in lxml.etree.iterparse(  # pylint: disable=c-extension-no-member
>>         xmlstream, events=("end",), remove_blank_text=True, tag=tagname
>>     ):
>>         # Don't actually try to do anything
>>         assert elem
>>         elem.clear()
>>         while elem.getprevious() is not None:
>>             del elem.getparent()[0]
>>
>>
>> After about 4931000 records processed, I get this
>> ...
>> File "src/lxml/iterparse.pxi", line 210, in
>> lxml.etree.iterparse.__next__
>> File "src/lxml/iterparse.pxi", line 195, in
>> lxml.etree.iterparse.__next__
>> File "src/lxml/iterparse.pxi", line 230, in
>> lxml.etree.iterparse._read_more_events
>> File "src/lxml/parser.pxi", line 1432, in lxml.etree._FeedParser.feed
>> File "src/lxml/parser.pxi", line 609, in
>> lxml.etree._ParserContext._handleParseResult
>> File "src/lxml/parser.pxi", line 618, in
>> lxml.etree._ParserContext._handleParseResultDoc
>> File "src/lxml/parser.pxi", line 728, in lxml.etree._handleParseResult
>> File "src/lxml/parser.pxi", line 657, in lxml.etree._raiseParseError
>> File "/mnt/docker/All_Mammalia.xml", line -1924384678
>> lxml.etree.XMLSyntaxError: Memory allocation failed
>>
>> It is very reproducible. It seems to fail exactly at the same place.
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/lxml/+bug/2057780/+subscriptions
>>
>>
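For context on the suggestion being tested above: `elem.clear()` and `del elem[:]` differ in what they discard, which is why swapping one for the other is a reasonable experiment. A minimal sketch of the distinction, using the stdlib `xml.etree.ElementTree` rather than lxml so it runs anywhere (both libraries behave the same way for these two operations, and the streaming loop mirrors the pattern from the bug report on a small in-memory file):

```python
import io
import xml.etree.ElementTree as ET

# del elem[:] removes the children only; attributes and text survive.
e1 = ET.fromstring('<rec id="1">text<child/></rec>')
del e1[:]
assert len(e1) == 0
assert e1.get("id") == "1"     # attribute kept
assert e1.text == "text"       # text kept

# elem.clear() wipes children, attributes, text and tail alike.
e2 = ET.fromstring('<rec id="1">text<child/></rec>')
e2.clear()
assert len(e2) == 0
assert e2.get("id") is None
assert e2.text is None

# The streaming cleanup pattern from the bug report, in miniature:
data = b"<root>" + b"<rec><val>x</val></rec>" * 1000 + b"</root>"
count = 0
for _, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
    if elem.tag == "rec":
        count += 1
        del elem[:]            # drop processed children to keep memory flat
print(count)                   # 1000
```

Note that in lxml, `clear()` additionally drops the element's tail text by default, which is one reason `del elem[:]` can behave differently inside an iterparse loop.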