Serialisation error when writing a large file (> 2.5 GB) with xmlfile
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Confirmed | Medium | Unassigned |
Bug Description
No idea what's causing this but I'd guess it's memory related. It's been reproduced with lxml 3.6.0 on other systems.
Python : sys.version_
lxml.etree : (3, 6, 0, 0)
libxml used : (2, 9, 2)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
The code to reproduce the bug can be found at https:/
Tracebacks like
http://
I'm experiencing a similar issue. Unfortunately the links to the tracebacks don't work anymore, so I'll post mine:
Traceback (most recent call last):
  File "/usr/local/bin/stetl", line 4, in <module>
    __import__('pkg_resources').run_script('Stetl==1.3', 'stetl')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1494, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/Stetl-1.3-py2.7.egg/EGG-INFO/scripts/stetl", line 41, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/Stetl-1.3-py2.7.egg/EGG-INFO/scripts/stetl", line 32, in main
    etl.run()
  File "/usr/local/lib/python2.7/dist-packages/Stetl-1.3-py2.7.egg/stetl/etl.py", line 159, in run
    chain.run()
  File "/usr/local/lib/python2.7/dist-packages/Stetl-1.3-py2.7.egg/stetl/chain.py", line 174, in run
    packet = self.first_comp.process(packet)
  File "/usr/local/lib/python2.7/dist-packages/Stetl-1.3-py2.7.egg/stetl/component.py", line 218, in process
    packet = self.next.process(packet)
  File "/usr/local/lib/python2.7/dist-packages/Stetl-1.3-py2.7.egg/stetl/component.py", line 218, in process
    packet = self.next.process(packet)
  File "/usr/local/lib/python2.7/dist-packages/Stetl-1.3-py2.7.egg/stetl/component.py", line 218, in process
    packet = self.next.process(packet)
  File "/usr/local/lib/python2.7/dist-packages/Stetl-1.3-py2.7.egg/stetl/component.py", line 204, in process
    packet = self.invoke(packet)
  File "/etl/bgt/stetlbgt/subfeaturehandler.py", line 146, in invoke
    xf.flush()
  File "src/lxml/serializer.pxi", line 1347, in lxml.etree.xmlfile.__exit__
  File "src/lxml/serializer.pxi", line 1685, in lxml.etree._IncrementalFileWriter._close
  File "src/lxml/serializer.pxi", line 1691, in lxml.etree._IncrementalFileWriter._handle_error
  File "src/lxml/serializer.pxi", line 199, in lxml.etree._raiseSerialisationError
lxml.etree.SerialisationError: unknown error -2055577339
Yes, I'm still using Python 2.7, and I know I need to upgrade asap, but unfortunately I'm not able to do that at this moment due to other priorities. This error still occurs in the latest version of lxml (version 4.4.2), and also in version 3.7.
I also did some investigation, and I suspect this error is caused by the fact that libxml returns a signed 32-bit value when calling the function xmlOutputBufferClose. See http://www.xmlsoft.org/html/libxml-xmlIO.html#xmlOutputBufferClose.
My local test file, which I monitored, was slightly larger than 2.1 GB, i.e. just over 2^31 bytes. The error code was negative, with a magnitude slightly below 2^31. So I added the file size to the absolute value of the error code, and the sum was exactly 2^32.
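To illustrate why those two numbers would add up to 2^32, here is a minimal sketch of signed 32-bit wraparound. It assumes the number of bytes written is squeezed into a signed 32-bit int somewhere on the way back from libxml. The file size used here is not a measured value: it is inferred as 2^32 minus the magnitude of the error code in the traceback above, and `wrap_int32` is a hypothetical helper, not part of lxml or libxml.

```python
def wrap_int32(n):
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    return (n + 2**31) % 2**32 - 2**31


# Error code taken from the traceback above.
error_code = -2055577339

# Assumed file size in bytes: inferred from the error code, not measured.
inferred_size = 2**32 + error_code

print("inferred size :", inferred_size)
print("larger than 2^31:", inferred_size > 2**31)
print("wrapped value :", wrap_int32(inferred_size))
print("size + |error| == 2^32:", inferred_size + abs(error_code) == 2**32)
```

Under that assumption, any byte count between 2^31 and 2^32 comes back negative and looks like an error code, even though every byte was written.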
I have little hope that this will ever be fixed in libxml itself, since changing the return type would have major complications (unless they are already working on a 64-bit port). But I hope it will be feasible to have this error fixed on the lxml side.
The file which I'm eventually generating is valid XML, so as a workaround I decided to catch the etree.Serialisat...