iterparse cannot parse gzip compressed files
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | New | Undecided | Unassigned |
Bug Description
The parsing tutorial states that gzip-compressed files can be fed to the parser. However, iterparse fails on them:
# BEGIN console replay
u@h:/tmp/lxml-gz $ pip install lxml
Collecting lxml
Using cached https:/
Installing collected packages: lxml
Successfully installed lxml-4.4.1
u@h:/tmp/lxml-gz $ echo "<root/>" > test.xml
u@h:/tmp/lxml-gz $ gzip test.xml
u@h:/tmp/lxml-gz $ file test.xml.gz
test.xml.gz: gzip compressed data, was "test.xml", last modified: Sun Sep 8 18:51:23 2019, from Unix, original size 8
u@h:/tmp/lxml-gz $ python
>>> for _, el in etree.iterparse
...     print(el)
...
Traceback (most recent call last):
File "<input>", line 1, in <module>
for _, el in etree.iterparse
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "src/lxml/
File "./test.xml.gz", line 1
lxml.etree.
>>>
u@h:/tmp/lxml-gz $ gunzip test.xml.gz
u@h:/tmp/lxml-gz $ python
>>> from lxml import etree
>>> for _, el in etree.iterparse
...     print(el)
...
<Element root at 0x7f2c530996c8>
# END console replay
# BEGIN versions info
Python : sys.version_
lxml.etree : (4, 4, 1, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 9)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 33)
# END versions info
There's also this question on SO: https:/
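A possible workaround, sketched here as an illustration rather than a fix: decompress with Python's own gzip module and hand the resulting file object to iterparse, so the parser only ever sees plain XML. The helper name iterparse_gzip and the temporary directory are made up for this example. The fallback import is only there so the snippet also runs where lxml is not installed.

```python
import gzip
import os
import tempfile

try:
    from lxml import etree  # the library this report is about
except ImportError:
    # Assumption: fall back to the stdlib, whose iterparse accepts the same arguments here.
    import xml.etree.ElementTree as etree


def iterparse_gzip(path, events=("end",)):
    """Yield (event, element) pairs from a gzip-compressed XML file.

    Hypothetical helper: decompression is done by gzip.open(), so the
    XML parser receives an already-decompressed byte stream.
    """
    with gzip.open(path, "rb") as stream:
        for event, element in etree.iterparse(stream, events=events):
            yield event, element


# Recreate the file from the report: "<root/>" compressed as test.xml.gz.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "test.xml.gz")
with gzip.open(gz_path, "wb") as fh:
    fh.write(b"<root/>\n")

# Collect the tag of every element reported by the parser.
tags = [element.tag for _, element in iterparse_gzip(gz_path)]
print(tags)
```

This sidesteps the question of whether the parser detects compression from the file name, at the cost of doing the decompression in Python.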
summary:
- gzip compressed files aren't parsed
+ iterparse cannot parse gzip compressed files