in Docker while parsing: XMLSyntaxError: Document is empty

Bug #2002630 reported by Ivan Pogrebkov
This bug affects 1 person
Affects  Status  Importance  Assigned to  Milestone
lxml     New     Undecided   Unassigned

Bug Description

This script fails in every Docker container I tried (ubuntu, alpine, ol8) but runs fine locally on my Mac. It seems to be an issue with libxml2 version 2.9.14.

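from requests import get
from lxml import etree

# Download the feed and save the raw bytes to disk.
r = get('https://printbar.ru/synsfiles/yandex/market/idrr_full.xml')
with open('test.xml', 'wb') as f:
    f.write(r.content)

# Parse incrementally and record the source line of every closed element.
lines = []
tree = etree.iterparse(source='test.xml', events=('end',))
try:
    for (ev, el) in tree:
        lines.append(el.sourceline)
finally:
    # Highest line number reached before the parser stopped.
    # Note: max() raises ValueError if no element was parsed at all.
    print(max(lines))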
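
Since the error is reported at line 1, column 1, it may be worth checking what was actually written to test.xml in each environment before blaming the parser. The following is only a minimal sketch: `inspect_file` and `looks_like_xml` are hypothetical helper names, not part of lxml or requests, and the check is deliberately rough (an HTML error page also starts with `<`).

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"

def inspect_file(path, head_bytes=64):
    """Return (size_in_bytes, first_bytes) for the file at `path`."""
    data = Path(path).read_bytes()
    return len(data), data[:head_bytes]

def looks_like_xml(head):
    """Rough check: after an optional UTF-8 BOM and leading whitespace,
    an XML document should start with '<'. Empty input returns False."""
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    return head.lstrip().startswith(b"<")
```

Running `inspect_file('test.xml')` in the container and on the Mac, and comparing the size and the first bytes, would show whether both environments received the same download. Printing `r.status_code` right after the `get(...)` call is another quick check.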

===== docker python3.10-slim (ol8) conf:

Python : sys.version_info(major=3, minor=10, micro=0, releaselevel='final', serial=0)
lxml.etree : (4, 9, 1, 0)
libxml used : (2, 9, 14)
libxml compiled : (2, 9, 14)
libxslt used : (1, 1, 35)
libxslt compiled : (1, 1, 35)

===== docker python3.10-slim (ol8) error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "src/lxml/iterparse.pxi", line 210, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 195, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 230, in lxml.etree.iterparse._read_more_events
  File "src/lxml/parser.pxi", line 1376, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 606, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "test.xml", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1

===== docker ubuntu:latest conf:

Python : sys.version_info(major=3, minor=10, micro=6, releaselevel='final', serial=0)
lxml.etree : (4, 9, 1, 0)
libxml used : (2, 9, 14)
libxml compiled : (2, 9, 14)
libxslt used : (1, 1, 35)
libxslt compiled : (1, 1, 35)

===== docker ubuntu:latest error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "src/lxml/iterparse.pxi", line 210, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 195, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 230, in lxml.etree.iterparse._read_more_events
  File "src/lxml/parser.pxi", line 1376, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 606, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "test.xml", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1

===== and finally mac conf that works fine locally:

Python : sys.version_info(major=3, minor=10, micro=8, releaselevel='final', serial=0)
lxml.etree : (4, 9, 1, 0)
libxml used : (2, 9, 13)
libxml compiled : (2, 9, 13)
libxslt used : (1, 1, 35)
libxslt compiled : (1, 1, 35)
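
For anyone comparing their own environment, the version listings above can be reproduced with the version attributes that lxml.etree exposes. This is a small sketch; `version_report` is a hypothetical helper name and the output formatting is only an approximation of the listings in this report.

```python
import sys
from lxml import etree

def version_report():
    """Collect the interpreter, lxml, libxml2 and libxslt versions."""
    return {
        "Python": sys.version_info,
        "lxml.etree": etree.LXML_VERSION,
        "libxml used": etree.LIBXML_VERSION,
        "libxml compiled": etree.LIBXML_COMPILED_VERSION,
        "libxslt used": etree.LIBXSLT_VERSION,
        "libxslt compiled": etree.LIBXSLT_COMPILED_VERSION,
    }

for name, value in version_report().items():
    print(f"{name} : {value}")
```

"used" is the library version loaded at runtime and "compiled" is the version lxml was built against, so a mismatch between the two would itself be worth reporting.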

Maybe I can downgrade the libxml2 version somehow?

Ivan Pogrebkov (van4oza) wrote:

This file somehow only works on my Macs, but not in any container, even with libxml2 2.9.13.
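
Because the failure also occurs with libxml2 2.9.13, one way to rule out a difference in the downloaded data is to fingerprint test.xml in each environment and compare the results. A minimal sketch, assuming the file is saved as in the script above; `file_fingerprint` is a hypothetical helper name.

```python
import hashlib

def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) of a file, read in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

If the size and digest match between the Mac and the containers, the input is identical and the difference lies in the parsing stack. If they differ, the download itself is the first thing to look at.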
