End-of-line normalization differs between etree.XML and etree.iterparse

Bug #1788449 reported by Audric Schiltknecht
This bug affects 1 person
Affects: lxml
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

End-of-line normalization (i.e. converting \r\n to \n) behaves differently depending on whether the document is parsed with etree.XML (or etree.parse) or with etree.iterparse.

A small example is attached.
Expected output: none
Current output:
Traceback (most recent call last):
  File "lxml-eol-normalization.py", line 22, in <module>
    repr(crlf_root.text))
AssertionError: 'line1\nline2' != 'line1\r\nline2'
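
The attached script is not reproduced here. As a rough sketch of the kind of comparison it performs, the snippet below parses the same bytes once with etree.XML and once with etree.iterparse and returns the root text from each. The CDATA test document and the helper name compare_eol_handling are assumptions made for illustration, not taken from the attachment. With the versions listed under "Environment" the two values are expected to differ. With other libxml2 versions they may well be identical.

```python
import io

from lxml import etree

# Hypothetical test document: a CDATA section containing a CRLF pair.
DOC = b"<test><![CDATA[line1\r\nline2]]></test>"


def compare_eol_handling(data=DOC):
    """Return the root text as seen by etree.XML and by etree.iterparse."""
    # One-shot parse of the complete byte string.
    parsed_root = etree.XML(data)

    # Incremental parse of the same bytes from a file-like object.
    streamed_root = None
    for _event, element in etree.iterparse(io.BytesIO(data), events=("end",)):
        streamed_root = element  # the last "end" event belongs to the root

    return parsed_root.text, streamed_root.text


if __name__ == "__main__":
    parsed, streamed = compare_eol_handling()
    print("etree.XML      :", repr(parsed))
    print("etree.iterparse:", repr(streamed))
```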

Environment:

Python 3.6.5 (default, May 11 2018, 04:00:52)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> from lxml import etree
>>>
>>> print("%-20s: %s" % ('Python', sys.version_info))
Python              : sys.version_info(major=3, minor=6, micro=5, releaselevel='final', serial=0)
>>> print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree          : (4, 2, 1, 0)
>>> print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used         : (2, 9, 8)
>>> print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled     : (2, 9, 8)
>>> print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used        : (1, 1, 32)
>>> print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled    : (1, 1, 32)

Audric Schiltknecht (audric-schiltknecht) wrote :
description: updated
scoder (scoder) wrote :

Thank you for the test script, which made it easy to reproduce this. However, I can also reproduce this with xmllint, which means that the problem is in libxml2 and not in lxml.

$ python -c 'print("<test><![CDATA[line1\r\nline2]]></test>")' | xmllint - | python -c 'import sys; print(repr(sys.stdin.read()))'
'<?xml version="1.0"?>\n<test><![CDATA[line1\nline2]]></test>\n'

$ python -c 'print("<test><![CDATA[line1\r\nline2]]></test>")' | xmllint --push - | python -c 'import sys; print(repr(sys.stdin.read()))'
'<?xml version="1.0"?>\n<test><![CDATA[line1\r\nline2]]></test>\n'
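
For code that needs the same text from both parsing paths regardless of the libxml2 version in use, one possible workaround is to normalize line ends after parsing. The sketch below follows the end-of-line rule from XML 1.0, section 2.11, which translates every \r\n pair and every lone \r into a single \n. The helper name normalize_eol is hypothetical and not part of lxml or libxml2.

```python
import re

# Matches a CRLF pair or a lone CR, per XML 1.0 section 2.11.
_EOL = re.compile(r"\r\n?")


def normalize_eol(text):
    """Replace every CRLF pair and every lone CR in text with a single LF."""
    return _EOL.sub("\n", text)


if __name__ == "__main__":
    print(repr(normalize_eol("line1\r\nline2")))
```

Applied to an element's .text after parsing, this should make the results from both paths compare equal. Text that is already normalized passes through unchanged.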

Changed in lxml:
status: New → Invalid
Audric Schiltknecht (audric-schiltknecht) wrote :

Oh, I see, I forgot to test directly with libxml2! Sorry for the noise, I will report it there.
Thanks!
