End-of-line normalization differs between etree.XML and etree.iterparse

Bug #1788449 reported by Audric Schiltknecht on 2018-08-22
This bug affects 1 person
Affects: lxml
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

End-of-line normalization (i.e. converting \r\n to \n) behaves differently in etree.XML (or etree.parse) than in etree.iterparse.

A small example is attached.
Expected output: none
Current output:
Traceback (most recent call last):
  File "lxml-eol-normalization.py", line 22, in <module>
    repr(crlf_root.text))
AssertionError: 'line1\nline2' != 'line1\r\nline2'
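For reference, XML 1.0 (section 2.11, "End-of-Line Handling") requires a parser to pass two things to the application as a single \n: the two-character sequence \r\n, and any \r that is not followed by \n. The sketch below applies that rule to already-decoded text. The helper name `normalize_eol` is hypothetical and is not part of lxml.

```python
import re

# XML 1.0 section 2.11: "\r\n" and any lone "\r" both become "\n".
# The pattern matches "\r\n" when possible, otherwise a single "\r".
_EOL = re.compile(r"\r\n?")


def normalize_eol(text: str) -> str:
    """Hypothetical helper: XML 1.0 end-of-line normalization on decoded text."""
    return _EOL.sub("\n", text)


print(repr(normalize_eol("line1\r\nline2\rline3\n")))  # 'line1\nline2\nline3\n'
```

A conforming parser should deliver 'line1\nline2' for the example's text, which matches the value the assertion expected.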

Environment:

Python 3.6.5 (default, May 11 2018, 04:00:52)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> from lxml import etree
>>>
>>> print("%-20s: %s" % ('Python', sys.version_info))
Python              : sys.version_info(major=3, minor=6, micro=5, releaselevel='final', serial=0)
>>> print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree          : (4, 2, 1, 0)
>>> print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used         : (2, 9, 8)
>>> print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled     : (2, 9, 8)
>>> print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used        : (1, 1, 32)
>>> print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled    : (1, 1, 32)

scoder (scoder) wrote:

Thank you for the test script, which made it easy to reproduce this. However, I can also reproduce this with xmllint, which means that the problem is in libxml2 and not in lxml.

$ python -c 'print("<test><![CDATA[line1\r\nline2]]></test>")' | xmllint - | python -c 'import sys; print(repr(sys.stdin.read()))'
'<?xml version="1.0"?>\n<test><![CDATA[line1\nline2]]></test>\n'

$ python -c 'print("<test><![CDATA[line1\r\nline2]]></test>")' | xmllint --push - | python -c 'import sys; print(repr(sys.stdin.read()))'
'<?xml version="1.0"?>\n<test><![CDATA[line1\r\nline2]]></test>\n'

Changed in lxml:
status: New → Invalid

Oh, I see: I forgot to test directly with libxml2! Sorry for the noise; I will report it there.
Thanks!
