XML comment on first line and root element on second line get squeezed together after write

Bug #1855136 reported by Pontus
This bug affects 1 person

Affects: lxml
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am using lxml to validate, parse and modify a bunch of XML files. All the input XML files have an XML comment as the first line. After an XML file has been processed and written to a new output XML file, the XML comment on the first line and the root element on the second line have been squeezed together onto one line. For me this is a bug.

Here is a minimal example showing what I mean (files also attached to bug report):

input.xml:
----------
<!-- $Revision: $ $URL: $ -->
<RootElement>
    <!-- A comment -->
    <ChildElement1/>

    <!-- Another comment -->
    <ChildElement2>
    </ChildElement2>
</RootElement>
----------

processing.py:
----------
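from lxml import etree as ET

# Note: ET.parse() returns an ElementTree object, not the root element.
root = ET.parse("input.xml")
# Modify XML file
# ...
# Serialize the tree to a new file (default XML method, no pretty-printing).
root.write("output.xml")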
----------

output.xml:
----------
<!-- $Revision: $ $URL: $ --><RootElement>
    <!-- A comment -->
    <ChildElement1/>

    <!-- Another comment -->
    <ChildElement2>
    </ChildElement2>
</RootElement>
----------

This might not be the biggest issue in the world, but it is a bit annoying. Both the first line with the XML comment and the second line with the root element tend to be quite long in my XML files. I sometimes use these XML files as a reference and the root element contains various attributes that I'd like to see on the screen without scrolling right in the editor. I also understand that it is a very easy thing to post-process the XML file and separate the two lines with standard file operations in Python. But then again, I shouldn't have to.
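For anyone hitting the same thing, the post-processing step mentioned above could look roughly like the sketch below. It uses only the standard library and plain string operations. The function names `separate_leading_comment` and `fix_file` are made up for this illustration and are not part of lxml, and the sketch assumes the comment is the very first thing in the file (no XML declaration or whitespace before it).

```python
def separate_leading_comment(xml_text, newline="\n"):
    """Insert a line break after a leading comment that is glued to the next tag.

    Hypothetical helper for illustration; only handles a comment that starts
    at the very beginning of the text.
    """
    if not xml_text.startswith("<!--"):
        return xml_text
    # Find the end of the first comment, searching after the opening "<!--".
    end = xml_text.find("-->", len("<!--"))
    if end == -1:
        return xml_text
    end += len("-->")
    # Only add a line break when the next tag follows with nothing in between.
    if xml_text[end:end + 1] == "<":
        return xml_text[:end] + newline + xml_text[end:]
    return xml_text


def fix_file(path, encoding="utf-8"):
    """Apply separate_leading_comment() to a file in place."""
    # newline="" stops Python from translating the file's existing line endings.
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(separate_leading_comment(text))
```

Calling `fix_file("output.xml")` after `root.write("output.xml")` would then put the root element back on its own line. If the file uses Windows line endings, `newline="\r\n"` would have to be passed to `separate_leading_comment` instead.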

For the record: it makes no difference whether the input.xml file uses Windows (CRLF) or Linux (LF) line endings.

By using the write function with the C14N method, i.e.:
root.write("output.xml", method="c14n")
the first XML comment stays on its own line. But there are too many other changes in the XML, e.g. all attributes get sorted in alphabetical order. I would like as few changes as possible to get clear and relevant diffs.

----------

Python : sys.version_info(major=3, minor=8, micro=0, releaselevel='final', serial=0)
lxml.etree : (4, 4, 2, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)
