etree.tostring(): XML pretty printing does not work on non-indented XML files.

Bug #910018 reported by jenisys
This bug affects 2 people
Affects  Status     Importance  Assigned to  Milestone
lxml     Won't Fix  Low         scoder

Bug Description

Pretty printing an XML file via lxml.etree.tostring(..., pretty_print=True) does not work in some cases.
This means that no indentation is applied to the corresponding lines/elements according to their depth in the XML tree.

EXAMPLE:
from lxml import etree
from lxml import objectify

xmldoc = """\
<root>
<alice />
<bob />
</root>
"""

root = etree.fromstring(xmldoc)
formatted = etree.tostring(root, pretty_print=True)
expected = """\
<root>
    <alice />
    <bob />
</root>
"""
assert expected == formatted, "OOPS, fails."
# -- VARIANT 2: formatted = etree.tostring(root.getroottree(), pretty_print=True)

WORKAROUND:
Formatting/pretty printing works when you create the "root" object via lxml.objectify.fromstring() instead of etree.fromstring().
Just replace the following line in the example from above:

  root = objectify.fromstring(xmldoc)

VERSION-INFO:
  Python : (2, 6, 6, 'final', 0)
  lxml.etree : (2, 3, 2, 0)
  libxml used : (2, 7, 3)
  libxml compiled : (2, 7, 3)
  libxslt used : (1, 1, 24)
  libxslt compiled : (1, 1, 24)

scoder (scoder) wrote :

This doesn't really have much to do with lxml.

http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output

I'm inclined to close this as "won't fix". While lxml could potentially influence the heuristic here, I don't think there is a right way to do it.
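
To illustrate the point the FAQ makes, here is a minimal sketch using the document from the report (the variable names are only illustrative). The parser keeps the line breaks between the tags as whitespace-only text, and as far as I understand the FAQ, the serialiser does not add indentation where such text is already present:

```python
from lxml import etree

xmldoc = "<root>\n<alice />\n<bob />\n</root>\n"
root = etree.fromstring(xmldoc)

# The line breaks between the tags survive parsing as text content.
print(repr(root.text))      # whitespace between <root> and <alice/>
print(repr(root[0].tail))   # whitespace between <alice/> and <bob/>
print(repr(root[1].tail))   # whitespace between <bob/> and </root>
```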

jenisys (jenisys) wrote :

From my point of view, you can close the issue (if you don't think it is important).
I will use the work-around, that is described above, for my purposes.
I just think it is a little bit weird, because "xmllint --format" just provides the desired behavior (xmllint is bundled w/ libxml2).

Thanks for the pointer into the FAQ.
I must have overlooked it (who reads the manuals anyway ;-).

Valentin Lab (vaab) wrote :

I must agree that it's quite complicated to get XML pretty-printed with lxml, and that's a shame. Even with objectify there are cases where some subelements won't be prettified. Sadly, the best solution for me is to pipe the result into xmllint --format.

I understand the point made in the FAQ, but it seems quite obvious to me that you wouldn't use the pretty_print functionality if whitespace were meaningful to you.

scoder (scoder) wrote :

If space is not meaningful to you, you should use the "remove_blank_text" option in the parser and/or use a DTD. It's more of a parser issue than a serialiser issue, really. And lxml can't know during parsing that you are going to ask for pretty printing during serialisation.
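
For reference, a minimal sketch of the parser option mentioned above, applied to the document from the report. It assumes the whitespace between the tags really is insignificant; the variable names are only illustrative:

```python
from lxml import etree

xmldoc = "<root>\n<alice />\n<bob />\n</root>\n"

# Default parser: whitespace-only text between the tags is kept.
plain_root = etree.fromstring(xmldoc)
plain = etree.tostring(plain_root, pretty_print=True).decode("utf-8")

# Parser that drops whitespace-only text between the tags while parsing.
parser = etree.XMLParser(remove_blank_text=True)
clean_root = etree.fromstring(xmldoc, parser)
formatted = etree.tostring(clean_root, pretty_print=True).decode("utf-8")

print(plain)      # children stay at column 0
print(formatted)  # children are indented
```

Note that the indented output uses two spaces per level and writes empty elements as `<alice/>`, so it will not match the four-space `expected` string from the report character for character.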

Changed in lxml:
assignee: nobody → Stefan Behnel (scoder)
importance: Undecided → Low
status: New → Won't Fix