etree.tostring(): XML pretty printing does not work on non-indented XML files.

Bug #910018 reported by jenisys on 2011-12-30
This bug affects 2 people

Bug Description

Pretty printing an XML file via lxml.etree.tostring(..., pretty_print=True) does not work in some cases.
That is, no indentation is applied to the lines/elements according to their depth in the XML tree.

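from lxml import etree
from lxml import objectify

# The two elements need a common parent to be well-formed XML;
# <root> is a placeholder name for that wrapper element.
xmldoc = b"""\
<root>
<alice />
<bob />
</root>
"""

root = etree.fromstring(xmldoc)
formatted = etree.tostring(root, pretty_print=True)
# What a working pretty printer would produce (lxml indents with two spaces).
expected = b"""\
<root>
  <alice/>
  <bob/>
</root>
"""
assert expected == formatted, "OOPS, fails."
# -- VARIANT 2: formatted = etree.tostring(root.getroottree(), pretty_print=True)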

Formatting/pretty printing works if you create the "root" object via lxml.objectify.fromstring() instead of etree.fromstring().
Just replace the corresponding line in the example above with:

  root = objectify.fromstring(xmldoc)

  Python : (2, 6, 6, 'final', 0)
  lxml.etree : (2, 3, 2, 0)
  libxml used : (2, 7, 3)
  libxml compiled : (2, 7, 3)
  libxslt used : (1, 1, 24)
  libxslt compiled : (1, 1, 24)
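A minimal check of the objectify workaround, assuming the same input as above (the <root> wrapper is a placeholder). As far as I understand, the workaround succeeds because objectify's default parser is created with remove_blank_text=True, so the whitespace-only text between the elements never reaches the tree and the serialiser is free to indent:

```python
from lxml import etree, objectify

# Same input as the report; <root> is a placeholder wrapper element.
xmldoc = b"<root>\n<alice />\n<bob />\n</root>\n"

# objectify.fromstring() uses objectify's default parser, which
# discards whitespace-only text while parsing.
root = objectify.fromstring(xmldoc)

formatted = etree.tostring(root, pretty_print=True)
print(formatted.decode())
```

Note that this changes the element API you work with (objectify elements instead of plain etree elements), which may be more than you want just to get indented output.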

scoder (scoder) wrote :

This doesn't really have much to do with lxml.

I'm inclined to close this as "won't fix". While lxml could potentially influence the heuristic here, I don't think there is a right way to do it.
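The heuristic only indents the children of an element that contains no text nodes, and the newlines in the input count as text. If a tree has already been parsed with that whitespace, one option is to strip it yourself before serialising. The sketch below uses a hypothetical helper, strip_blank_text(), and assumes whitespace is insignificant everywhere in the document, which is exactly the assumption lxml cannot make on its own:

```python
from lxml import etree


def strip_blank_text(root):
    """Hypothetical helper: drop whitespace-only text/tails in place.

    Assumes whitespace carries no meaning anywhere in the tree.
    """
    for elem in root.iter():
        # Only clear the leading text of real elements that have children;
        # comments/PIs keep their content, and a leaf's whitespace does not
        # block indentation because there is nothing inside it to indent.
        if isinstance(elem.tag, str) and len(elem) and elem.text and not elem.text.strip():
            elem.text = None
        # Whitespace-only tails between siblings are what disables indenting.
        if elem.tail and not elem.tail.strip():
            elem.tail = None
    return root


xmldoc = b"<root>\n<alice />\n<bob />\n</root>\n"
root = etree.fromstring(xmldoc)  # default parser keeps the blank text
strip_blank_text(root)
print(etree.tostring(root, pretty_print=True).decode())
```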

jenisys (jenisys) wrote :

From my point of view, you can close the issue (if you don't think it is important).
I will use the workaround described above for my purposes.
I just think it is a little weird, because "xmllint --format" provides exactly the desired behavior (xmllint is bundled with libxml2).

Thanks for the pointer to the FAQ.
I must have overlooked it (who reads the manuals anyway ;-).

Valentin Lab (vaab) wrote :

I must agree that it's quite complicated to get XML pretty-printed with lxml, and that's a shame. Even objectify has cases where some subelements won't be prettified. Sadly, the best solution for me is to pipe the result into xmllint --format.

I understand the point explained in the FAQ, but it seems quite obvious to me that you won't use the pretty_print functionality if whitespace is meaningful to you.
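The "pipe it into xmllint --format" approach can be sketched from Python with subprocess. This assumes the xmllint binary (shipped with libxml2) is on PATH and Python 3.5+ for subprocess.run(); format_with_xmllint() is a hypothetical helper name:

```python
import shutil
import subprocess

from lxml import etree


def format_with_xmllint(xml_bytes):
    """Hypothetical helper: re-indent XML via `xmllint --format -`.

    Assumes xmllint is installed and on PATH.
    """
    result = subprocess.run(
        ["xmllint", "--format", "-"],  # "-" makes xmllint read from stdin
        input=xml_bytes,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if xmllint rejects the input
    )
    return result.stdout


root = etree.fromstring(b"<root>\n<alice />\n<bob />\n</root>\n")
if shutil.which("xmllint"):
    print(format_with_xmllint(etree.tostring(root)).decode())
```

xmllint discards the blank text while re-parsing, which is why it indents where lxml's serialiser does not; the price is an external process and, by default, an added XML declaration in the output.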

scoder (scoder) wrote :

If whitespace is not meaningful to you, you should use the "remove_blank_text" parser option and/or a DTD. It's really more of a parser issue than a serialiser issue. And lxml can't know during parsing that you are going to ask for pretty printing during serialisation.
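A minimal sketch of the parser option suggested here, using the report's input (the <root> wrapper is a placeholder). The whitespace decision is made once, at parse time, and the unchanged tostring() call then indents as expected:

```python
from lxml import etree

xmldoc = b"<root>\n<alice />\n<bob />\n</root>\n"

# Tell libxml2 at parse time that whitespace-only text between
# elements is not meaningful, so it is dropped from the tree.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(xmldoc, parser)

formatted = etree.tostring(root, pretty_print=True)
print(formatted.decode())
```

The same parser object can be passed to etree.parse(source, parser) when reading from a file.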

Changed in lxml:
assignee: nobody → Stefan Behnel (scoder)
importance: Undecided → Low
status: New → Won't Fix