Comments/PIs before doctype are lost

Bug #1421921 reported by Olli Pottonen on 2015-02-14
Affects: lxml
Importance: Low
Assigned to: Olli Pottonen

Bug Description

>>> import lxml.etree
>>> doc = lxml.etree.fromstring('<?xml version="1.0"?><!--foo--><!DOCTYPE a><a/>').getroottree()
>>> lxml.etree.tostring(doc)

Expected result:
'<!--foo--><!DOCTYPE a>\n<a/>'
Actual result:
'<!DOCTYPE a>\n<a/>'

Version info:
Python : sys.version_info(major=2, minor=7, micro=3, releaselevel='
lxml.etree : (2, 3, 2, 0)
libxml used : (2, 8, 0)
libxml compiled : (2, 8, 0)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

Olli Pottonen (olli-pottonen) wrote :

The comment is parsed correctly, and doc.getroot().getprevious() returns it as expected. The bug is in serialization: _writePrevSiblings() in serializer.pxi omits the doctype declaration (as it should) and everything before it (as it should not).
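
For anyone who wants to check the two halves of this separately, here is a minimal sketch (not taken from the lxml test suite). It assumes lxml is installed; the variable names are only illustrative. It first confirms that the comment is reachable as a preceding sibling of the root element, then checks whether it survives serialization of the whole tree. On versions affected by this bug, the second check should come out False.

```python
from lxml import etree

# Same document as in the report, passed as bytes.
xml = b'<?xml version="1.0"?><!--foo--><!DOCTYPE a><a/>'
root = etree.fromstring(xml)

# Parsing side: the comment before the doctype is a preceding
# sibling of the root element.
comment = root.getprevious()
parsed_ok = (
    comment is not None
    and comment.tag is etree.Comment
    and comment.text == "foo"
)

# Serialization side: serialize the whole tree, not just the root element.
serialized = etree.tostring(root.getroottree())
comment_survives = b"<!--foo-->" in serialized

print("comment parsed:", parsed_ok)
print("comment serialized:", comment_survives)
```

If the first line prints True and the second prints False, the parser is fine and the loss happens in the serializer, which matches the analysis above.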

Because lxml handles the declaration itself, instead of relying on libxml2, this is difficult to get right.

Olli Pottonen (olli-pottonen) wrote :

I was wrong: it is not difficult to get right. It takes just a couple of additional lines of code.

scoder (scoder) wrote :
Changed in lxml:
importance: Undecided → Low
milestone: none → 3.5
status: New → Fix Committed
assignee: nobody → Olli Pottonen (olli-pottonen)
scoder (scoder) wrote :

Fixed in lxml 3.5.0.

Changed in lxml:
status: Fix Committed → Fix Released