Comments/PIs before doctype are lost

Bug #1421921 reported by Olli Pottonen
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: Olli Pottonen
Milestone: 3.5

Bug Description

>>> import lxml.etree
>>> doc = lxml.etree.fromstring('<?xml version="1.0"?><!--foo--><!DOCTYPE a><a/>').getroottree()
>>> lxml.etree.tostring(doc)

Expected result:
'<!--foo--><!DOCTYPE a>\n<a/>'
Actual result:
'<!DOCTYPE a>\n<a/>'

Version info:
Python : sys.version_info(major=2, minor=7, micro=3, releaselevel='
lxml.etree : (2, 3, 2, 0)
libxml used : (2, 8, 0)
libxml compiled : (2, 8, 0)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

Olli Pottonen (olli-pottonen) wrote :

The comment is parsed correctly, and doc.getroot().getprevious() returns it as expected. The bug is in serialization: _writePrevSiblings() in serializer.pxi omits the doctype declaration (as it should) and everything before it (as it should not).

Because lxml handles the declaration itself, instead of relying on libxml2, this is difficult to get right.
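
To illustrate the behaviour described above, here is a minimal sketch in plain Python. It is a toy model, not lxml's actual Cython serializer: the node list, the kind constants, and the functions serialize_buggy() and serialize_fixed() are all hypothetical names invented for this example. It assumes the serializer writes the doctype itself and walks backwards from the root to find preceding comments and PIs.

```python
# Toy model, NOT lxml's serializer code: a document is a list of
# (kind, text) pairs for its top-level nodes, in document order.
# All names below are hypothetical and exist only for illustration.
DOCTYPE, COMMENT, PI, ELEMENT = "doctype", "comment", "pi", "element"


def _root_index(nodes):
    """Return the position of the single root element."""
    return next(i for i, (kind, _) in enumerate(nodes) if kind == ELEMENT)


def _doctype_text(nodes):
    """Return the doctype followed by a newline, or '' if there is none."""
    return "".join(text + "\n" for kind, text in nodes if kind == DOCTYPE)


def serialize_buggy(nodes):
    """Write the doctype, then only the comments/PIs adjacent to the root."""
    root = _root_index(nodes)
    start = root
    # The backwards walk stops at the doctype, so earlier nodes are never seen.
    while start > 0 and nodes[start - 1][0] in (COMMENT, PI):
        start -= 1
    siblings = "".join(text for _, text in nodes[start:root])
    return _doctype_text(nodes) + siblings + nodes[root][1]


def serialize_fixed(nodes):
    """Write comments/PIs on both sides of the doctype, in document order."""
    root = _root_index(nodes)
    out = []
    for kind, text in nodes[:root]:
        if kind == DOCTYPE:
            out.append(text + "\n")
        elif kind in (COMMENT, PI):
            out.append(text)
    out.append(nodes[root][1])
    return "".join(out)


# The document from the bug description, as top-level nodes.
doc_nodes = [
    (COMMENT, "<!--foo-->"),
    (DOCTYPE, "<!DOCTYPE a>"),
    (ELEMENT, "<a/>"),
]

print(repr(serialize_buggy(doc_nodes)))  # '<!DOCTYPE a>\n<a/>'
print(repr(serialize_fixed(doc_nodes)))  # '<!--foo--><!DOCTYPE a>\n<a/>'
```

In this model the two functions reproduce the actual and expected results from the bug description; how closely it matches the real code in serializer.pxi is an assumption.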

Olli Pottonen (olli-pottonen) wrote :

I was wrong: it is not difficult to get right, just a couple of additional lines of code.

scoder (scoder) wrote :
Changed in lxml:
importance: Undecided → Low
milestone: none → 3.5
status: New → Fix Committed
assignee: nobody → Olli Pottonen (olli-pottonen)
scoder (scoder) wrote :

Fixed in lxml 3.5.0.

Changed in lxml:
status: Fix Committed → Fix Released