missing doctype when serialized

Bug #659367 reported by Tomasz Melcer
This bug affects 7 people
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: Olli Pottonen
Milestone: 3.5

Bug Description

    In [1]: from lxml import etree

I've got an HTML document:

    In [2]: root = etree.fromstring(u'''<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<HTML></HTML>''', etree.HTMLParser())

Its doctype is parsed correctly:

    In [3]: root.getroottree().docinfo.doctype
    Out[3]: u'<!DOCTYPE html PUBLIC "-//IETF//DTD HTML//EN">'

But when I serialize the tree, the doctype is lost:

    In [4]: etree.tostring(root.getroottree(), method='html')
    Out[4]: '<html></html>'

I expected to get the doctype here too.
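
A possible workaround, sketched under the assumption that the installed lxml accepts the `doctype` keyword of `etree.tostring()` (older releases may not): read the doctype from `docinfo` and pass it back in explicitly. The variable names are only illustrative.

```python
from lxml import etree

root = etree.fromstring(
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<HTML></HTML>',
    etree.HTMLParser(),
)

# The parsed doctype is available on the tree's docinfo.
doctype = root.getroottree().docinfo.doctype

# Assumption: this lxml version supports the doctype keyword.
# Passing it explicitly asks the serializer to write it before the root element.
html = etree.tostring(root, method="html", doctype=doctype, encoding="unicode")
print(html)
```

Serializing the root element rather than the tree keeps the explicit doctype as the only source of the declaration in this sketch.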

Python : (2, 6, 6, 'final', 0)
lxml.etree : (2, 2, 8, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 7)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

scoder (scoder) wrote :

Makes sense. I'd accept a pull request that inserts the doctype in the _tostring() function (serializer.pxi) if no doctype was passed explicitly (i.e. it is None), the document has an internal or external subset, and "write_complete_document" is set.

Note that the doctype would have to be reconstructed from the DTD, as done in the DocInfo() class. This functionality would need to be factored out.
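
To make the proposal concrete, here is a rough pure-Python sketch of the two pieces described above: rebuilding a doctype string from the DTD's root name, public ID and system URL, and deciding when the serializer should emit it. This is not lxml's actual code; `build_doctype`, `doctype_to_write` and their parameters are hypothetical names chosen for illustration.

```python
from typing import Optional


def build_doctype(root_name: str,
                  public_id: Optional[str] = None,
                  system_url: Optional[str] = None) -> str:
    """Rebuild a DOCTYPE declaration from the parts stored in a DTD."""
    if public_id:
        if system_url:
            return '<!DOCTYPE %s PUBLIC "%s" "%s">' % (
                root_name, public_id, system_url)
        return '<!DOCTYPE %s PUBLIC "%s">' % (root_name, public_id)
    if system_url:
        return '<!DOCTYPE %s SYSTEM "%s">' % (root_name, system_url)
    return '<!DOCTYPE %s>' % root_name


def doctype_to_write(explicit_doctype: Optional[str],
                     has_subset: bool,
                     write_complete_document: bool,
                     root_name: str,
                     public_id: Optional[str] = None,
                     system_url: Optional[str] = None) -> Optional[str]:
    """Pick the doctype a serializer should emit, or None for no doctype."""
    # An explicitly passed doctype always wins.
    if explicit_doctype is not None:
        return explicit_doctype
    # Otherwise fall back to the document's own DTD, but only when the
    # whole document (not just one element) is being written.
    if has_subset and write_complete_document:
        return build_doctype(root_name, public_id, system_url)
    return None


print(doctype_to_write(None, True, True, "html", "-//IETF//DTD HTML//EN"))
```

For the document in this report, the sketch yields the same string that `docinfo.doctype` returned in the description.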

Changed in lxml:
importance: Undecided → Medium
status: New → Confirmed
scoder (scoder) wrote :
Changed in lxml:
milestone: none → 3.5
status: Confirmed → In Progress
scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → Olli Pottonen (olli-pottonen)
status: In Progress → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 3.5.0.
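
A quick way to check the fixed behaviour, assuming lxml 3.5.0 or later is installed: serialize the same document once through its ElementTree and once through the root element alone. The variable names are illustrative.

```python
from lxml import etree

root = etree.fromstring(
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<HTML></HTML>',
    etree.HTMLParser(),
)

# Serializing the ElementTree writes the complete document,
# which is where the fix adds the doctype.
with_tree = etree.tostring(root.getroottree(), method="html")

# Serializing the root element writes only that element.
with_element = etree.tostring(root, method="html")

print(etree.LXML_VERSION)
print(with_tree)
print(with_element)
```

Only the tree serialization is expected to carry the doctype; passing the bare element still omits it unless a doctype is given explicitly.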

Changed in lxml:
status: Fix Committed → Fix Released