missing doctype when serialized
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Medium | Olli Pottonen |
Bug Description
In [1]: from lxml import etree
I've got an HTML document:
In [2]: root = etree.fromstrin
Its doctype is parsed correctly:
In [3]: root.getroottre
Out[3]: u'<!DOCTYPE html PUBLIC "-//IETF//DTD HTML//EN">'
But when I serialize it, the doctype is lost:
In [4]: etree.tostring(
Out[4]: '<html></html>'
I expected to get the doctype here too.
Python : (2, 6, 6, 'final', 0)
lxml.etree : (2, 2, 8, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 7)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)
Makes sense. I'd accept a pull request that inserts the doctype in the _tostring() function (serialiser.pxi) if None was passed as the doctype, the document has an internal or external subset, and "write_complete_document" is set.
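The three conditions in the proposal can be expressed as a small decision rule. This is a pure-Python sketch, not lxml code: `doctype_to_write` and its parameters are hypothetical names. It also assumes that a doctype explicitly passed by the caller keeps being written unchanged.

```python
from typing import Optional


def doctype_to_write(doctype: Optional[str],
                     reconstructed: str,
                     has_internal_subset: bool,
                     has_external_subset: bool,
                     write_complete_document: bool) -> Optional[str]:
    """Hypothetical sketch of the proposed rule for _tostring().

    Returns the doctype string to emit before the serialized body,
    or None if nothing extra should be written.
    """
    # Assumption: an explicitly passed doctype is left untouched.
    if doctype is not None:
        return doctype
    # Fall back to the DTD-derived doctype only when the document has a
    # subset and the whole document (not just an element) is written.
    if (has_internal_subset or has_external_subset) and write_complete_document:
        return reconstructed
    return None
```

With `doctype=None`, a document that has an external subset and `write_complete_document=True` gets the reconstructed doctype. Serializing a single element (`write_complete_document=False`) still writes none.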
Note that the doctype would have to be reconstructed from the DTD, as done in the DocInfo() class. This functionality would need to be factored out.
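The reconstruction step could look roughly like the following pure-Python sketch. `format_doctype` is a hypothetical name. Its branching is an assumption based on the forms a DOCTYPE declaration can take: PUBLIC with or without a system literal, SYSTEM only, or a bare name. It is not a copy of lxml's DocInfo code, and inside lxml it would read these fields from the document's DTD rather than take plain arguments.

```python
from typing import Optional


def format_doctype(root_name: str,
                   public_id: Optional[str],
                   system_url: Optional[str],
                   has_doctype: bool = True) -> str:
    """Hypothetical helper: rebuild a <!DOCTYPE ...> string from DTD fields."""
    quoted_system_url = ""
    if system_url:
        # Use single quotes if the URL itself contains a double quote.
        if '"' in system_url:
            quoted_system_url = f"'{system_url}'"
        else:
            quoted_system_url = f'"{system_url}"'
    if public_id:
        if system_url:
            return f'<!DOCTYPE {root_name} PUBLIC "{public_id}" {quoted_system_url}>'
        return f'<!DOCTYPE {root_name} PUBLIC "{public_id}">'
    if system_url:
        return f"<!DOCTYPE {root_name} SYSTEM {quoted_system_url}>"
    if has_doctype:
        return f"<!DOCTYPE {root_name}>"
    # No DTD at all: nothing to emit.
    return ""
```

For the document in this report, `format_doctype("html", "-//IETF//DTD HTML//EN", None)` yields the same string shown in `Out[3]`.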