lxml

missing doctype when serialized

Bug #659367 reported by Tomasz Melcer on 2010-10-12

This bug affects 7 people

Affects	Status	Importance	Assigned to	Milestone
lxml	Fix Released	Medium	Olli Pottonen	lxml 3.5

Bug Description

In [1]: from lxml import etree

I've got an HTML document:

In [2]: root = etree.fromstring(u'''<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<HTML></HTML>''', etree.HTMLParser())

Its doctype is parsed correctly:

In [3]: root.getroottree().docinfo.doctype
Out[3]: u'<!DOCTYPE html PUBLIC "-//IETF//DTD HTML//EN">'

But when serializing it, I am losing it:

In [4]: etree.tostring(root.getroottree(), method='html')
Out[4]: '<html></html>'

I expected to get the doctype here too.
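
A possible workaround until the fix lands, sketched under the assumption that the installed lxml's etree.tostring() accepts a `doctype` keyword (this may not hold for the 2.2.8 release listed below): read the doctype from docinfo and pass it back in when serializing.

```python
from lxml import etree

# Parse the same document as in the report.
root = etree.fromstring(
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n<HTML></HTML>',
    etree.HTMLParser(),
)
tree = root.getroottree()

# Assumption: tostring() supports the doctype keyword in this lxml version.
# The parsed doctype is handed back explicitly so it is written before the root element.
serialized = etree.tostring(tree, method='html', doctype=tree.docinfo.doctype)
print(serialized)
```

On releases that already contain the fix, the explicit argument should be unnecessary but harmless.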

Python : (2, 6, 6, 'final', 0)
lxml.etree : (2, 2, 8, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 7)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

scoder (scoder) wrote on 2013-06-29:

#1

Makes sense. I'd accept a pull request that inserts the doctype in the _tostring() function (serializer.pxi) if no doctype was passed (i.e. it is None), the document has an internal or external subset, and "write_complete_document" is set.

Note that the doctype would have to be reconstructed from the DTD, as done in the DocInfo() class. This functionality would need to be factored out.
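
To illustrate what "reconstructed from the DTD" could look like, here is a minimal pure-Python sketch. The function name `build_doctype` and its parameters are hypothetical and not part of lxml; the branching only approximates how a doctype string can be assembled from the root name, public ID and system URL.

```python
def build_doctype(root_name, public_id=None, system_url=None, has_doctype=True):
    """Assemble a doctype declaration string (hypothetical helper, not lxml API)."""
    if public_id:
        if system_url:
            return '<!DOCTYPE %s PUBLIC "%s" "%s">' % (root_name, public_id, system_url)
        return '<!DOCTYPE %s PUBLIC "%s">' % (root_name, public_id)
    if system_url:
        return '<!DOCTYPE %s SYSTEM "%s">' % (root_name, system_url)
    if has_doctype:
        # A bare doctype such as the HTML5 one.
        return '<!DOCTYPE %s>' % root_name
    # No internal or external subset: nothing to emit.
    return ''


print(build_doctype('html', '-//IETF//DTD HTML//EN'))
```

Factoring this logic into one place would let both docinfo and the serializer share it, which is the refactoring the comment above asks for.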

Changed in lxml:
importance:	Undecided → Medium
status:	New → Confirmed

scoder (scoder) wrote on 2015-02-14:

#2

proposed fix:
https://github.com/opottone/lxml/commit/3c3dc943aab64924935b0782d8691da3e022afef

Changed in lxml:
milestone:	none → 3.5
status:	Confirmed → In Progress

scoder (scoder) wrote on 2015-02-14:

#3

Sorry, wrong URL. Proposed fix:
https://github.com/opottone/lxml/commit/5c07f98f3ac6f643303da13c32fbf94f6af2c153

scoder (scoder) wrote on 2015-02-16:

#4

https://github.com/lxml/lxml/commit/6c47a480f81a676627805eaa88ba6b4f74668734

Changed in lxml:
assignee:	nobody → Olli Pottonen (olli-pottonen)
status:	In Progress → Fix Committed

scoder (scoder) wrote on 2015-11-13:

#5

Fixed in lxml 3.5.0.

Changed in lxml:
status:	Fix Committed → Fix Released
