Support for XHTML?

Bug #1747680 reported by danny0838 on 2018-02-06
This bug affects 1 person
Affects: lxml
Importance: Undecided
Assigned to: Unassigned

Bug Description

Does lxml have built-in support for XHTML?

It seems that lxml.html does not correctly output (HTML-compatible) XHTML content, whether the serialization method is set to 'html' or 'xml'.

Output issue (method='html'):

>>> from lxml import html
>>> html.tostring(html.fromstring('''<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&nbsp;</body></html>'''), method='html')
b'<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&#160;</body></html>'

The expected output would be '<meta charset="UTF-8" />' instead of '<meta charset="UTF-8">', '<input type="checkbox" checked="checked" />' instead of '<input type="checkbox" checked>', '<br />' instead of '<br>', and '&nbsp;' instead of '&#160;'.
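If there is no built-in XHTML mode, one workaround is to walk the tree and write the markup yourself. The sketch below is not an lxml API: `to_xhtml`, `VOID_ELEMENTS` and `BOOLEAN_ATTRS` are hypothetical names, the two element/attribute sets are assumptions (HTML 4 empty elements plus later HTML void elements, and the HTML 4 boolean attributes), and it is demonstrated on the standard library's xml.etree.ElementTree. It targets the tag and attribute expectations above: void elements self-closed with a space before '/>', boolean attributes expanded to name="name", and an explicit end tag on every other element.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Assumed list: HTML 4 EMPTY elements plus later HTML void elements.
VOID_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "embed", "frame", "hr", "img",
    "input", "isindex", "link", "meta", "param", "source", "track", "wbr",
}

# Assumed list: HTML 4 boolean attributes, which XHTML writes as name="name".
BOOLEAN_ATTRS = {
    "checked", "compact", "declare", "defer", "disabled", "ismap", "multiple",
    "nohref", "noresize", "noshade", "nowrap", "readonly", "selected",
}


def to_xhtml(elem):
    """Serialize an ElementTree-style element as HTML-compatible XHTML (sketch)."""
    parts = []
    _write(elem, parts)
    return "".join(parts)


def _write(elem, parts):
    tag = elem.tag
    if isinstance(tag, str):
        attrs = []
        for name, value in elem.attrib.items():
            if name in BOOLEAN_ATTRS:
                value = name  # expand minimized form: checked -> checked="checked"
            attrs.append(" %s=%s" % (name, quoteattr(value or "")))
        start = "<%s%s" % (tag, "".join(attrs))
        if tag in VOID_ELEMENTS:
            # Void element: self-close with a space so legacy HTML parsers cope.
            parts.append(start + " />")
        else:
            # Non-void element: always emit an explicit end tag, even when empty.
            parts.append(start + ">")
            if elem.text:
                parts.append(escape(elem.text))
            for child in elem:
                _write(child, parts)
            parts.append("</%s>" % tag)
    # Comments and processing instructions (non-string tags) are skipped,
    # but the text that follows them (the tail) is kept.
    if elem.tail:
        parts.append(escape(elem.tail))


doc = ET.fromstring(
    '<html><head><meta charset="UTF-8"/><script src="foo.js"/></head>'
    '<body><input type="checkbox" checked=""/><span class="bar"/><br/></body></html>'
)
print(to_xhtml(doc))
# <html><head><meta charset="UTF-8" /><script src="foo.js"></script></head><body><input type="checkbox" checked="checked" /><span class="bar"></span><br /></body></html>
```

Because lxml.html elements expose the same .tag, .attrib, .text, .tail and child iteration, the function should also accept a tree from html.fromstring(), though that is an untested assumption. Non-ASCII text such as U+00A0 is left as a literal character rather than an entity.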

Output issue (method='xml'):

>>> from lxml import html
>>> html.tostring(html.fromstring('''<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&nbsp;</body></html>'''), method="xml")
b'<html><head><meta charset="UTF-8"/><script src="foo.js"/></head><body><input type="checkbox" checked="checked"/><span class="bar"/><br/>&#160;</body></html>'

The expected output would be '<script src="foo.js"></script>' instead of '<script src="foo.js"/>', '<span class="bar"></span>' instead of '<span class="bar"/>', and '&nbsp;' instead of '&#160;'.
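Both outputs above write U+00A0 as '&#160;'. If named entities are wanted, a post-processing pass over the serialized string can map numeric character references back to entity names. This is a minimal sketch, assuming the output has first been decoded to str (html.tostring() returns bytes by default); to_named_entities is a hypothetical helper, not part of lxml, and it relies on the HTML 4 entity table in Python's html.entities.codepoint2name.

```python
import re
from html.entities import codepoint2name

# Matches decimal (&#160;) and hexadecimal (&#xA0;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def to_named_entities(markup):
    """Rewrite numeric character references as HTML named entities where one exists."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        name = codepoint2name.get(codepoint)
        # Keep the numeric form when HTML 4 defines no name (e.g. &#39;).
        return "&%s;" % name if name else match.group(0)
    return _NUMERIC_REF.sub(replace, markup)


print(to_named_entities('<br/>&#160;</body></html>'))
# <br/>&nbsp;</body></html>
```

Note that '&nbsp;' is only defined by the XHTML DTD, so an XML parser that does not load the DTD will reject it, whereas '&#160;' is understood by any XML parser. The pass would also rewrite a literal '&#160;' inside unescaped script or style content, so those elements may need to be excluded.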

Am I missing something about XHTML support? If so, how should I get lxml to work with XHTML? If not, could you add support for XHTML?

---
Python : sys.version_info(major=3, minor=6, micro=4, releaselevel='final', serial=0)
lxml.etree : (4, 1, 1, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)

danny0838 (danny0838) on 2018-02-06
description: updated