Comment 0 for bug 1747680

danny0838 (danny0838) wrote :

Does lxml have built-in support for XHTML?

It seems that lxml.html does not produce correct (HTML-compatible) XHTML output, whether the method is specified as 'html' or 'xml'.

Output issue with method='html':

>>> from lxml import html
>>> html.tostring(html.fromstring('''<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&nbsp;</body></html>'''), method='html')
b'<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&#160;</body></html>'

The expected output has '<meta charset="UTF-8" />' instead of '<meta charset="UTF-8">', '<input type="checkbox" checked="checked" />' instead of '<input type="checkbox" checked>', '<br />' instead of '<br>', and '&nbsp;' instead of '&#160;'.

Output issue with method='xml':

>>> from lxml import html
>>> html.tostring(html.fromstring('''<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&nbsp;</body></html>'''), method="xml")
b'<html><head><meta charset="UTF-8"/><script src="foo.js"/></head><body><input type="checkbox" checked="checked"/><span class="bar"/><br/>&#160;</body></html>'

The expected output has '<script src="foo.js"></script>' instead of '<script src="foo.js"/>', '<span class="bar"></span>' instead of '<span class="bar"/>', and '&nbsp;' instead of '&#160;'.

Am I missing something about XHTML support? If so, how should lxml be used to output XHTML? If not, could you add support for XHTML?
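
For reference, a possible workaround for the 'xml' case is sketched below. It is not a built-in lxml feature, only an illustration: it relies on the serializer writing an explicit end tag for any element whose text is an empty string rather than None. The names VOID_ELEMENTS and to_html_compatible_xhtml are made up for this sketch, and the void element list is my own assumption based on the HTML specification.

```python
from lxml import etree, html

# Assumption: elements that HTML treats as void and that may stay self-closed.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


def to_html_compatible_xhtml(root):
    """Serialize with method='xml', keeping end tags on non-void elements.

    Hypothetical helper; note that it modifies the tree in place.
    """
    for el in root.iter():
        # Comments and processing instructions have a non-string tag.
        if not isinstance(el.tag, str):
            continue
        # An empty text node stops the serializer from collapsing the
        # element into the self-closing <tag/> form.
        if el.tag not in VOID_ELEMENTS and len(el) == 0 and el.text is None:
            el.text = ""
    return etree.tostring(root, method="xml", encoding="unicode")


doc = html.fromstring(
    '<html><head><meta charset="UTF-8"><script src="foo.js"></script></head>'
    '<body><input type="checkbox" checked><span class="bar"></span>'
    '<br>&nbsp;</body></html>'
)
out = to_html_compatible_xhtml(doc)
print(out)
```

This only addresses the collapsed '<script/>' and '<span/>' elements. Void elements still come out as '<br/>' rather than '<br />', and the named entity '&nbsp;' is not restored, so it does not replace proper XHTML support.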

---
Python : sys.version_info(major=3, minor=6, micro=4, releaselevel='final', serial=0)
lxml.etree : (4, 1, 1, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)