It seems that lxml.html cannot output (HTML-compatible) XHTML content correctly, whether the serialization method is 'html' or 'xml'.
Issue for output (using 'html'):
>>> from lxml import html
>>> html.tostring(html.fromstring('''<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&nbsp;</body></html>'''), method='html')
b'<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&#160;</body></html>'
The expected output has '<meta charset="UTF-8" />' instead of '<meta charset="UTF-8">', '<input type="checkbox" checked="checked" />' instead of '<input type="checkbox" checked>', '<br />' instead of '<br>', and '&nbsp;' instead of '&#160;'.
Issue for output (as 'xml'):
>>> from lxml import html
>>> html.tostring(html.fromstring('''<html><head><meta charset="UTF-8"><script src="foo.js"></script></head><body><input type="checkbox" checked><span class="bar"></span><br>&nbsp;</body></html>'''), method="xml")
b'<html><head><meta charset="UTF-8"/><script src="foo.js"/></head><body><input type="checkbox" checked="checked"/><span class="bar"/><br/>&#160;</body></html>'
The expected output has '<script src="foo.js"></script>' instead of '<script src="foo.js"/>', '<span class="bar"></span>' instead of '<span class="bar"/>', and '&nbsp;' instead of '&#160;'.
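As a stopgap for the 'xml' case, something like the sketch below seems to keep non-void elements from being collapsed. It assumes that giving an empty element an empty-string text is enough to make the serializer write separate open and close tags. VOID_ELEMENTS and to_xhtml_compatible are names made up for illustration, not part of lxml. It does not add the space before '/>' and does not touch the other differences listed in this report.

```python
from lxml import html

# HTML void elements: these never have content, so a self-closing tag is fine.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
})


def to_xhtml_compatible(markup):
    """Serialize HTML as XML while keeping non-void elements un-collapsed.

    Hypothetical helper for illustration; not part of lxml.
    """
    root = html.fromstring(markup)
    for el in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(el.tag, str):
            continue
        # An empty-string text child stops the serializer from emitting <tag/>.
        if el.tag not in VOID_ELEMENTS and len(el) == 0 and el.text is None:
            el.text = ""
    return html.tostring(root, method="xml", encoding="unicode")


if __name__ == "__main__":
    print(to_xhtml_compatible(
        '<html><head><script src="foo.js"></script></head>'
        '<body><span class="bar"></span><br></body></html>'
    ))
```

This is only a workaround on the caller's side, so the question of proper XHTML support still stands.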
Am I missing something about XHTML support? If so, how can we get lxml to work with XHTML? If not, could you add support for XHTML?
Does lxml have built-in support for XHTML?
---
Python : sys.version_info(major=3, minor=6, micro=4, releaselevel='final', serial=0)
lxml.etree : (4, 1, 1, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)