Try this with plain libxml2:
"""
$ echo '<html><head><title><![CDATA[title]]></title></head><body><![CDATA[body]]></body></html>' | xmllint --html -
-:1: HTML parser error : htmlParseStartTag: invalid element name
<html><head><title><![CDATA[title]]></title></head><body><![CDATA[body]]></body> ^
-:1: HTML parser error : htmlParseStartTag: invalid element name
<html><head><title><![CDATA[title]]></title></head><body><![CDATA[body]]></body> ^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title></title></head>
<body></body>
</html>
"""
"CDATA" is an XML thing. It has no meaning in HTML:
https://developer.mozilla.org/en-US/docs/Web/API/CDATASection#Specifications
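
For contrast, here is a rough sketch of what an XML parser does with the same input. It uses Python's standard-library xml.dom.minidom, not libxml2 or xmllint, so it only illustrates the XML side of the claim; the helper name cdata_sections is made up for this example.

```python
from xml.dom import Node, minidom

# Same document as in the xmllint run above.
DOC = (
    "<html><head><title><![CDATA[title]]></title></head>"
    "<body><![CDATA[body]]></body></html>"
)


def cdata_sections(xml_text):
    """Return (parent tag, text) for every CDATA section an XML parser finds.

    Hypothetical helper for illustration only.
    """
    found = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == Node.CDATA_SECTION_NODE:
                # The XML parser keeps the section as its own node type.
                found.append((node.tagName, child.data))
            else:
                walk(child)

    walk(minidom.parseString(xml_text))
    return found


if __name__ == "__main__":
    for tag, text in cdata_sections(DOC):
        print(tag, repr(text))
```

Parsed as XML, both CDATA sections should come back with their text intact, whereas the HTML parser run above reports them as invalid element names and drops them.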