It seems unlikely to me that something like an unterminated script would cause it to choke. It certainly does appear that the tag is registered as unterminated, though:
from lxml import etree

examples = [
    '<html><head><title>Foo</title><script>This is an unterminated script',
    '<html><head><script>Mismatched close</head><body><p>Hello</p></body></html>',
]

for content in examples:
    # recover=True asks the parser to keep going past broken markup
    tree = etree.fromstring(content, etree.HTMLParser(recover=True))
    # Show what ended up as the text of the first <script> element
    print('%s => %s' % (content, tree.xpath('//script')[0].text))
The example provided here does appear to have an unterminated script, but none of the examples I have do (attaching them now).
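A rough way to check whether a given document has an unterminated script before handing it to the parser is to compare the number of opening and closing script tags. This is only a sketch: `has_unterminated_script` is a hypothetical helper, not part of lxml, and it is a regex heuristic rather than a real parse, so a script body that itself contains the text `<script` would throw off the count.

```python
import re

# Hypothetical helper, not part of lxml: a rough regex heuristic, not a real HTML parse.
OPEN_SCRIPT = re.compile(r'<script\b', re.IGNORECASE)
CLOSE_SCRIPT = re.compile(r'</script\s*>', re.IGNORECASE)


def has_unterminated_script(content):
    """Return True if there are more <script> openings than </script> closings."""
    opens = len(OPEN_SCRIPT.findall(content))
    closes = len(CLOSE_SCRIPT.findall(content))
    return opens > closes


examples = [
    '<html><head><title>Foo</title><script>This is an unterminated script',
    '<html><head><script>Mismatched close</head><body><p>Hello</p></body></html>',
    '<html><head><script>var x = 1;</script></head></html>',
]

for content in examples:
    print('%s => %s' % (content, has_unterminated_script(content)))
```

The first two strings are the ones from the snippet above; the third is an assumed well-formed document added for contrast.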