2018-08-25 19:10:27 |
Tim Tisdall |
description |
```
>>> from lxml.html import fromstring
>>> t = u"""\xef\xbb\xbf<!DOCTYPE html><html><head><title>test</title></head><body><h1>test</h1></body></html>"""
>>> tree = fromstring(t)
>>> print(tree)
<Element div at 0x7fdd6c9de940>
>>> tree.head
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/lxml/html/__init__.py", line 298, in head
    return self.xpath('//head|//x:head', namespaces={'x':XHTML_NAMESPACE})[0]
IndexError: list index out of range
>>>
```
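Below is a possible explanation for the `div`. It is an assumption, not a confirmed diagnosis. As far as I can tell, `lxml.html.fromstring` returns the parsed document unchanged only when the input matches a case-insensitive pattern like `^\s*<(?:html|!doctype)`. Otherwise it treats the input as a fragment and renames the body container to `div` (or `span`). A leading BOM is not whitespace, so it would defeat that check. The regex below is a hypothetical stand-in for lxml's internal check, not a public lxml API.

```python
import re

# Hypothetical stand-in for lxml.html's internal "is this a full document?"
# check; the exact pattern is an assumption, not a public lxml API.
looks_like_full_html = re.compile(r"^\s*<(?:html|!doctype)", re.I).match

doc = u"<!DOCTYPE html><html><head><title>test</title></head><body><h1>test</h1></body></html>"

# A plain document, even with leading whitespace, is recognised as full HTML.
print(bool(looks_like_full_html(doc)))                    # True
print(bool(looks_like_full_html(u"  \n" + doc)))          # True

# The decoded BOM (U+FEFF) and its Latin-1 mojibake are not matched by \s,
# so the check fails and the input would be handled as a fragment.
print(bool(looks_like_full_html(u"\ufeff" + doc)))        # False
print(bool(looks_like_full_html(u"\xef\xbb\xbf" + doc)))  # False
```

If this is the mechanism, it is consistent with `print(tree)` showing a `div`. Why `tree.head` then finds no `<head>` anywhere depends on how libxml2 handled the text before the doctype, which this sketch does not cover.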
According to Wikipedia, the byte sequence `EF BB BF` is the UTF-8 byte order mark (BOM).
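`EF BB BF` are the BOM's *bytes*. In a Python unicode literal, `u"\xef\xbb\xbf"` is three code points: U+00EF, U+00BB and U+00BF, which display as `ï»¿`. That is what those bytes become when they are mis-decoded as Latin-1. The correctly decoded BOM is the single code point U+FEFF, and Python's `utf-8-sig` codec strips it during decoding.

The sketch below assumes the goal is simply to drop a leading BOM in any of these forms before calling `fromstring`. `strip_leading_bom` is a hypothetical helper, not part of lxml.

```python
import codecs

# The UTF-8 BOM as raw bytes: EF BB BF.
BOM_BYTES = codecs.BOM_UTF8
# Decoded as UTF-8, those bytes are one code point, U+FEFF.
BOM_CHAR = BOM_BYTES.decode("utf-8")
# Mis-decoded as Latin-1, they become three code points, u"\xef\xbb\xbf".
BOM_MOJIBAKE = BOM_BYTES.decode("latin-1")


def strip_leading_bom(data):
    """Hypothetical helper: drop one leading UTF-8 BOM from bytes or text."""
    if isinstance(data, bytes):
        prefixes = (BOM_BYTES,)
    else:
        prefixes = (BOM_CHAR, BOM_MOJIBAKE)
    for prefix in prefixes:
        if data.startswith(prefix):
            return data[len(prefix):]
    return data


text = u"\xef\xbb\xbf<!DOCTYPE html><html></html>"
print(strip_leading_bom(text).startswith(u"<!DOCTYPE"))  # True

raw = BOM_BYTES + b"<!DOCTYPE html><html></html>"
# Decoding with utf-8-sig removes the BOM in one step; both routes agree.
print(raw.decode("utf-8-sig") == strip_leading_bom(raw).decode("utf-8"))  # True
```

You can then pass the stripped string to `fromstring` in place of `t`. I have not tested whether that alone makes `tree.head` work.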
Python : sys.version_info(major=2, minor=7, micro=7, releaselevel='final', serial=0)
lxml.etree : (4, 2, 4, 0)
libxml used : (2, 9, 8)
libxml compiled : (2, 9, 8)
libxslt used : (1, 1, 32)
libxslt compiled : (1, 1, 32) |
|
|