2017-01-06 11:46:04 |
Elias Dorneles da Silveira Junior |
bug |
|
|
added bug |
2017-01-06 11:46:28 |
Elias Dorneles da Silveira Junior |
description |
Using the latest version of both lxml and html5lib:
>>> import html5lib
>>> html5lib.__version__
u'0.999999999'
>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(3, 7, 1, 0)
Calling html5parser.fromstring with Unicode text input fails with a TypeError about an unexpected keyword argument:
$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring(u'')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 147, in fromstring
guess_charset=guess_charset)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
return parser.parse(html, useChardet=guess_charset).getroot()
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'
Details about installed packages:
Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (3, 7, 1, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
I also get the same problem using Python 3:
$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 147, in fromstring
guess_charset=guess_charset)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
return parser.parse(html, useChardet=guess_charset).getroot()
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 235, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 85, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet' |
Using the latest version of both lxml and html5lib:
```
>>> import html5lib
>>> html5lib.__version__
u'0.999999999'
>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(3, 7, 1, 0)
```
Calling html5parser.fromstring with Unicode text input fails with a TypeError about an unexpected keyword argument:
$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring(u'')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 147, in fromstring
guess_charset=guess_charset)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
return parser.parse(html, useChardet=guess_charset).getroot()
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'
Details about installed packages:
Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (3, 7, 1, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
I also get the same problem using Python 3:
$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 147, in fromstring
guess_charset=guess_charset)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
return parser.parse(html, useChardet=guess_charset).getroot()
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 235, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 85, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet' |
|
2017-01-06 11:46:51 |
Elias Dorneles da Silveira Junior |
description |
Using the latest version of both lxml and html5lib:
```
>>> import html5lib
>>> html5lib.__version__
u'0.999999999'
>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(3, 7, 1, 0)
```
Calling html5parser.fromstring with Unicode text input fails with a TypeError about an unexpected keyword argument:
$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring(u'')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 147, in fromstring
guess_charset=guess_charset)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
return parser.parse(html, useChardet=guess_charset).getroot()
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'
Details about installed packages:
Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (3, 7, 1, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
I also get the same problem using Python 3:
$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 147, in fromstring
guess_charset=guess_charset)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
return parser.parse(html, useChardet=guess_charset).getroot()
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 235, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 85, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet' |
Using the latest version of both lxml and html5lib:
>>> import html5lib
>>> html5lib.__version__
u'0.999999999'
>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(3, 7, 1, 0)
Calling html5parser.fromstring with Unicode text input fails with a TypeError about an unexpected keyword argument:
$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring(u'')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 147, in fromstring
guess_charset=guess_charset)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
return parser.parse(html, useChardet=guess_charset).getroot()
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'
Details about installed packages:
Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (3, 7, 1, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
I also get the same problem using Python 3:
$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 147, in fromstring
guess_charset=guess_charset)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
return parser.parse(html, useChardet=guess_charset).getroot()
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 235, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 85, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet' |
|
2017-03-16 09:11:52 |
Ondergetekende |
attachment added |
|
0001-Build-a-retry-mechanism-around-html5lib-s-unpredicta.patch https://bugs.launchpad.net/lxml/+bug/1654544/+attachment/4838845/+files/0001-Build-a-retry-mechanism-around-html5lib-s-unpredicta.patch |
|
2017-03-16 09:12:21 |
Ondergetekende |
attachment added |
|
0002-Make-sure-the-html5lib-tests-are-included-in-CI.patch https://bugs.launchpad.net/lxml/+bug/1654544/+attachment/4838846/+files/0002-Make-sure-the-html5lib-tests-are-included-in-CI.patch |
|
2017-08-12 15:01:40 |
scoder |
lxml: milestone |
|
3.9.0 |
|
2017-08-12 15:04:26 |
scoder |
lxml: importance |
Undecided |
Medium |
|
2017-08-12 15:04:26 |
scoder |
lxml: status |
New |
Fix Committed |
|
2017-09-19 10:27:01 |
scoder |
lxml: status |
Fix Committed |
Fix Released |
|
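Note on the failure reported above: the tracebacks show lxml forwarding useChardet to the parser's parse() call even though the input is already decoded text, and the Unicode input stream rejecting that keyword. One possible guard, sketched below, is to pass the keyword only for byte input. This is an illustration only and is not taken from the attached patches or from the released fix. StubParser and parse_guarded are hypothetical names; StubParser is a stand-in that imitates the reported TypeError and is not html5lib.

```python
class StubParser(object):
    """Hypothetical stand-in for an HTML parser; this is not html5lib.

    It imitates the reported behaviour: decoded text combined with the
    useChardet keyword raises TypeError.
    """

    def parse(self, source, **kwargs):
        if not isinstance(source, bytes) and "useChardet" in kwargs:
            raise TypeError(
                "__init__() got an unexpected keyword argument 'useChardet'")
        return {"source": source, "options": kwargs}


def parse_guarded(parser, html, guess_charset=True):
    """Forward useChardet only when the input is bytes.

    Charset guessing has no meaning for text that is already decoded,
    so the keyword is left out in that case.
    """
    options = {}
    if isinstance(html, bytes):
        options["useChardet"] = guess_charset
    return parser.parse(html, **options)


result_text = parse_guarded(StubParser(), u"")
result_bytes = parse_guarded(StubParser(), b"<p>hi</p>")
print(result_text["options"])   # {}
print(result_bytes["options"])  # {'useChardet': True}
```

With a guard like this, the text call from the report (fromstring(u'') on Python 2, fromstring('') on Python 3) would reach the parser without the extra keyword, while byte input would still get charset guessing.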