Activity log for bug #1654544

Date Who What changed Old value New value Message
2017-01-06 11:46:04 Elias Dorneles da Silveira Junior bug added bug
2017-01-06 11:46:28 Elias Dorneles da Silveira Junior description

Old value:

Using the latest version of both lxml and html5lib:

>>> import html5lib
>>> html5lib.__version__
u'0.999999999'
>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(3, 7, 1, 0)

Trying to use html5parser.fromstring with an unicode text input fails with TypeError unexpected keyword argument:

$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring(u'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 147, in fromstring
    guess_charset=guess_charset)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'

Details about installed packages:

Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (3, 7, 1, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)

I also get the same problem using Python 3:

$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 147, in fromstring
    guess_charset=guess_charset)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'

New value:

Using the latest version of both lxml and html5lib:

```
>>> import html5lib
>>> html5lib.__version__
u'0.999999999'
>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(3, 7, 1, 0)
```

Trying to use html5parser.fromstring with an unicode text input fails with TypeError unexpected keyword argument:

$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring(u'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 147, in fromstring
    guess_charset=guess_charset)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'

Details about installed packages:

Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (3, 7, 1, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)

I also get the same problem using Python 3:

$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 147, in fromstring
    guess_charset=guess_charset)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'
2017-01-06 11:46:51 Elias Dorneles da Silveira Junior description

Old value:

Using the latest version of both lxml and html5lib:

```
>>> import html5lib
>>> html5lib.__version__
u'0.999999999'
>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(3, 7, 1, 0)
```

Trying to use html5parser.fromstring with an unicode text input fails with TypeError unexpected keyword argument:

$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring(u'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 147, in fromstring
    guess_charset=guess_charset)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'

Details about installed packages:

Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (3, 7, 1, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)

I also get the same problem using Python 3:

$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 147, in fromstring
    guess_charset=guess_charset)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'

New value:

Using the latest version of both lxml and html5lib:

>>> import html5lib
>>> html5lib.__version__
u'0.999999999'
>>> import lxml.etree
>>> lxml.etree.LXML_VERSION
(3, 7, 1, 0)

Trying to use html5parser.fromstring with an unicode text input fails with TypeError unexpected keyword argument:

$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring(u'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 147, in fromstring
    guess_charset=guess_charset)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/elias/.virtualenvs/tmp-6aaa3c35e219018b/local/lib/python2.7/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'

Details about installed packages:

Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (3, 7, 1, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)

I also get the same problem using Python 3:

$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html import html5parser
>>> html5parser.fromstring('')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 147, in fromstring
    guess_charset=guess_charset)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/lxml/html/html5parser.py", line 64, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/home/elias/.virtualenvs/tmp-200d3a9b52ebdd89/lib/python3.4/site-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'useChardet'
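
The traceback above suggests that `useChardet` is forwarded to html5lib regardless of input type, and that the text-input path (`HTMLUnicodeInputStream`) rejects it. Below is a minimal sketch of one possible caller-side guard, assuming the keyword only makes sense for byte input. `build_parse_options` is a hypothetical helper for illustration, not part of lxml or html5lib, and this is not necessarily how the bug was fixed.

```python
def build_parse_options(html, guess_charset=None):
    """Return keyword arguments to forward to a parser's parse() call.

    Hypothetical helper: forwards useChardet only when the input is bytes,
    because the traceback shows the text-input path rejecting that keyword.
    """
    options = {}
    if isinstance(html, bytes):
        # Assumption: default to charset guessing for byte input
        # unless the caller explicitly disabled it.
        options["useChardet"] = True if guess_charset is None else bool(guess_charset)
    return options
```

A caller could then write `parser.parse(html, **build_parse_options(html, guess_charset))`, where `parser` is assumed to be an html5lib parser instance like the one in the traceback.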
2017-03-16 09:11:52 Ondergetekende attachment added 0001-Build-a-retry-mechanism-around-html5lib-s-unpredicta.patch https://bugs.launchpad.net/lxml/+bug/1654544/+attachment/4838845/+files/0001-Build-a-retry-mechanism-around-html5lib-s-unpredicta.patch
2017-03-16 09:12:21 Ondergetekende attachment added 0002-Make-sure-the-html5lib-tests-are-included-in-CI.patch https://bugs.launchpad.net/lxml/+bug/1654544/+attachment/4838846/+files/0002-Make-sure-the-html5lib-tests-are-included-in-CI.patch
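
The first attachment's filename mentions a retry mechanism, but the patch contents are not shown in this log. As a rough illustration of that general pattern only, the sketch below calls a function with optional keyword arguments and retries once without them if a `TypeError` is raised. Both `call_with_retry` and `fake_parse` are hypothetical names. `fake_parse` is a stand-in that mimics the failure in the traceback and is not html5lib code.

```python
def call_with_retry(func, source, **optional_kwargs):
    """Call func(source, **optional_kwargs); on TypeError, retry once
    without the optional keyword arguments."""
    try:
        return func(source, **optional_kwargs)
    except TypeError:
        if not optional_kwargs:
            # Nothing to drop, so the error is not about our keywords.
            raise
        return func(source)


def fake_parse(source, **kwargs):
    # Stand-in parser (assumption): rejects useChardet for text input,
    # mirroring the error message reported in this bug.
    if isinstance(source, str) and "useChardet" in kwargs:
        raise TypeError("__init__() got an unexpected keyword argument 'useChardet'")
    return ("parsed", source, sorted(kwargs))
```

One caveat with this pattern: catching `TypeError` broadly can hide unrelated errors raised inside the call, so checking the input type up front may be easier to reason about.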
2017-08-12 15:01:40 scoder lxml: milestone 3.9.0
2017-08-12 15:04:26 scoder lxml: importance Undecided Medium
2017-08-12 15:04:26 scoder lxml: status New Fix Committed
2017-09-19 10:27:01 scoder lxml: status Fix Committed Fix Released