Crash with latest html5lib

Bug #1603299 reported by gazpachoking on 2016-07-15
This bug affects 13 people
Affects                    Importance    Assigned to
Beautiful Soup             Undecided     Unassigned
beautifulsoup4 (Ubuntu)    High          Unassigned

Bug Description

html5lib's treebuilders._base module is now public and has been renamed to base in the latest versions. bs4 needs to be updated to work with these versions; it currently crashes with:

  File "c:\python27\lib\site-packages\bs4\builder\_html5lib.py", line 70, in <module>
    class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: 'module' object has no attribute '_base'
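
One way to tolerate both layouts, sketched here under assumptions and not necessarily the fix that was committed, is to try the new module name first and fall back to the old one. The helper name load_treebuilder_base and its package parameter are hypothetical; only html5lib.treebuilders._base and TreeBuilder come from the traceback above, and the new name base comes from the description.

```python
import importlib


def load_treebuilder_base(package="html5lib.treebuilders"):
    """Return the tree builder base module under whichever name exists.

    Hypothetical helper: tries the new public name ("base") first, then
    the old private name ("_base") used by earlier html5lib releases.
    """
    for name in ("base", "_base"):
        try:
            return importlib.import_module(package + "." + name)
        except ImportError:
            # This name is not available in the installed version.
            continue
    raise ImportError("no tree builder base module found in " + package)
```

With html5lib installed, a builder class could then subclass load_treebuilder_base().TreeBuilder instead of hard-coding html5lib.treebuilders._base.TreeBuilder.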

gazpachoking (chase-sterling) wrote :

This crash occurs on Beautiful Soup 4.4.1 with html5lib>=0.99999999.

Leonard Richardson (leonardr) wrote :

Revision 406 restores compatibility with html5lib.

Changed in beautifulsoup:
status: New → Fix Committed
Carlos Sanchez (papachoco) wrote :

Any ETA for the fix to be available?

Carlos

Changed in beautifulsoup:
status: Fix Committed → Fix Released
Brad Erickson (eosrei) wrote :

Could this binary package be copied to Xenial (16.04)? Manually downloading and installing the deb from https://launchpad.net/ubuntu/yakkety/amd64/python3-bs4/4.5.1-1 fixes the issue. Thanks!

James Lu (tacocat) on 2016-11-29
no longer affects: variety (Ubuntu)
James Lu (tacocat) wrote :

This issue is breaking variety in Xenial (https://bugs.launchpad.net/variety/+bug/1645572), so I'm seconding the request to fix it there as well.

Changed in beautifulsoup4 (Ubuntu):
status: New → Confirmed
Levente Torok (toroklev) wrote :

I ran into the same error while using Remarkable (a Markdown editor).
I suggest pinning html5lib to exactly the following version:

sudo pip install html5lib==0.9999999

with seven 9s exactly.
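
Since the versions differ only in the number of 9s, counting them by eye is error-prone. A rough sketch of a check is below; it assumes, per the earlier comment, that 0.99999999 (eight 9s) is the first release that breaks the old bs4. The function names are made up for illustration, and the parsing is deliberately simplistic (it only reads the leading numeric components).

```python
import re

# First html5lib release reported in this bug to break bs4 4.4.1 (eight 9s).
FIRST_AFFECTED = (0, 99999999)


def release_tuple(version):
    """Turn '0.9999999' into (0, 9999999); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def breaks_old_bs4(html5lib_version):
    """True if this html5lib version is at or past the first affected release."""
    return release_tuple(html5lib_version) >= FIRST_AFFECTED


if __name__ == "__main__":
    for candidate in ("0.9999999", "0.99999999", "0.999999999"):
        print(candidate, breaks_old_bs4(candidate))
```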

Joshua Powers (powersj) wrote :

Hi Folks,

If we are going to SRU a fix back to an existing release we are going to need to have a test case/steps-to-reproduce. Can someone give us a python code snippet we could use to test this out? It would help get the ball rolling.

Thanks!

Changed in beautifulsoup4 (Ubuntu):
importance: Undecided → High
James Lu (tacocat) wrote :

I realize this bug doesn't impact any version of html5lib officially in xenial. However, if a user decides, for whatever reason, to install a newer version of html5lib (e.g. via pip), bs4 will break as a side effect and possibly render other programs unusable.

AFAIK, running 'pip install html5lib' (to get html5lib >= 0.99999999) and then importing bs4 by any means is enough to trigger the issue.

gl@nucleus:~$ pip install html5lib
Collecting html5lib
  Downloading html5lib-0.999999999-py2.py3-none-any.whl (112kB)
    100% |████████████████████████████████| 122kB 2.8MB/s
Collecting six (from html5lib)
  Downloading six-1.10.0-py2.py3-none-any.whl
Collecting setuptools>=18.5 (from html5lib)
  Downloading setuptools-35.0.2-py2.py3-none-any.whl (390kB)
    100% |████████████████████████████████| 399kB 2.0MB/s
Collecting webencodings (from html5lib)
  Downloading webencodings-0.5.1-py2.py3-none-any.whl
Collecting appdirs>=1.4.0 (from setuptools>=18.5->html5lib)
  Downloading appdirs-1.4.3-py2.py3-none-any.whl
Collecting packaging>=16.8 (from setuptools>=18.5->html5lib)
  Downloading packaging-16.8-py2.py3-none-any.whl
Collecting pyparsing (from packaging>=16.8->setuptools>=18.5->html5lib)
  Downloading pyparsing-2.2.0-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 4.8MB/s
Installing collected packages: six, appdirs, pyparsing, packaging, setuptools, webencodings, html5lib
Successfully installed appdirs html5lib-0.999 packaging pyparsing setuptools-20.7.0 six-1.10.0 webencodings
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
gl@nucleus:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bs4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/usr/lib/python2.7/dist-packages/bs4/builder/__init__.py", line 314, in <module>
    from . import _html5lib
  File "/usr/lib/python2.7/dist-packages/bs4/builder/_html5lib.py", line 70, in <module>
    class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: 'module' object has no attribute '_base'
>>>
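
The interactive session above could also be scripted as a small smoke test, roughly as follows. This is only a sketch: try_import is a hypothetical helper name, and it simply reports whether a module imports cleanly, so it says nothing about behaviour beyond import time.

```python
import importlib
import traceback


def try_import(module_name):
    """Import module_name; return (True, None) or (False, traceback text)."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # The crash in this report is an AttributeError raised at import
        # time, so catch more than just ImportError.
        return False, traceback.format_exc()
    return True, None


if __name__ == "__main__":
    ok, details = try_import("bs4")
    print("bs4 import OK" if ok else "bs4 import failed:\n" + details)
```

Run after 'pip install html5lib' on an affected system, this should print the same traceback as the session above; on a fixed system it should report a clean import.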

Changed in beautifulsoup4 (Ubuntu):
status: Confirmed → Fix Released
no longer affects: beautifulsoup4 (Ubuntu Xenial)
James Lu (tacocat) wrote :

Hi,

Is there any particular reason this was unmarked from Xenial? Although I don't believe mixing Python library versions is an optimal configuration, this bug truly does break setups.

Amit kumar (kumaramit228) wrote :

Hi

I am using Ubuntu 16.04 and I am facing the same issue with Python 2. I tried all the different versions of html5lib mentioned here, but nothing seems to help. Any help will be appreciated.
Thank you.
