Crash with latest html5lib

Bug #1603299 reported by gazpachoking
This bug affects 13 people
Affects                  Status        Importance  Assigned to  Milestone
Beautiful Soup           Fix Released  Undecided   Unassigned
beautifulsoup4 (Ubuntu)  Fix Released  High        Unassigned

Bug Description

html5lib's treebuilders._base module is now public and has been renamed to base in the latest versions. bs4 needs to be updated to work with these versions; it currently crashes with:

  File "c:\python27\lib\site-packages\bs4\builder\_html5lib.py", line 70, in <module>
    class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: 'module' object has no attribute '_base'
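
The rename described above can be tolerated by trying the new module name first and falling back to the old one. The following is only a minimal sketch of that idea, not the code of the actual Beautiful Soup fix; the helper name import_first_available is hypothetical, and the demonstration uses a standard-library package so that it runs without html5lib installed.

```python
import importlib


def import_first_available(package, candidates):
    """Import and return the first submodule of `package` that exists.

    `candidates` lists submodule names in order of preference.
    """
    for name in candidates:
        try:
            return importlib.import_module("%s.%s" % (package, name))
        except ImportError:
            continue  # this name is missing; try the next one
    raise ImportError(
        "none of %r could be imported from %r" % (tuple(candidates), package)
    )


# Demonstration with the standard library: the first name does not exist,
# so the helper falls back to the second one.
etree_module = import_first_available("xml.etree", ("NoSuchModule", "ElementTree"))
print(etree_module.__name__)

# With html5lib installed, the same helper could be used like this
# (new public name first, then the old private one):
#   treebuilder_base = import_first_available(
#       "html5lib.treebuilders", ("base", "_base"))
#   TreeBuilder = treebuilder_base.TreeBuilder
```

Importing by name rather than reading html5lib.treebuilders._base as an attribute means a missing name surfaces as an ImportError that can be caught, instead of the AttributeError at class-definition time shown in the traceback.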

Revision history for this message
gazpachoking (chase-sterling) wrote :

This crash occurs on 4.4.1 with html5lib>=0.99999999

Leonard Richardson (leonardr) wrote :

Revision 406 restores compatibility with html5lib.

Changed in beautifulsoup:
status: New → Fix Committed
Carlos Sanchez (papachoco) wrote :

Any ETA for the fix to be available?

Carlos

Changed in beautifulsoup:
status: Fix Committed → Fix Released
Brad Erickson (eosrei) wrote :

Could this binary package be copied to Xenial (16.04)? Manually downloading and installing the deb from https://launchpad.net/ubuntu/yakkety/amd64/python3-bs4/4.5.1-1 fixes the issue. Thanks!

James Lu (tacocat)
no longer affects: variety (Ubuntu)
James Lu (tacocat) wrote :

This issue is breaking variety in Xenial (https://bugs.launchpad.net/variety/+bug/1645572), so I'm seconding the request to fix it there as well.

Changed in beautifulsoup4 (Ubuntu):
status: New → Confirmed
Levente Torok (toroklev) wrote :

I ran into the same error while using Remarkable (a Markdown editor).
I suggest pinning html5lib to the following version:

sudo pip install html5lib==0.9999999

with seven 9s exactly.
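
Counting 9s by eye is error-prone, so a small check can help. The sketch below compares version strings as tuples of integers, taking 0.99999999 (eight 9s) as the first release with the rename, as reported in this thread. The function names are hypothetical, and the parser assumes purely numeric dotted versions; anything else (for example a beta suffix) raises ValueError.

```python
def parse_numeric_version(version):
    """Turn '0.9999999' into (0, 9999999).

    Assumes every dot-separated part is an integer; other input raises ValueError.
    """
    return tuple(int(part) for part in version.split("."))


# Eight 9s: the first html5lib version reported in this thread to trigger the crash.
FIRST_RENAMED = parse_numeric_version("0.99999999")


def has_renamed_base(version):
    """Return True if this html5lib version is at or after the rename."""
    return parse_numeric_version(version) >= FIRST_RENAMED


for candidate in ("0.9999999", "0.99999999", "0.999999999"):
    print(candidate, "nines:", candidate.count("9"),
          "renamed:", has_renamed_base(candidate))
```

With html5lib installed, html5lib.__version__ could be passed to has_renamed_base, provided it is purely numeric.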

Joshua Powers (powersj) wrote :

Hi Folks,

If we are going to SRU a fix back to an existing release, we will need a test case or steps to reproduce. Can someone give us a Python code snippet we could use to test this out? It would help get the ball rolling.

Thanks!

Changed in beautifulsoup4 (Ubuntu):
importance: Undecided → High
James Lu (tacocat) wrote :

I realize this bug doesn't impact any version of html5lib officially in xenial. However, if a user installs a newer version of html5lib for whatever reason (e.g. via pip), bs4 will break as a side effect and possibly render other programs unusable.

AFAIK, running 'pip install html5lib' (to get html5lib >= 0.99999999) and then importing bs4 by any means is enough to trigger the issue.

gl@nucleus:~$ pip install html5lib
Collecting html5lib
  Downloading html5lib-0.999999999-py2.py3-none-any.whl (112kB)
    100% |████████████████████████████████| 122kB 2.8MB/s
Collecting six (from html5lib)
  Downloading six-1.10.0-py2.py3-none-any.whl
Collecting setuptools>=18.5 (from html5lib)
  Downloading setuptools-35.0.2-py2.py3-none-any.whl (390kB)
    100% |████████████████████████████████| 399kB 2.0MB/s
Collecting webencodings (from html5lib)
  Downloading webencodings-0.5.1-py2.py3-none-any.whl
Collecting appdirs>=1.4.0 (from setuptools>=18.5->html5lib)
  Downloading appdirs-1.4.3-py2.py3-none-any.whl
Collecting packaging>=16.8 (from setuptools>=18.5->html5lib)
  Downloading packaging-16.8-py2.py3-none-any.whl
Collecting pyparsing (from packaging>=16.8->setuptools>=18.5->html5lib)
  Downloading pyparsing-2.2.0-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 4.8MB/s
Installing collected packages: six, appdirs, pyparsing, packaging, setuptools, webencodings, html5lib
Successfully installed appdirs html5lib-0.999 packaging pyparsing setuptools-20.7.0 six-1.10.0 webencodings
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
gl@nucleus:~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bs4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/bs4/__init__.py", line 30, in <module>
    from .builder import builder_registry, ParserRejectedMarkup
  File "/usr/lib/python2.7/dist-packages/bs4/builder/__init__.py", line 314, in <module>
    from . import _html5lib
  File "/usr/lib/python2.7/dist-packages/bs4/builder/_html5lib.py", line 70, in <module>
    class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: 'module' object has no attribute '_base'
>>>
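
The interactive session above can also be run as a script, which may be easier to use as a test case. This is a minimal sketch with a hypothetical helper name, try_import; it only reports whether an import succeeds and shows the traceback if it does not. If bs4 is not installed at all, the failure will be an ImportError rather than the AttributeError from this bug.

```python
import importlib
import traceback


def try_import(module_name):
    """Return (True, None) if the import works, else (False, formatted traceback)."""
    try:
        importlib.import_module(module_name)
    except Exception:
        return False, traceback.format_exc()
    return True, None


ok, details = try_import("bs4")
if ok:
    print("bs4 imported successfully")
else:
    print("bs4 import failed:")
    print(details)
```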

Changed in beautifulsoup4 (Ubuntu):
status: Confirmed → Fix Released
no longer affects: beautifulsoup4 (Ubuntu Xenial)
James Lu (tacocat) wrote :

Hi,

Is there any particular reason this was unmarked from Xenial? Although I don't believe mixing Python library versions is an optimal configuration, this bug truly does break setups.

Amit kumar (kumaramit228) wrote :

Hi

I am using Ubuntu 16.04 and I am facing the same issue with Python 2. I tried all the different versions of html5lib mentioned here, but nothing seems to help. Any help will be appreciated.
Thank you.
