Exception in "find" when using html5lib

Bug #1184417 reported by Richard Brooksby
This bug affects 1 person
Affects: Beautiful Soup
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Finding a tag with BeautifulSoup4 raises an exception when html5lib is installed. This does not occur with lxml. See the transcript below and the attachment (which also contains the transcript).

Also, try a simple soup.find('h1') on this document: the tag is not found with html5lib, but it is found with lxml or with neither installed.

  $ virtualenv bug
  $ bug/bin/pip install BeautifulSoup4
  Downloading beautifulsoup4-4.2.0.tar.gz (138kB): 138kB downloaded
  ...
  $ bug/bin/pip install html5lib
  Downloading html5lib-1.0b1.tar.gz (882kB): 882kB downloaded
  ...
  $ bug/bin/python
  >>> import bs4
  >>> bs4.BeautifulSoup(open('foo.html')).find('body').find('div').find('h1')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/Volumes/Silverbird-HD/Local/Users/rb/tmp/bs-bug/tool/lib/python2.7/site-packages/bs4/element.py", line 1146, in find
      l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
    File "/Volumes/Silverbird-HD/Local/Users/rb/tmp/bs-bug/tool/lib/python2.7/site-packages/bs4/element.py", line 1167, in find_all
      return self._find_all(name, attrs, text, limit, generator, **kwargs)
    File "/Volumes/Silverbird-HD/Local/Users/rb/tmp/bs-bug/tool/lib/python2.7/site-packages/bs4/element.py", line 495, in _find_all
      i = next(generator)
    File "/Volumes/Silverbird-HD/Local/Users/rb/tmp/bs-bug/tool/lib/python2.7/site-packages/bs4/element.py", line 1185, in descendants
      current = current.next_element
  AttributeError: 'NoneType' object has no attribute 'next_element'
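
The transcript above lets Beautiful Soup choose the parser itself, so the outcome depends on what happens to be installed. A minimal sketch for comparing parsers directly passes the parser name as the second argument to BeautifulSoup. The SAMPLE_HTML string and the probe helper are hypothetical names used only for illustration; SAMPLE_HTML is a stand-in and not the attached foo.html, so it may well not trigger the failure.

```python
# Hypothetical stand-in markup; the real foo.html attachment is not reproduced here.
SAMPLE_HTML = "<html><body><div><h1>Title</h1></div></body></html>"

# Parser names accepted by BeautifulSoup's second argument.
PARSERS = ("html.parser", "lxml", "html5lib")


def probe(markup, parsers=PARSERS):
    """Return {parser_name: outcome} for the chained find() from the report."""
    try:
        import bs4
    except ImportError:
        return {name: "bs4 not installed" for name in parsers}

    results = {}
    for name in parsers:
        try:
            soup = bs4.BeautifulSoup(markup, name)
            h1 = soup.find("body").find("div").find("h1")
            results[name] = "found" if h1 is not None else "not found"
        except bs4.FeatureNotFound:
            # Raised when the requested parser library is missing.
            results[name] = "parser not installed"
        except AttributeError as exc:
            # Covers the reported traceback; also raised if an
            # intermediate find() returned None.
            results[name] = "AttributeError: %s" % exc
    return results


if __name__ == "__main__":
    for name, outcome in sorted(probe(SAMPLE_HTML).items()):
        print("%-12s %s" % (name, outcome))
```

To test the actual document, replace SAMPLE_HTML with the contents of foo.html, for example probe(open('foo.html').read()).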
