Crash with html5lib 0.95 when creating BeautifulSoup object

Reported by armakuni on 2012-02-29
This bug affects 2 people
Affects: Beautiful Soup
Importance: Undecided
Assigned to: Unassigned

Bug Description

The following code causes a crash:

from bs4 import BeautifulSoup

html = """
<!DOCTYPE html>
<html>
    <head>
        <title> - </title>
    </head>
    <body>
        <em><em></em></em>
    </body>
</html>

"""

soup = BeautifulSoup(html)

I stripped the HTML down to the minimum that still triggers the crash. It should be valid HTML5. I am using Python 2.7.2 on Windows, Beautiful Soup 4.0.0b9, and html5lib 0.95. The same code works with html5lib 0.90.

The traceback is as follows:
  File "D:\bs\bs4\__init__.py", line 168, in __init__
    self._feed()
  File "D:\bs\bs4\__init__.py", line 181, in _feed
    self.builder.feed(self.markup)
  File "D:\bs\bs4\builder\_html5lib.py", line 37, in feed
    doc = parser.parse(markup, encoding=self.user_specified_encoding)
  File "D:\bs\html5lib\html5parser.py", line 247, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "D:\bs\html5lib\html5parser.py", line 115, in _parse
    self.mainLoop()
  File "D:\bs\html5lib\html5parser.py", line 209, in mainLoop
    new_token = phase.processStartTag(new_token)
  File "D:\bs\html5lib\html5parser.py", line 514, in processStartTag
    return self.startTagHandler[token["name"]](token)
  File "D:\bs\html5lib\html5parser.py", line 1151, in startTagFormatting
    self.addFormattingElement(token)
  File "D:\bs\html5lib\html5parser.py", line 1003, in addFormattingElement
    elif self.isMatchingFormattingElement(node, element):
  File "D:\bs\html5lib\html5parser.py", line 984, in isMatchingFormattingElement
    elif len(node1.attributes) != len(node2.attributes):
TypeError: object of type 'AttrList' has no len()
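
The last frame suggests that html5lib 0.95 calls len() on a node's attributes, and that the object it receives does not define __len__. The sketch below reproduces that kind of failure in isolation. The AttrList class here is a hypothetical stand-in written for illustration, not the actual Beautiful Soup class, and SizedAttrList and describe_len are made-up names. It only shows that a wrapper without __len__ raises this TypeError and that adding __len__ removes it. It is not meant to describe the fix that was committed.

```python
class AttrList(object):
    """Hypothetical attribute wrapper (not the bs4 class): iterable, but unsized."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)

    def __iter__(self):
        # Iterate over (name, value) pairs.
        return iter(list(self.attrs.items()))

    def __getitem__(self, name):
        return self.attrs[name]


class SizedAttrList(AttrList):
    """Same wrapper, plus __len__ so that len() works on it."""

    def __len__(self):
        return len(self.attrs)


def describe_len(obj):
    """Return len(obj), or the TypeError message if obj defines no __len__."""
    try:
        return len(obj)
    except TypeError as exc:
        return str(exc)


print(describe_len(AttrList({"class": "x"})))       # object of type 'AttrList' has no len()
print(describe_len(SizedAttrList({"class": "x"})))  # 1
```

A comparison such as len(node1.attributes) != len(node2.attributes) would fail the same way whenever either side is an unsized wrapper like the first class.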

Changed in beautifulsoup:
status: New → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released