Comment 2 for bug 336443

Stefano Rivera (stefanor) wrote : Re: [exchange] BeautifulSoup version error?

The blame for this one appears to lie squarely with BeautifulSoup >= 3.1, whose release notes say:

# Beautiful Soup is now based on HTMLParser rather than SGMLParser, which is gone in Python 3. There's some bad HTML that SGMLParser handled but HTMLParser doesn't, usually to do with attribute values that aren't closed or have brackets inside them:

  '<a href="foo</a>, </a><a href="bar">baz</a>'
  '<a b="<a>">', '<a b="<a>"></a><a>"></a>'

A later version of Beautiful Soup will allow you to plug in different parsers to make tradeoffs between speed and the ability to handle bad HTML.
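
For anyone wanting to check whether their install falls in the affected range, here is a rough sketch of a version test. The helper names `parse_version` and `is_affected` are made up for illustration, and it assumes the version string comes from something like the module's `__version__` attribute; it only encodes the ">= 3.1" cutoff mentioned above and makes no claim about which later releases behave differently.

```python
def parse_version(version):
    """Turn a dotted version string such as '3.1.0.1' into a tuple of ints.

    Trailing non-digit suffixes on a component (e.g. the 'a' in '3.0.7a')
    are dropped; parsing stops at the first component with no leading digits.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version):
    """Hypothetical check: True if the version is at or above the 3.1 cutoff."""
    return parse_version(version) >= (3, 1)


if __name__ == "__main__":
    for candidate in ("3.0.7a", "3.1.0.1"):
        print(candidate, is_affected(candidate))
```

A caller would pass in whatever version string the installed copy reports and could fall back to a different code path, or warn the user, when the check returns True.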