Beautiful Soup fails to sanitize unquoted style attributes

Bug #403640 reported by Kasuko on 2009-07-23
This bug affects 2 people
Affects: Beautiful Soup

Bug Description

This bug manifests itself in the program Sipie from

It attempts to parse the page here

The source of this page contains the tag <input type="password" name="password" style={height:21px;} value="" size="30" maxlength="20">, where the value of the style attribute is not quoted. Beautiful Soup does not handle this, resulting in the following:

Traceback (most recent call last):
  File "/usr/bin/gtkSipie", line 8, in <module>
    load_entry_point('Sipie==0.1196144357', 'gui_scripts', 'gtkSipie')()
  File "/usr/lib/python2.6/site-packages/Sipie/", line 88, in gtkPlayer
    for selectable in sipie.getStreams():
  File "/usr/lib/python2.6/site-packages/Sipie/", line 375, in getStreams
    streams = self.tryGetStreams()
  File "/usr/lib/python2.6/site-packages/Sipie/", line 299, in tryGetStreams
    soup = BeautifulSoup(data)
  File "/usr/lib/python2.6/site-packages/", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/", line 1230, in __init__
  File "/usr/lib/python2.6/site-packages/", line 1263, in _feed
  File "/usr/lib/python2.6/", line 108, in feed
  File "/usr/lib/python2.6/", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/", line 263, in parse_starttag
    % (rawdata[k:endpos][:20],))
  File "/usr/lib/python2.6/", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: junk characters in start tag: u'{height:21px;} value', at line 145, column 26

I am currently running Arch Linux with beautiful-soup version but there have been reports on the SourceForge page for Sipie that the problem occurs on other platforms as well. Apparently 3.0.7 was able to sanitize this.
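
For anyone stuck on an affected version, one possible workaround is to quote the offending value before handing the page to the parser. The following is only a minimal sketch: `quote_brace_style` is a hypothetical helper, not part of Beautiful Soup or Sipie, and it assumes the only problem in the markup is a `style={...}` value like the one in the tag above.

```python
import re

# Hypothetical helper, not part of Beautiful Soup: matches a style
# attribute whose value is an unquoted {...} block.
_UNQUOTED_STYLE = re.compile(r'(\bstyle\s*=\s*)(\{[^}"\'>]*\})')


def quote_brace_style(markup):
    """Wrap unquoted style={...} values in double quotes."""
    return _UNQUOTED_STYLE.sub(r'\1"\2"', markup)


# The tag quoted in this report.
tag = ('<input type="password" name="password" style={height:21px;} '
       'value="" size="30" maxlength="20">')
fixed = quote_brace_style(tag)
print(fixed)
```

The pattern only matches when the character after `style=` is `{`, so values that are already quoted should be left untouched. It could then be applied to the page data before the `BeautifulSoup(data)` call shown in the traceback.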

Any other info I can gather I would be glad to give, just ask.

Thank You

Kasuko (kasuko) on 2009-07-23
description: updated
Leonard Richardson (leonardr) wrote :

The parsers used by BS4 handle this markup correctly.
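
As a rough illustration, Python 3's standard-library `html.parser`, which is one of the parsers Beautiful Soup 4 can use, accepts the tag from this report without raising an error. This is a sketch under that assumption; `AttrCollector` is a made-up name for this example and the code does not show how Beautiful Soup itself is implemented.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Illustrative subclass that records each start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.seen.append((tag, dict(attrs)))


# The tag quoted in the bug description.
markup = ('<input type="password" name="password" style={height:21px;} '
          'value="" size="30" maxlength="20">')

collector = AttrCollector()
collector.feed(markup)
collector.close()

tag, attrs = collector.seen[0]
print(tag, attrs["style"])  # prints: input {height:21px;}
```

The unquoted value is read up to the next whitespace, so the remaining attributes (value, size, maxlength) are still parsed normally.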

Changed in beautifulsoup:
status: New → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released