Problem with chardet integration (need a bytearray)

Bug #571812 reported by Felipe Kellermann on 2010-04-29
This bug affects 1 person
Affects: Beautiful Soup
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have a fix for this bug. Registering it anyway so I can link this to a fix branch.

Here is the issue: when BeautifulSoup calls chardet, it passes a str where a byte array is required, which raises the exception below. My patch adds code that builds a byte array from the markup before handing it to chardet.

{{{
  File "Fetcher.py", line 262, in parse
    self._soup = BeautifulSoup(self._raw, convertEntities=BeautifulStoneSoup.HTML_ENTITIES)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/BeautifulSoup.py", line 1517, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/BeautifulSoup.py", line 1142, in __init__
    self._feed(isHTML=isHTML)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/BeautifulSoup.py", line 1166, in _feed
    smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/BeautifulSoup.py", line 1787, in __init__
    u = self._convertFrom(chardet.detect(self.markup)['encoding'])
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/__init__.py", line 24, in detect
    u.feed(aBuf)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/universaldetector.py", line 116, in feed
    if prober.feed(aBuf) == constants.eFoundIt:
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/charsetgroupprober.py", line 60, in feed
    st = prober.feed(aBuf)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/utf8prober.py", line 53, in feed
    codingState = self._mCodingSM.next_state(c)
  File "/home/felipek/Projects/trunk/code/contrib/WebCat/chardet/codingstatemachine.py", line 44, in next_state
    byteCls = self._mModel['classTable'][c]
}}}
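
The last frame of the traceback indexes a table with each element of the buffer, which is why the element type matters. Below is a minimal sketch of the str versus byte array difference, assuming Python 3. `CLASS_TABLE` and `classify` are toy stand-ins invented for illustration; they are not chardet's actual API, and this is not the patch from the fix branch.

```python
# Toy stand-in for a chardet-style class table: one entry per byte value.
# (Hypothetical; the real table lives inside chardet's state machine models.)
CLASS_TABLE = tuple(range(256))


def classify(buf):
    """Look up each element of buf in CLASS_TABLE, as a byte-indexed prober would."""
    return [CLASS_TABLE[c] for c in buf]


markup = "<p>caf\xe9</p>"  # a str: iterating it yields 1-character strings

try:
    classify(markup)
except TypeError as exc:
    # A tuple cannot be indexed with a string, so the lookup fails.
    print("str input fails:", exc)

# Iterating a bytearray yields integers, so the same lookup works.
as_bytes = bytearray(markup.encode("latin-1"))
print(classify(as_bytes)[:3])  # [60, 112, 62]
```

The encoding passed to `encode` is an assumption made for the example; in practice the markup would ideally be kept as raw bytes from the start, since guessing the encoding is the job being delegated to chardet.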

Changed in beautifulsoup:
assignee: nobody → Felipe Kellermann (felipekellermann)
status: New → Fix Committed

Leonard Richardson (leonardr) wrote :

Are you using a custom/bleeding-edge version of chardet? That might explain why I don't see this error.

Aaron DeVore (aaron-devore) wrote :

Your problem is in this line in chardet:

byteCls = self._mModel['classTable'][c]

It should instead be:

byteCls = self._mModel['classTable'][ord(c)]

Try switching to the very latest release version of chardet, version 2.0.1.
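
To illustrate the difference between the two lines above: indexing with `c` only works when `c` is already an integer, while `ord(c)` handles a one-character string. Here is a small sketch, assuming Python 3, with a hypothetical `byte_class` helper and a toy table (not chardet's real code) that accepts either form:

```python
# Toy table with one entry per byte value (hypothetical, for illustration only).
CLASS_TABLE = tuple(range(256))


def byte_class(c):
    """Return the table entry for c, given an int or a 1-character str/bytes."""
    # ord() converts a 1-character string (or length-1 bytes) to its integer
    # code; integers are used as the index directly.
    index = c if isinstance(c, int) else ord(c)
    return CLASS_TABLE[index]


print(byte_class("A"))  # 65
print(byte_class(65))   # 65
```

Which form is right depends on what the caller iterates over, so the chardet release in use has to match the type of buffer it is given.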

Changed in beautifulsoup:
status: Fix Committed → Fix Released