GBK encoding problems

Bug #1263000 reported by scj
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
Beautiful Soup   Invalid   Undecided    Unassigned

Bug Description

When I use Beautiful Soup to process an HTML file, I get an error: "UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position 161: illegal multibyte sequence". Even after changing the encoding to gb18030, it doesn't work. Can you help me solve it? I use Python 3.3.
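The reported traceback is a UnicodeEncodeError, which Python raises when a str is converted to bytes, for example when print() writes to a console whose encoding is GBK or when text is written to a file opened with encoding="gbk". The character '\xa0' is U+00A0 NO-BREAK SPACE. In an HTML file it most likely comes from an &nbsp; entity, which Beautiful Soup converts to U+00A0, and GBK has no mapping for it. The minimal sketch below uses only the standard library and a made-up sample string. Because GB18030 can encode U+00A0, a failure that persists after switching to gb18030 suggests (the original code was not posted, so this is an inference) that the output side was still using GBK.

```python
# Hypothetical sample text: "\xa0" is U+00A0 NO-BREAK SPACE,
# the character Beautiful Soup produces for an &nbsp; entity.
text = "price:\xa0100"

# Encoding to GBK fails: GBK has no mapping for U+00A0.
try:
    text.encode("gbk")
except UnicodeEncodeError as exc:
    print(exc)  # 'gbk' codec can't encode character '\xa0' in position 6: illegal multibyte sequence

# GB18030 covers all of Unicode, so this succeeds and round-trips.
gb18030_bytes = text.encode("gb18030")
assert gb18030_bytes.decode("gb18030") == text

# Workaround 1: swap the no-break space for an ordinary space before encoding.
gbk_bytes = text.replace("\xa0", " ").encode("gbk")

# Workaround 2: let the codec write "?" for any character GBK cannot represent.
lossy_bytes = text.encode("gbk", errors="replace")
print(gbk_bytes, lossy_bytes)  # b'price: 100' b'price:?100'
```

If the error comes from print(), the PYTHONIOENCODING environment variable (for example set to gb18030) changes the encoding Python uses for standard output; whether the console then displays the text correctly depends on its code page.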

Leonard Richardson (leonardr) wrote : Re: [Bug 1263000] [NEW] GBK encoding problems

What is the file you are using, and what code are you using to process it?

I can make a guess at the answer: "\xa0" is a Latin-1 byte sequence. If a
gb18030 document contains "\xa0", then it is not really a gb18030
document--it has no encoding at all. You will not be able to convert it to
Unicode without removing \xa0 and similar characters, or replacing them
with their gb18030 equivalents.

The detwingle() method will fix the problem of Latin-1 byte sequences
embedded in UTF-8, but I don't think it will work for gb18030.

Leonard

On Fri, Dec 20, 2013 at 2:29 AM, scj <email address hidden> wrote:

> Public bug reported:
>
> When I use Beautiful Soup to process an HTML file, I get an
> error: "UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in
> position 161: illegal multibyte sequence". Even after changing the encoding
> to gb18030, it doesn't work. Can you help me solve it? I use Python 3.3.
>
> ** Affects: beautifulsoup
> Importance: Undecided
> Status: New
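Leonard's reply concerns the decoding side: a byte 0xA0 left over from Latin-1 (where it is the no-break space) inside otherwise GB18030 data is not a valid GB18030 sequence when it is followed by, say, an ASCII byte, so strict decoding fails. One possible workaround, sketched here with the standard codecs module and a hypothetical error-handler name "latin1_nbsp", is to register a decode error handler that maps an undecodable lone 0xA0 byte to U+00A0 and replaces anything else with U+FFFD. A caveat that supports Leonard's point: 0xA0 is also a legal GB18030 lead and trail byte, so a stray 0xA0 immediately followed by a byte that completes a valid two-byte pair is silently decoded as some other character, and the handler never sees it.

```python
import codecs


def latin1_nbsp_handler(exc):
    """Hypothetical decode-error handler for GB18030 data with stray Latin-1 0xA0 bytes."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    if bad == b"\xa0":
        # A lone 0xA0 that did not form a valid GB18030 sequence:
        # treat it as the Latin-1 no-break space U+00A0.
        return "\xa0", exc.end
    # Any other invalid bytes: fall back to the standard replacement character.
    return "\ufffd", exc.end


codecs.register_error("latin1_nbsp", latin1_nbsp_handler)

# Hypothetical mixed input: ASCII, a stray Latin-1 no-break space, then GB18030 text.
raw = b"abc\xa0 " + "中文".encode("gb18030")

try:
    raw.decode("gb18030")
except UnicodeDecodeError as exc:
    print("strict decoding fails:", exc.reason)

text = raw.decode("gb18030", errors="latin1_nbsp")
print(repr(text))
```

The handler only patches over the symptom; cleaning the source so it uses a single encoding remains the more reliable fix.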

Changed in beautifulsoup:
status: New → Invalid