IPython - Enhanced Interactive Python

IPython doesn't handle utf8 strings correctly

Reported by Stephan Peijnik on 2008-10-29
This bug report is a duplicate of: Bug #339642: unicode bug - encoding input.
This bug affects 4 people
Affects            Status      Importance   Assigned to   Milestone
IPython            Confirmed   Undecided    Unassigned
ipython (Debian)   Confirmed   Unknown

Bug Description

The original bug came in via bugs.debian.org (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=495439). Could you please comment on this problem and take care of it if needed?

A quick check suggests that IPython 0.9.1 is also affected by this bug. I have included sample output from IPython 0.9.1 below.
The test in [2] works as expected, but [3] and [4] fail, which they should not.

Regards,

Stephan

--snip--
In [1]: import sys

In [2]: 'ä'.decode(sys.stdin.encoding)
Out[2]: u'\xe4'

In [3]: u'ä'
Out[3]: u'\xc3\xa4'

In [4]: 'ä'
Out[4]: '\xc3\xa4'
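
For readers who want to see where the value in Out[3] comes from, here is a minimal sketch. It is written for Python 3 (the session above is Python 2) and assumes the terminal sends UTF-8. It only shows that decoding the UTF-8 bytes of 'ä' as Latin-1 yields the same two code points that IPython printed; it does not claim to be what IPython does internally.

```python
# Python 3 sketch; assumes a UTF-8 terminal, as in the report.
typed = "ä"                        # the character the user typed (U+00E4)
raw = typed.encode("utf-8")        # the bytes a UTF-8 terminal delivers

expected = raw.decode("utf-8")     # decoding with the right encoding
mangled = raw.decode("latin-1")    # decoding each byte as its own character

print(ascii(raw))       # b'\xc3\xa4'
print(ascii(expected))  # '\xe4'      (matches Out[2])
print(ascii(mangled))   # '\xc3\xa4'  (matches Out[3])
```

The mangled string has two characters instead of one, which is the symptom reported in [3].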

Marcelo Fernandez (fernandezm) wrote :

Commenting out line 2018 in /usr/share/python-support/ipython/IPython/iplib.py (IPython 0.8.4, the version shipped with Ubuntu 8.10) seems to fix this issue. I don't know whether this breaks anything else, but it works fine in my informal tests.

1985     def runsource(self, source, filename='<input>', symbol='single'):
[...]
2014         # if the source code has leading blanks, add 'if 1:\n' to it
2015         # this allows execution of indented pasted code. It is tempting
2016         # to add '\n' at the end of source to run commands like ' a=1'
2017         # directly, but this fails for more complicated scenarios
2018         #source=source.encode(self.stdin_encoding)
2019         if source[:1] in [' ', '\t']:
2020             source = 'if 1:\n%s' % source
2021
2022         try:
2023             code = self.compile(source,filename,symbol)
2024         except (OverflowError, SyntaxError, ValueError, TypeError):
2025             # Case 1
2026             self.showsyntaxerror(filename)
2027             return None

I'll take a look at the trunk version. Could you confirm that this fix doesn't break anything in the rest of the program?

Thanks
Marcelo
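
A rough illustration of why handing encoded bytes to the compiler can matter, written for Python 3. It is an analogy, not the IPython code path: the `run` helper and the Latin-1 coding cookie are assumptions added here to force the compiler to read UTF-8 bytes with the wrong encoding, which is what the Python 2 compiler appears to do with undeclared byte-string source.

```python
# Python 3 sketch. `run` and the coding cookie are illustrative assumptions,
# not part of IPython.
def run(source):
    """Compile and execute `source`, then return the variable `value`."""
    namespace = {}
    exec(compile(source, "<input>", "exec"), namespace)
    return namespace["value"]

text_source = "value = 'ä'\n"

# Passing text: the compiler already knows every character.
from_text = run(text_source)

# Passing UTF-8 bytes that are declared as Latin-1: each byte of 'ä'
# becomes a separate character.
byte_source = ("# -*- coding: latin-1 -*-\n" + text_source).encode("utf-8")
from_bytes = run(byte_source)

print(ascii(from_text))   # '\xe4'
print(ascii(from_bytes))  # '\xc3\xa4'
```

If the analogy holds, leaving the source as text (which is the effect of commenting out line 2018) lets the compiler see the intended character.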

Changed in ipython:
status: Unknown → Fix Released
Marcelo Fernandez (fernandezm) wrote :

Why was this bug closed? What was the error that got patched?

I browsed the ipython repository, but the "source = source.encode(self.stdin_encoding)" line is still there.

Thanks.
Marcelo

Stephan Peijnik (speijnik) wrote :

Sorry for the delay.

I haven't heard from the submitter of the (Debian) bug for a while and so decided to close the bug report for now. Feel free to reopen it.

Regards,

Stephan

Stephan Peijnik (speijnik) wrote :

OK, something went horribly wrong there. The bug that was linked here was actually unrelated (basically the wrong bug number). I fixed that now. The bug is still open in Debian.

Regards,

Stephan

Marcelo Fernandez (fernandezm) wrote :

OK, I changed the IPython bug status to "Confirmed", but I can't do the same for the Debian package (it says "Fix Released", which is wrong based on your comment). Only the project maintainer or the bug supervisor can change it.

Regards,
Marcelo

Changed in ipython:
status: New → Confirmed
Stephan Peijnik (speijnik) wrote :

Well, shouldn't that be updated by the bug watch automatically? IIRC this should happen the next time the bug report gets synched.

Regards,
Stephan

Marcelo Fernandez (fernandezm) wrote :

Oh, sorry, I don't know Launchpad very well (yet) :-)

Regards,
Marcelo

jjunho (jjunho) wrote :

I'm just sending my outputs to be checked... My system is Ubuntu Hardy and I mainly use IPython in Portuguese and Korean, so I have my machine set to use UTF-8 and never had any problem till now...

Below I send two examples with Portuguese accented letters and some Korean characters.

=================
Python 2.5.2 (r252:60911, Oct 5 2008, 19:24:49)
Type "copyright", "credits" or "license" for more information.

IPython 0.8.4 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.

In [1]: u'áéíóúàâêôãõç'
Out[1]: u'\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba\xc3\xa0\xc3\xa2\xc3\xaa\xc3\xb4\xc3\xa3\xc3\xb5\xc3\xa7'

In [2]: u'가나다라마바사'
Out[2]: u'\xea\xb0\x80\xeb\x82\x98\xeb\x8b\xa4\xeb\x9d\xbc\xeb\xa7\x88\xeb\xb0\x94\xec\x82\xac'

########

Python 2.5.2 (r252:60911, Oct 5 2008, 19:24:49)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u'áéíóúàâêôãõç'
u'\xe1\xe9\xed\xf3\xfa\xe0\xe2\xea\xf4\xe3\xf5\xe7'
>>> u'가나다라마바사'
u'\uac00\ub098\ub2e4\ub77c\ub9c8\ubc14\uc0ac'
========================

Hope you guys can fix this soon! :) Thanks and keep up the good work!!!
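
The two sessions above can be compared mechanically. The following Python 3 sketch checks that each IPython output is the plain-Python output with its UTF-8 bytes misread as Latin-1. The helper name `undo_latin1_misdecoding` is made up for this illustration; the string literals are copied from the outputs above.

```python
# Python 3 sketch; the helper name is hypothetical.
def undo_latin1_misdecoding(text):
    """Reverse the damage done by decoding UTF-8 bytes as Latin-1."""
    return text.encode("latin-1").decode("utf-8")

# Outputs copied from the IPython 0.8.4 session.
ipython_pt = ("\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba\xc3\xa0"
              "\xc3\xa2\xc3\xaa\xc3\xb4\xc3\xa3\xc3\xb5\xc3\xa7")
ipython_ko = ("\xea\xb0\x80\xeb\x82\x98\xeb\x8b\xa4\xeb\x9d\xbc"
              "\xeb\xa7\x88\xeb\xb0\x94\xec\x82\xac")

# Outputs copied from the plain Python 2.5.2 session.
python_pt = "\xe1\xe9\xed\xf3\xfa\xe0\xe2\xea\xf4\xe3\xf5\xe7"
python_ko = "\uac00\ub098\ub2e4\ub77c\ub9c8\ubc14\uc0ac"

print(undo_latin1_misdecoding(ipython_pt) == python_pt)  # True
print(undo_latin1_misdecoding(ipython_ko) == python_ko)  # True
```

Both comparisons succeed, so the IPython outputs differ from the plain interpreter's only by this one misdecoding step.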

Stephan Peijnik (speijnik) wrote :

I can confirm that the fix proposed by Marcelo Fernandez seems to fix the issue and causes IPython to behave the same way the Python interpreter does.

-- Stephan

Changed in ipython (Debian):
status: Fix Released → Confirmed