Ubuntu
command-not-found package

[11.10 beta1] UnicodeDecodeError crash on localized input in multiple encodings/languages

Bug #839609 reported by Dennis Chua on 2011-09-02

This bug affects 16 people

Affects		Status	Importance	Assigned to	Milestone
	command-not-found	Fix Released	Critical	Zygmunt Krynicki
	command-not-found (Ubuntu)	Fix Released	Undecided	Unassigned

Bug Description

The command-not-found package crashes when a Simplified Chinese character is entered as a bogus command. The problem was found in 11.10 beta1, on both x86/i386 and amd64 systems. Debugging the Python script /usr/lib/command-not-found shows that a UnicodeDecodeError is raised. The crash_guard() callback framework catches it and reports the error.

Here are further observations.

(1) With the same Simplified Chinese input, 11.04 handles the test case gracefully, returning a message explaining that the command is not found.

(2) The Python version changed between these two releases: 11.04 ships Python 2.7.1+, while 11.10 beta1 ships Python 2.7.2+.

To elaborate on this problem, the following files have been included:

(1) Screenshots showing, step by step, how to reproduce the bug. Because switching to Simplified Chinese is difficult to explain in words, a video was recorded to show this process.

(2) A screenshot showing the /usr/lib/command-not-found script traced with the Python pdb module. It shows the zh_CN.UTF-8 byte stream input and the point where the UnicodeDecodeError is raised.

This issue was investigated on an 11.10 beta1 host running in VirtualBox.

===

Taken from the To_Reproduce_Bug.txt attachment:

01_After_ISO_Installation.png    The VirtualBox VM with the default English locale.

02_Open_Lanugage_Support.png    Prepare to switch to the Simplified Chinese locale. See the accompanying video for this process.

03_Enable_IBUS_Pinyin.png    After switching to Simplified Chinese. Note the locale environment variables. Click the IBus keyboard icon and select Pinyin input.

04_Pinyin_Enabled.png    Ready for Pinyin input. Note the blue IBus icon.

05_Type_Phonetic_Pinyin.png    Type two letters: 'w' followed by 'o'. Phonetically, these correspond to the Chinese character for 'I' or 'myself'. IBus displays candidates; choose the first one by pressing the space bar.

06_Chinese_Input_Complete.png    The Chinese character 'wo', encoded in zh_CN.UTF-8, is ready to be passed to Bash. Press Return to do so.

07_Crash_command_not_found.png    Bash calls command-not-found, which cannot handle the input.

08_Disable_Pinyin_Input.png    Tell IBus to disable Simplified Chinese input.

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: command-not-found 0.2.43ubuntu1 [modified: usr/lib/command-not-found]
ProcVersionSignature: Ubuntu 3.0.0-9.15-generic 3.0.3
Uname: Linux 3.0.0-9-generic x86_64
Architecture: amd64
Date: Fri Sep 2 10:23:08 2011
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Beta amd64 (20110901)
PackageArchitecture: all
SourcePackage: command-not-found
UpgradeStatus: No upgrade log present (probably fresh install)

Tags:

Related branches

lp:~zyga/command-not-found/fix-839609

Merged into lp:~command-not-found-developers/command-not-found/trunk at revision 141

Zygmunt Krynicki: Approve on 2011-09-21

lp:~command-not-found-developers/command-not-found/trunk

lp:ubuntu/oneiric/command-not-found

Dennis Chua (dmcvocation) wrote on 2011-09-02:

Dependencies.txt (955 bytes, text/plain; charset="utf-8")
ProcEnviron.txt (124 bytes, text/plain; charset="utf-8")

Dennis Chua (dmcvocation) wrote on 2011-09-02:

To see the image and video attachments, go to https://chinstrap.canonical.com/~dchua/bug_839609/

Launchpad Janitor (janitor) on 2011-09-04

Changed in command-not-found (Ubuntu):
status:	New → Confirmed

Dennis Chua (dmcvocation) wrote on 2011-09-07:

Further investigation turned up a likely workaround (i.e. a 'hack'). First of all, the Python exception was:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

This occurred with the bogus Simplified Chinese command 我. I expected command-not-found to handle it the way it handles other unknown commands, for example:

root@u-VirtualBox:~# mgcc
未找到 'mgcc' 命令，您要输入的是否是：
命令 'mlcc' 来自于包 'mlterm-tools' (universe)
命令 'cgcc' 来自于包 'sparse' (multiverse)
命令 'gcc' 来自于包 'gcc' (main)
命令 'gcc' 来自于包 'pentium-builder' (universe)
mgcc：找不到命令

The hack involves editing two Python files in the package:

(1) /usr/lib/command-not-found (line 24) :
cnf.install(unicode=True) ==> cnf.install(unicode=False)

(2) /usr/share/pyshared/CommandNotFound/util.py (line 9):
_ = gettext.translation("command-not-found", fallback=True).ugettext ==>
_ = gettext.translation("command-not-found", fallback=True).lgettext

With these edits in place, command-not-found can now handle the test case:

root@u-VirtualBox:~# 我
我：找不到命令
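The workaround succeeds because everything stays a byte string: in Python 2, `lgettext` returns the translation already encoded in the locale's preferred encoding, so splicing the raw UTF-8 `sys.argv` bytes into it never triggers an implicit decode. A minimal Python 3 sketch of the same idea, assuming a UTF-8 locale; the helper `format_not_found_bytes` and the hard-coded template (standing in for a translated catalog entry) are hypothetical, not code from command-not-found:

```python
# Sketch of the "keep everything as bytes" workaround, expressed in Python 3.
# Assumptions: the locale encoding is UTF-8, and the template string stands in
# for a translated message; neither is read from command-not-found itself.


def format_not_found_bytes(template: str, arg: bytes, encoding: str = "utf-8") -> bytes:
    """Encode the translated template first, then splice in the raw argv bytes.

    Both operands of % are bytes, so `arg` is never decoded. This mirrors what
    the lgettext()-based workaround achieved in Python 2.
    """
    return template.encode(encoding) % arg


# The raw bytes Bash hands over for the command 我 on a zh_CN.UTF-8 system.
raw_arg = "我".encode("utf-8")  # b'\xe6\x88\x91'
message = format_not_found_bytes("%s：找不到命令", raw_arg)
```

The result is a byte string that can be written straight to a UTF-8 terminal. The weakness of this approach is that it only works while the template's encoding and the argument's encoding happen to agree.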

Zygmunt Krynicki (zyga) wrote on 2011-09-07:

Thanks for the analysis; this seems solid. I need to check whether the fix also works for non-CJK encodings/languages.

Out of curiosity, what is your output of `locale`?

Changed in command-not-found:
importance:	Undecided → Critical
summary:
- [11.10 beta1] UnicodeDecodeError crash on simplified chinese input of fake command
+ [11.10 beta1] UnicodeDecodeError crash on localized input in multiple encodings/languages

Dennis Chua (dmcvocation) wrote on 2011-09-08:

You're welcome. The command-not-found package is very useful, and I'm happy to have helped; it was interesting to dive into Python's facilities for multi-byte I/O and i18n/L10n/gettext. I hope this solves the problem comprehensively.

Here is my locale:

u@u-VirtualBox:~$ locale
LANG=zh_CN.UTF-8
LANGUAGE=zh_CN:en_US:en
LC_CTYPE=zh_CN.UTF-8
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE=zh_CN.UTF-8
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES=zh_CN.UTF-8
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=
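In this listing `LC_ALL` is empty, so character handling is governed by `LC_CTYPE` (and, failing that, `LANG`), both of which name the UTF-8 codeset. Under POSIX, a non-empty `LC_ALL` overrides everything, then `LC_CTYPE`, then `LANG`. A small sketch of that precedence; the helper `ctype_codeset` is hypothetical and only parses the `language_TERRITORY.codeset@modifier` naming form:

```python
from typing import Mapping, Optional


def ctype_codeset(env: Mapping[str, str]) -> Optional[str]:
    """Return the codeset named by the effective LC_CTYPE setting, if any.

    Precedence follows POSIX: a non-empty LC_ALL wins, then LC_CTYPE, then LANG.
    Empty values count as unset. Returns None when the winning value has no
    explicit '.codeset' part (e.g. 'C' or 'zh_CN'), because the charset is then
    implied by the C library's locale definition rather than by the name.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        # Values copied from `locale` output may be wrapped in double quotes.
        value = env.get(name, "").strip('"')
        if value:
            break
    else:
        return None
    # Drop any '@modifier' suffix, then take what follows the first '.'.
    base = value.split("@", 1)[0]
    if "." not in base:
        return None
    return base.split(".", 1)[1]


# Values taken from the `locale` listing above.
codeset = ctype_codeset({"LANG": "zh_CN.UTF-8", "LC_CTYPE": "zh_CN.UTF-8", "LC_ALL": ""})
```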

Zygmunt Krynicki (zyga) on 2011-09-14

Changed in command-not-found:
status:	New → In Progress
assignee:	nobody → Zygmunt Krynicki (zkrynicki)

Zygmunt Krynicki (zyga) wrote on 2011-09-14:

This bug is actually caused by incorrect handling of input (sys.argv), not output. When a byte string (UTF-8 encoded) is combined with unicode strings (such as the translated system messages), a UnicodeDecodeError is raised because, by default, Python 2 coerces the byte string to unicode using the ASCII codec.
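Python 3 no longer mixes byte and text strings implicitly, so the sketch below performs by hand the ASCII decode that Python 2 did behind the scenes when a UTF-8 `sys.argv` entry was interpolated into a unicode (translated) message. The helper name `python2_style_coerce` is hypothetical; it reproduces the exception text quoted earlier in this report:

```python
def python2_style_coerce(raw: bytes) -> str:
    """Mimic Python 2's implicit str -> unicode coercion.

    Python 2 used its default encoding, 'ascii', for this step, so any byte
    >= 0x80 (every byte of a multi-byte UTF-8 character) makes it fail.
    """
    return raw.decode("ascii")


raw_arg = "我".encode("utf-8")  # b'\xe6\x88\x91', what Bash hands to the script

try:
    message = "%s: command not found" % python2_style_coerce(raw_arg)
except UnicodeDecodeError as exc:
    # prints: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
    print(exc)
```

Pure-ASCII commands such as `mgcc` pass through this coercion untouched, which is why the crash only shows up with localized input.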

A possible fix is to decode the sys.argv arguments properly. I've tried this by hard-coding UTF-8 as the input encoding, but it would be nice to fix this in general too.
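Decoding the arguments once, before they meet any translated text, avoids the implicit coercion entirely. A minimal sketch, assuming the raw arguments are available as bytes; the helper `decode_argv` and the choice of `errors="replace"` are illustrative assumptions, not the code merged in the fix-839609 branch. When no encoding is passed, it uses the locale's preferred encoding instead of hard-coding UTF-8:

```python
import locale
from typing import List, Optional, Sequence


def decode_argv(raw_args: Sequence[bytes], encoding: Optional[str] = None) -> List[str]:
    """Decode raw command-line arguments into text exactly once, at the boundary.

    With no explicit encoding, fall back to the locale's preferred encoding
    rather than assuming UTF-8. errors="replace" keeps malformed input from
    crashing the handler: undecodable bytes become U+FFFD.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return [arg.decode(encoding, errors="replace") for arg in raw_args]


args = decode_argv(["我".encode("utf-8")], encoding="utf-8")
message = "%s: command not found" % args[0]  # text + text: no implicit coercion
```

In Python 3 this step is built in: `sys.argv` arrives already decoded with the filesystem encoding and the `surrogateescape` error handler, and `os.fsencode()` recovers the original bytes if needed.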

Changed in command-not-found:
status:	In Progress → Triaged

Zygmunt Krynicki (zyga) wrote on 2011-09-14:

Fix proposed for merging. Anyone interested is free to review the branch and check that it actually fixes the problem on their system.

Zygmunt Krynicki (zyga) on 2011-09-21

Changed in command-not-found:
status:	Triaged → Fix Released

Launchpad Janitor (janitor) wrote on 2011-09-22:

This bug was fixed in the package command-not-found - 0.2.44ubuntu1

---------------
command-not-found (0.2.44ubuntu1) oneiric; urgency=low

  * merged lp:~zkrynicki/command-not-found/fix-839609
    LP: #839609
  * scan.data:
    - updated to current oneiric
-- Michael Vogt <email address hidden> Tue, 20 Sep 2011 15:48:12 +0200

Changed in command-not-found (Ubuntu):
status:	Confirmed → Fix Released

Dennis Chua (dmcvocation) wrote on 2011-09-22:

screenshots.tar (2.7 MiB, application/x-tar)

The screenshots of the bug reproduced using Simplified Chinese are attached.

Zygmunt Krynicki (zyga) wrote on 2011-09-23:

#10

Dennis Chua: could you please upgrade command-not-found and confirm that the bug no longer occurs?

Dennis Chua (dmcvocation) wrote on 2011-09-23:

#11

I reviewed this issue on Oneiric Beta2 after updating command-not-found to 0.2.44ubuntu1. With the bogus Simplified Chinese test, command-not-found no longer throws an exception.

However, the output no longer matches the language of the shell environment: it is printed in English rather than Simplified Chinese. Compare Natty with Oneiric Beta2:

=== Natty. command-not-found 0.2.41ubuntu2 ===

u@u-VirtualBox:~$ 我
我：找不到命令

u@u-VirtualBox:~$ mgcc
未找到 'mgcc' 命令，您要输入的是否是：
命令 'mlcc' 来自于包 'mlterm-tools' (universe)
命令 'cgcc' 来自于包 'sparse' (multiverse)
命令 'gcc' 来自于包 'gcc' (main)
命令 'gcc' 来自于包 'pentium-builder' (universe)
mgcc：找不到命令
u@u-VirtualBox:~$

u@u-VirtualBox:~$ locale
LANG=zh_CN.UTF-8
LANGUAGE=zh_CN:en_US:en
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=

=== Oneiric Beta2. command-not-found 0.2.44ubuntu1 ===

u@u-VirtualBox:~$ 我
我: command not found

u@u-VirtualBox:~$ mgcc
No command 'mgcc' found, did you mean:
Command 'mlcc' from package 'mlterm-tools' (universe)
Command 'cgcc' from package 'sparse' (multiverse)
Command 'gcc' from package 'gcc' (main)
Command 'gcc' from package 'pentium-builder' (universe)
mgcc: command not found
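The report does not diagnose why the Oneiric output is in English. One general `gettext` behaviour worth knowing when investigating it (a suggestion about where to look, not a finding from this bug): with `fallback=True`, as in the `util.py` line quoted earlier, a missing catalog does not raise. Instead `gettext.translation` returns a `NullTranslations` object that echoes every message unchanged. The domain below is the real one, but the locale directory is a deliberately nonexistent path and the message is just an example string:

```python
import gettext

# Ask for a Simplified Chinese catalog in a directory that does not exist.
translation = gettext.translation(
    "command-not-found",
    localedir="/nonexistent/locale",
    languages=["zh_CN"],
    fallback=True,
)

# No exception is raised: the fallback object simply returns the English msgid.
print(type(translation).__name__)  # NullTranslations
print(translation.gettext("No command '%s' found, did you mean:"))
```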

Dennis Chua (dmcvocation) wrote on 2011-09-23:

#12

The changes clearly fixed the Unicode decoding exception. Can we close this issue and open a separate one for the output language mismatch?

Reinis Zumbergs (reinis-zumbergs) wrote on 2011-09-23:

#13

This fix solves the problem I reported with Latvian special characters.
