Unicode characters are handled incorrectly

Bug #305554 reported by ZelinskiyIS
This bug affects 4 people
Affects: bash (Ubuntu)
Status: Incomplete
Importance: Low
Assigned to: Unassigned
Milestone: none listed

Bug Description

Binary package hint: gnome-terminal

Steps to reproduce:

1. Start gnome-terminal.
2. cd /tmp
3. mkdir "some Unicode directory name of length N" (e.g. mkdir "αβγ", N=3)
4. cd into that directory.
5. Type a command that does not fit on one terminal line but fits on two, e.g.
echo "abcd"|bzip2|bunzip2|bzip2|bunzip2|bzip2|bunzip2|bzip2|bunzip2
and execute it.
6. Press the up arrow key to bring the command back.
7. Press the left arrow key until the cursor reaches the middle of the first line.
8. Now either:
Press HOME: the cursor lands N positions past the beginning of the line, where N is the number of Greek Unicode letters in the directory name.
Or:
Try to move the cursor to the end of the first line with the right arrow key: you will not be able to reach the last N characters.
If you now try to edit the command, the line becomes garbled.
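
For step 5, the command only needs to be long enough to wrap once. A minimal sketch for building such a command for a given terminal width follows; the function name wrapping_command and the prompt_width default are assumptions made for illustration, not part of the original report.

```python
def wrapping_command(columns=80, prompt_width=20):
    """Build an echo pipeline that occupies exactly two terminal lines.

    columns      -- terminal width in character cells
    prompt_width -- assumed visible width of the shell prompt (hypothetical value)
    """
    command = 'echo "abcd"'
    stage = "|bzip2|bunzip2"  # compress then decompress: output is unchanged
    # Keep appending stages until prompt + command no longer fits on one line.
    while prompt_width + len(command) <= columns:
        command += stage
    if prompt_width + len(command) > 2 * columns:
        raise ValueError("terminal too narrow for a two-line command")
    return command


print(wrapping_command(80, 20))
```

With an 80-column terminal and a 20-cell prompt this yields the same four bzip2/bunzip2 pairs used in the example command of step 5.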

The same has been tested with Cyrillic Unicode characters.
The effect with more exotic characters is even more pronounced: try "ॣय़" as the directory name.

It seems that gnome-terminal uses some Unicode-unaware function to calculate string lengths.
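
To illustrate what a Unicode-unaware length calculation would look like, here is a small sketch comparing the number of code points in a directory name with the number of bytes in its UTF-8 encoding. The helper name length_report is an assumption for illustration. This only illustrates the hypothesis; it does not show which component actually miscounts.

```python
def length_report(name):
    """Return (code points, UTF-8 bytes) for a directory name."""
    return len(name), len(name.encode("utf-8"))


for name in ["abc", "αβγ", "Рабочий стол"]:
    chars, nbytes = length_report(name)
    # The surplus is what a byte-counting routine would overshoot by.
    print(f"{name!r}: {chars} code points, {nbytes} bytes, surplus {nbytes - chars}")
```

For "αβγ" the surplus is 3, which equals N in the steps above. Note that code points are still not the same as on-screen cells: combining characters occupy no cell of their own, which may be why the more exotic names misbehave differently.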

The issue is rather unpleasant, because localized Ubuntu versions use many directories with Unicode letters in their names. For example, Desktop is "Рабочий стол" in the Russian localization.
**********************************
Description: Ubuntu 8.10
Release: 8.10
x86_64 distribution

Gnome terminal version from "apt-cache policy gnome-terminal"
2.24.1.1-0ubuntu1

ProblemType: Bug
Architecture: amd64
DistroRelease: Ubuntu 8.10
ExecutablePath: /usr/bin/gnome-terminal
NonfreeKernelModules: ath_hal fglrx
Package: gnome-terminal 2.24.1.1-0ubuntu1
ProcEnviron:
 PATH=/home/username/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
 LANG=ru_RU.UTF-8
 SHELL=/bin/bash
SourcePackage: gnome-terminal
Uname: Linux 2.6.27-9-generic x86_64

Tags: apport-bug
Revision history for this message
ZelinskiyIS (ivze) wrote :
Pedro Villavicencio (pedro) wrote :

Do you get the same with terminator (terminal emulator)? This is probably a vte issue rather than a gnome-terminal one.

Changed in gnome-terminal:
assignee: nobody → desktop-bugs
importance: Undecided → Low
status: New → Incomplete
ZelinskiyIS (ivze) wrote :

Yes, this is confirmed with terminator.
And with Konsole from KDE.

ZelinskiyIS (ivze) wrote :

So this seems to be a bug in some underlying system. Shall I post a bug report somewhere else?
Or has that already been done?

tshirtman (gabriel-pettier) wrote :

I'm affected by this too, but not in a tty (although the characters are not displayed correctly in that case).

I was thinking of a readline bug but I'm not sure.

Changed in cl-readline (Ubuntu):
assignee: Ubuntu Desktop Bugs (desktop-bugs) → nobody
ZelinskiyIS (ivze) wrote :

I have done some research into the unwanted editing behaviour and found that the bug is located in the bash interpreter (GNU bash, version 3.2.48(1)-release).

I found that in bash this behaviour is triggered by multibyte UTF-8 characters in a particular part of the PS1 variable. The default Ubuntu .bashrc script (tested on 9.04) works so that when bash is launched from gnome-terminal, xterm or another software terminal (bash looks at the TERM variable), PS1 is set to \[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\u@\h:\w\$. This long construction includes the sequence \e]0;some text\a, which the software terminal interprets as a command to change the window title to "some text". When the current working directory has multibyte characters in its path, those characters end up between \e]0; and \a. I ran some tests with manually set PS1 values and found that the messy command-line editing is triggered by any multibyte character between \e]0; and \a. Characters outside that sequence do not trigger it.
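
The structure of that prompt can be sketched as follows. This is an illustrative model only, not readline's code: it assumes debian_chroot is unset, uses placeholder user and host names, and relies on the fact that bash turns \[ and \] into the bytes \001 and \002 that readline treats as "ignore" markers. The helper names build_prompt, split_prompt and byte_surplus are hypothetical.

```python
import re

# Readline brackets non-printing prompt text with these bytes;
# bash produces them from \[ and \] in PS1.
START_IGNORE = "\x01"
END_IGNORE = "\x02"
IGNORED = re.compile(START_IGNORE + "(.*?)" + END_IGNORE, re.S)


def build_prompt(user, host, cwd):
    """Approximate the expanded PS1 from the report (debian_chroot assumed unset)."""
    title = f"\x1b]0;{user}@{host}: {cwd}\x07"  # \e]0;...\a sets the window title
    return f"{START_IGNORE}{title}{END_IGNORE}{user}@{host}:{cwd}$"


def split_prompt(prompt):
    """Return (visible text, text inside the ignored regions)."""
    invisible = "".join(IGNORED.findall(prompt))
    visible = IGNORED.sub("", prompt)
    return visible, invisible


def byte_surplus(text):
    """How many more UTF-8 bytes than code points the text has."""
    return len(text.encode("utf-8")) - len(text)


visible, invisible = split_prompt(build_prompt("user", "host", "/tmp/αβγ"))
print(repr(visible), byte_surplus(invisible))
```

For the directory /tmp/αβγ the ignored region holds three more bytes than characters. A prompt-width calculation that mixed bytes and characters for that region would be off by exactly that amount, which is consistent with, but does not prove, the behaviour described in this report.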

I have attached data captured with the "script" utility that shows the bug. The replay reproduces what appeared on my terminal, with timing preserved. To play it, use a terminal that supports title changing (gnome-terminal and xterm should work), unpack the archive, make sure the terminal size is 80x24 and run "scriptreplay times typescript 2". In particular, you will see that three two-byte Unicode characters between \e]0; and \a cause a three-position miss when editing a long line with backspace. The recording shows only part of the messy editing behaviour; some of the tests described earlier in this bug report were not performed.

I suppose that this bug is in the readline library. However, because readline is statically compiled into bash, I think it should be treated as a bash bug.

ZelinskiyIS (ivze) wrote :

The readline library is built into bash, so this is a bash bug.

affects: cl-readline (Ubuntu) → bash (Ubuntu)