Comment 5 for bug 367369

ZelinskiyIS (ivze) wrote : Re: bash utf-8 characters issue

Greetings, everyone!

I am experiencing the same problem when using Russian letters (such as Абвгд... :)) in bash.
I also filed a separate, rather old bug report about the same problem. I have linked it to this one by marking my bug as a duplicate of this one.

I have done some investigation into the problem. Here it is, copied from bug #305554:
#Copypaste#############################################################
I have done some research into the unwanted editing behaviour and found that the bug is localised in the "bash" interpreter (GNU bash, version 3.2.48(1)-release).

I found that in bash this behaviour is triggered by multibyte UTF-8 characters appearing in a particular part of the PS1 variable. The default Ubuntu .bashrc script (tested on 9.04) works so that when bash is launched from within gnome-terminal, xterm or some other terminal emulators (the script checks the TERM variable), PS1 is set to \[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\u@\h:\w\$. This long construction includes the sequence \e]0;some text\a, which the terminal emulator interprets as a command to change the window title to "some text". When the current working directory has multibyte characters in its path, those characters end up between \e]0; and \a. I ran some tests by setting PS1 manually and found that the messy command-line editing is triggered by any multibyte characters between \e]0; and \a. Characters outside that sequence do not trigger the messy behaviour.

I have attached data captured with the "script" utility that shows the bug. It reproduces the output that appeared on my terminal, with timing preserved. To play it back, use a terminal that supports title changing (gnome-terminal and xterm should work), unpack the archive, make sure the terminal size is 80x24 and run "scriptreplay times typescript 2". In particular, you will see that having three two-byte Unicode characters between \e]0; and \a causes a three-position miss when editing a long line with backspace. The test shows only part of the messy editing behaviour. Some of the tests described above in this bug report have not been performed.

I suppose that this bug is in the "readline" library. However, because readline is statically compiled into bash, I think it should be treated as a bash bug.
#End of copypaste#############################################################
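
For anyone who wants to repeat the manual PS1 test described in the copied text, here is a minimal sketch. It assumes a UTF-8 terminal and a source file saved as UTF-8. The helper names make_prompt and title_bytes and the sample titles "abc" and "Абв" are made up for illustration and are not from the original report. The script builds two prompts that differ only in the text between \e]0; and \a, and prints how many bytes each title occupies (3 for "abc", 6 for "Абв").

```shell
#!/usr/bin/env bash
# Sketch only: make_prompt, title_bytes and the sample titles are illustrative
# names and values, not taken from the original report.

# Build a prompt whose title sequence (\e]0; ... \a) carries the given text.
make_prompt() {
    printf '%s' "\\[\\e]0;$1\\a\\]\\u@\\h:\\w\\\$ "
}

# Count the bytes in a string, independent of the current locale.
title_bytes() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

PS1_ASCII=$(make_prompt 'abc')      # three one-byte characters in the title
PS1_MULTIBYTE=$(make_prompt 'Абв')  # three two-byte UTF-8 characters in the title

echo "ascii title bytes:     $(title_bytes 'abc')"
echo "multibyte title bytes: $(title_bytes 'Абв')"

# To try it in an interactive bash inside xterm or gnome-terminal:
#   PS1=$PS1_MULTIBYTE
# then type a line long enough to wrap and edit it with backspace.
```

The multibyte title has three more bytes than characters, which happens to match the three-position miss reported above. Whether that byte/character difference is the actual cause inside readline is only a guess here, not something this sketch proves.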