gnome-terminal doesn't recognise C1 controls

Bug #1297051 reported by boon
This bug affects 1 person
Affects: vte3 (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

gnome-terminal seems not to recognise the C1 control characters.

The particular character that is a problem for me is CSI. However there may be a generic issue with non-support of this whole range of characters.

This range of characters should only be recognised when the encoding is a character set that is defined to include the C1 control characters but, at a quick look, that is all of the ISO-8859-x character sets and Unicode. (C1 control characters require encoding as a 2 byte sequence when the encoding is UTF-8. As unlikely as this may be to occur in practice, UTF-8 is not inconsistent with C1 control characters.)
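To make the encoding point concrete, here is a minimal sketch (not taken from any terminal's source) showing that the C1 control CSI, code point U+009B, is a single byte in ISO-8859-1 but a two-byte sequence in UTF-8. The helper name `encode_c1` is made up for this illustration.

```python
# CSI as a C1 control character is code point U+009B.
CSI = "\u009b"


def encode_c1(ch: str, encoding: str) -> bytes:
    """Return the bytes that a single C1 control occupies in `encoding`."""
    return ch.encode(encoding)


if __name__ == "__main__":
    # One byte in ISO-8859-1, two bytes in UTF-8.
    print(encode_c1(CSI, "iso-8859-1").hex())
    print(encode_c1(CSI, "utf-8").hex())
```

The same holds for every code point in the C1 range U+0080..U+009F: each one needs two bytes in UTF-8.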

Part of the motivation for raising this bug report is that PuTTY seems to have declined in reliability recently and so I looked at why I am using PuTTY as opposed to gnome-terminal. Correct support of C1 control characters is one reason. This works in PuTTY. It does not appear to work in gnome-terminal. Perhaps resources would be better spent making gnome-terminal work as well as PuTTY does, rather than attempting to get PuTTY fixed.

ProblemType: Bug
DistroRelease: Ubuntu 13.10
Package: gnome-terminal 3.6.1-0ubuntu6
ProcVersionSignature: Ubuntu 3.11.0-18.32-generic 3.11.10.4
Uname: Linux 3.11.0-18-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.12.5-0ubuntu2.2
Architecture: amd64
Date: Tue Mar 25 11:08:00 2014
InstallationDate: Installed on 2011-10-25 (881 days ago)
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
MarkForUpload: True
SourcePackage: gnome-terminal
UpgradeStatus: Upgraded to saucy on 2013-11-08 (136 days ago)

Revision history for this message
boon (boon-9ft1s) wrote :
Egmont Koblinger (egmont-gmail) wrote :

Could you provide concrete escape sequences (like an echo command, or a short text file to cat)?

I can't figure out how to test this. CSI is traditionally ESC + [. This is used e.g. to change the foreground color:
echo -e '\x1B[31mred\x1B[0m'

The CSI you're referring to seems to be an alternate encoding of the same functionality, starting with 0x9B (whose UTF-8 encoding is 0xC2 0x9B) instead of ESC [. That is:
echo -e '\xC2\x9B31mred\xC2\x9B0m'

but this doesn't work for me, not even in Putty or xterm. What am I doing wrong?

Note: gnome-terminal tries to emulate xterm. If you're asking for something that is supported by xterm, you have reasonable chances. If the feature is specific to putty, it's unlikely that your request will get implemented.

affects: gnome-terminal (Ubuntu) → vte3 (Ubuntu)
Egmont Koblinger (egmont-gmail) wrote :

I figured out it works in xterm and putty with ISO-8859-x charsets, just not with UTF-8.

Reported the request upstream: https://bugzilla.gnome.org/show_bug.cgi?id=730154

Egmont Koblinger (egmont-gmail) wrote :

I've added a patch to the upstream bugreport.

boon, could you please test that?

Could you please also let us know what application(s) produce these kinds of escape sequences?

Egmont Koblinger (egmont-gmail) wrote :

> Please excuse my ignorance but I don't know how to do that. Can you tell
> me what commands to type?
> Is it in a repository somewhere or do I have to build from source?

You need to build from source, which goes something like this:

wget ftp://ftp.gnome.org/pub/GNOME/sources/vte/0.37/vte-0.37.2.tar.xz
tar xf vte-0.37.2.tar.xz
cd vte-0.37.2
patch -p1 < [the patch filename that fixes this bug]
./configure
[if there are any errors, install the missing packages and re-run]
make
./src/vte-2.91 --encoding=latin1 # or whichever other encoding you wish to use
[and then try your application in this new window]

> Pretty much any sensible application should do that (regarding CSI at
> least) because ... why transmit two characters (ESC [) instead of one
> character (0x9B)?

Pretty much all sensible applications have been using UTF-8 for almost a decade now, and in this encoding both of these escapes take up 2 bytes. And these days when you watch videos of cats online, who cares about 1 more byte? :)

So far you're the only person who filed this bug against VTE, and it doesn't even work in xterm with UTF-8, which implies that the usage of C1 is extremely uncommon.

> In my case the remote system is a mix of built-in programs (for example,
> the editor) and custom-written programs, most of which assume that a
> one-character CSI (0x9B) works.

Really out of curiosity, could you please name a few of these "built-in programs (for example, the editor)" along with the system they're running on? (I don't care about the custom-written programs that much, but I do care about those that have a potential that other people also use them.)

That being said, if the patch works for you, I'd be happy to apply it.

Egmont Koblinger (egmont-gmail) wrote :

Could you please try the 2nd patch? It should fix RI, OSC and friends.

What's the terminator character used by VMS when emitting an OSC sequence? The terminator can be either a BEL ('\a', ASCII 7) or an ST, where ST has two versions: the 7-bit-clean ESC \ and its C1 counterpart 0x9C.

With my current patch vte accepts any of these:
ESC ] ..... BEL
ESC ] ..... ESC \
0x9D ..... BEL
0x9D ..... 0x9C

but doesn't recognize mixed C0-C1 usage:
ESC ] ..... 0x9C
0x9D ..... ESC \

I hope it's good enough and no one would be stupid enough to use the two mixed with each other.
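The accept/reject rules listed above can be sketched as follows. This is only an illustration of the framing rules, not VTE's actual parser; it assumes the whole sequence arrives in one buffer as raw 8-bit (e.g. ISO-8859-1) bytes, and the function name `parse_osc` is hypothetical.

```python
ESC, BEL = b"\x1b", b"\x07"
OSC_7BIT, ST_7BIT = ESC + b"]", ESC + b"\\"   # ESC ] and ESC \
OSC_8BIT, ST_8BIT = b"\x9d", b"\x9c"          # C1 forms of OSC and ST


def parse_osc(data: bytes):
    """Return the OSC payload, or None if the framing is not accepted.

    BEL terminates either form; ST must match the introducer
    (ESC \\ after ESC ], 0x9C after 0x9D). Mixed forms are rejected.
    """
    if data.startswith(OSC_7BIT):
        body, terminators = data[len(OSC_7BIT):], (BEL, ST_7BIT)
    elif data.startswith(OSC_8BIT):
        body, terminators = data[len(OSC_8BIT):], (BEL, ST_8BIT)
    else:
        return None
    for term in terminators:
        if body.endswith(term):
            return body[:-len(term)]
    return None
```

With this sketch the four unmixed forms yield the payload, while `ESC ] ..... 0x9C` and `0x9D ..... ESC \` return None.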

Egmont Koblinger (egmont-gmail) wrote :

> in the program that I built from source as per your instructions, this feature seems to have disappeared.

This program is a test application for testing the actual terminal emulation only. If you wish to see gnome-terminal getting this feature (beware, it's a bit hairy, don't break your system, make a backup, blahblahblah, standard disclaimer, and if anything goes wrong and gnome-terminal doesn't start up, just open an xterm to fix things)...

You can either:

- Upgrade to Trusty (with Saucy the story is more complicated: there's another patch you'd have to apply), download vte-0.34.9 (the exact vte version that's shipped by Ubuntu Trusty), manually figure out how to apply the patch (it doesn't apply automatically, requires C coding knowledge), run ./configure --prefix=/usr --libexecdir=/usr/lib/libvte-2.90-9; make; sudo make install; and then quit all instances of gnome-terminal and restart it

Or (this is the one I'd recommend, it'll give you a much newer gnome-terminal):

- Take vte-0.37.2, run configure with the same flags as above, make, sudo make install (note: your old vte and vte-0.37 will co-exist on your system and your gnome-terminal will still use the old version), and then download gnome-terminal-3.13.2, ./configure --enable-distro-packaging --prefix=/usr --libexecdir=/usr/lib/gnome-terminal; make; sudo make install; quit all gnome-terminal instances. In case anything goes wrong, re-install the gnome-terminal and gnome-terminal-data Ubuntu packages.

Or (the lazy approach):

- Wait for Ubuntu V.V. (15.04) that will hopefully ship this feature.

boon (boon-9ft1s) wrote :

I've tested the second patch.

RI now works.

I tried OSC ... ST in both the unmixed C1 form and the unmixed ESC form. They both work.

Hunting around the internet, the unmixed ESC form seems more common. I am not worried that they wouldn't work in a mixed form. That would be fairly perverse. I am no expert on the spec but if the C1 forms are intended to be "equivalent" to the corresponding ESC forms then the mixed forms _ought_ to work but I am not hassling anyone to support that.

The C1 form of CSI still works i.e. no regression.

Thanks for your work !

I will wait until it appears in a Ubuntu release.

*****

I noticed a couple of other issues.

1. Resizing of the terminal by the remote host does not work by default. I was only doing basic resizing (such as would work on a real terminal).

ESC[?3l should get width 80 (that sequence ends with a lower case letter L), and
ESC[?3h should get width 132

(or the equivalent CSI forms).

Looking at the source, it is necessary to send ESC[?40h before either of the above resize sequences will work. Not sure whether that's correct behaviour but I am happy to do that so am adding this comment only in case it helps someone else.
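As a small sketch of the order described above, a host-side helper could prefix each width change with the enabling sequence. The sequences themselves are the ones quoted in this comment; the helper name `width_change` is made up for illustration.

```python
ESC = "\x1b"
ALLOW_80_132 = ESC + "[?40h"  # must be sent first, per the observation above
SET_132_COLS = ESC + "[?3h"   # switch to 132 columns
SET_80_COLS = ESC + "[?3l"    # switch to 80 columns (ends with lower-case L)


def width_change(columns: int) -> str:
    """Return the escape sequences to send for an 80 or 132 column width."""
    if columns not in (80, 132):
        raise ValueError("only 80 and 132 columns are covered by these sequences")
    return ALLOW_80_132 + (SET_132_COLS if columns == 132 else SET_80_COLS)
```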

2. Answerback does not work

The way answerback is supposed to work is that the remote host sends an ENQ character (CTRL/E i.e. character with value 0x05) and the terminal is supposed to send whatever string it has been configured to send, where that configuration should allow the user to configure to send control characters so that, at least, the user can configure to send a trailing carriage return.

(For security reasons, the remote host should not be allowed to set the answerback message or at least not by default i.e. not without the user configuring to allow that. An escape sequence exists for the purpose of setting the answerback message from the remote host but I think it is not supported by this emulator anyway.)

Looking at the source, it seems as if it recognises the ENQ but deliberately sends no response.
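For illustration only, the answerback behaviour described above amounts to something like the following on the terminal side. This is a hypothetical sketch, not VTE code (VTE sends nothing), and the answerback string used in the usage note is just an example value.

```python
ENQ = b"\x05"  # CTRL/E


def answerback_reply(received: bytes, answerback: bytes) -> bytes:
    """Return what the terminal would send back to the host.

    One copy of the user-configured answerback string is sent for
    every ENQ found in the received data.
    """
    return answerback * received.count(ENQ)
```

For example, `answerback_reply(b"login:\x05", b"VT220\r")` gives `b"VT220\r"`, where the trailing carriage return is part of the user's configuration.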

Egmont Koblinger (egmont-gmail) wrote :

> Looking at the source, it is necessary to send ESC[?40h

This is the intended behavior, matching xterm and http://invisible-island.net/xterm/ctlseqs/ctlseqs.html

> Answerback does not work

Yup. What would be a practical use for this feature? Note that VTE is not developed with the aim of 100% coverage of VT100/102/220/whatever... features are rather added when they turn out to be widely used and required, trying to keep things simple in the meantime and getting rid of legacy, rarely used, hardly useful features. I bet you could find quite a few similar ones that we don't support.

boon (boon-9ft1s) wrote :

We use answerback extensively in order to identify the make and model of terminal i.e. every terminal that we have, physical or emulated, is configured to respond to ENQ with its make and model (followed by a carriage return).

In an ideal world we would not have to do this because every terminal emulator would emulate perfectly the make and model of real terminal that it claims to emulate.

Knowing the actual make and model allows us to account for quirks and limitations of individual emulations e.g. knowing that I have to send ESC[?40h for a width change.

Considering that the code bothers to recognise the ENQ - but deliberately sends no response - even if it just returned the value of an environment variable that I can set, that would be workable, although a command line argument or something out of the profile would be better.

Bear in mind that the starting point of this was ... why am I using PuTTY and what would have to change in gnome-terminal for it to replace PuTTY? Answerback works in PuTTY.

Egmont Koblinger (egmont-gmail) wrote :

> In an ideal world we would not have to do this because...

I disagree. Such an approach would prevent innovation, at least, there wouldn't be a way to communicate new features towards applications. In an ideal world, you could dynamically query the terminal for features and it would respond in a well-defined (standard but extendable) format. Or at least they'd identify themselves with a hardcoded name and version number (again, using a well-defined format), similarly to browsers' user-agent.

> quirks and limitations of individual emulations e.g. knowing that I have to send ESC[?40h for a width change.

This doesn't sound like a quirk to me. The fact that xterm does so makes me believe that whichever physical terminal implemented this feature probably implemented it this way and hence this is the standard way to do it. If you omitted \e[?40h it's a bug in your app that you should fix and emit this escape unconditionally for all terminals. I might be wrong though.

Not having a standard response for ENQ, not even a container syntax (e.g. a fixed leading escape sequence and trailing character) makes it pretty braindamaged straight away. It only works if you make up your own arbitrary in-house rules (e.g. terminate by newline), configure all your terminals and change all your apps, something that probably nobody in the world is willing to do except for you. There's no way for an app to know if the response was indeed sent as a response, or (maybe just some of its characters) typed in by the user. Having to configure the terminals is already a wrong approach anyways, it's a thing that should work out of the box without configuring. Having to change all the applications running inside terminals to behave accordingly, maintaining them consistently (and duplicating relevant code in all of them - or are you maintaining your own screen drawing library?) sounds like a nightmare.

There are similar already existing methods for getting the actual terminal version and capabilities - note that all of them suck, but at least they are used widely. There's the TERM variable and the corresponding underlying termcap/terminfo database and common screen libraries (ncurses, slang) using these; there's the \e[c and \e[>c escapes that are recognized more commonly and have a well-defined response syntax+semantics, there's $VTE_VERSION (well, until you log in to a remote host). Plus, you can always just safely use the common subset of escape characters that are understood by all terminals.

Many modern terminal emulators (e.g. konsole, terminology, st) don't support setting an ENQ response either. If you rely on this feature, it sounds to me that you're using a really odd nonstandard way to solve a problem.

boon (boon-9ft1s) wrote :

http://www.vt100.net/ is a good source of information because it uses the original manuals for the real terminals.

http://www.vt100.net/docs/vt510-rm/DA1 defines how \e[c works but it isn't adequate to express the capability of a terminal. In particular it doesn't cater for emulations that are incomplete. There are nowhere near enough attribute values defined to specify what might be missing from an emulation. The specification is unclear as to whether claiming a basic conformance level means that all mandatory features at that level are present. The specification doesn't say what those mandatory features might be. The idea is right.

It's true that we could have defined the answerback response to have a syntax that basically matches the response to \e[c but I think we would need to define the semantics ourselves. That's academic though as gnome-terminal doesn't support an answerback.

I could find no trace on the above-mentioned web site, or any other, of \e[?40h being a valid command in a real terminal. I think it might be an xterm invention.

The TERM variable is problematic with real terminals.

Egmont Koblinger (egmont-gmail) wrote :

> It's true that we could have defined the answerback response to have a syntax that basically matches the response to \e[c ...

That's a crucial issue here. If all terminals responded in a well-defined syntax (i.e. <some_unique_prefix>terminalname<terminator>) then I'd happily move ahead and hardcode "VTE" or even "VTE <versionnumber>". But that's not the case, even putty defaults to emitting "PUTTY" which you apparently had to change to "PUTTY^M" and it still has the problem that you can't distinguish this from a string typed by the user, or let's say if the user quickly pressed the letter 'x' you might misbelieve that the terminal type is "xPUTTY" and so ugly heuristics begin... In other words, in order to make the answerback useful, the answerback *has* to be configurable because of its broken design, and sysadmins need to do a lot of configuration within a local system to get something useful out of this.

In gnome-terminal (and generally in Gnome) the approach seems to be just the bare minimum of absolutely necessary config options, and preferably no hidden settings. Adding support for the answerback would require an API between VTE and Gnome-terminal, and a preference setting that UI folks probably wouldn't approve. We've already removed more important and more popular options. This is why I don't think this feature will ever be implemented.

Could you go with ^[[>c please, and treat version number of 4 digits around 3600 or so as VTE? Or create a trivial one-line patch for your VTE?
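A host-side check along those lines might look like the sketch below. It assumes the usual xterm-style reply layout `ESC [ > Pp ; Pv ; Pc c`; the four-digit test is only the rough heuristic suggested above, and the function names are made up for illustration.

```python
import re

# Reply to ESC [ > c, assumed to have the form ESC [ > Pp ; Pv ; Pc c
DA2_RESPONSE = re.compile(r"\x1b\[>(\d+);(\d+);(\d+)c")


def parse_da2(response: str):
    """Return (Pp, Pv, Pc) as integers, or None if the reply does not match."""
    match = DA2_RESPONSE.fullmatch(response)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def looks_like_vte(response: str) -> bool:
    """Heuristic from the comment above: a four-digit version suggests VTE."""
    fields = parse_da2(response)
    return fields is not None and 1000 <= fields[1] <= 9999
```

A reply such as `ESC[>1;3600;0c` (a made-up sample) would be classified as VTE, while one carrying a three-digit version number would not.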

> I could find no trace on the above-mentioned web site, or any other, of \e[?40h being a valid command in a real terminal. I think it might be an xterm invention.

VTE treats xterm as the primary reference. If you want something to be changed, you'd have to prove explicitly that xterm is doing it wrong.

boon (boon-9ft1s) wrote :

Coming first to \e[?40h ...

a) I grepped the VT2xx manual.

b) I browsed the VT5xx manual.

c) I tested it on a real VT5xx terminal.

d) I happened to find it documented in "man 5 dtterm" on a particular flavour of Unix. (dtterm was/is an X-windows terminal emulator.)

I am highly confident that the above escape sequence is an invention of emulators and never existed with the function ascribed to it in a real terminal. (That doesn't preclude the possibility that the escape sequence does have some other, undocumented, function in a real terminal, which function was not evident to me. Hence it is not ideal to send the escape sequence blindly to all terminals.)

I am not after a change to its behaviour. In fact the only thing wrong with it is that it defaults the wrong way, causing the emulated terminal to fail to be compatible with the real thing. However I am not even after fixing the default. It is too late. Changing the default now would be broken just as having the default wrong in the first place was broken.

boon (boon-9ft1s) wrote :

Here's an alternative suggestion for identifying the terminal implementation: Implement the DA3 sequence and implement the extension proposed below.

DA3 is supposed to elicit a globally unique, persistent terminal identifier (its serial number if you like), as documented here:

http://www.vt100.net/docs/vt510-rm/DA3

However that makes little sense in a world where most(?) terminals are emulated and does not apparently even work on a real VT5xx (just returns 00000000 as the id). So the following extension is proposed.

Respond to DA3 with DCS!|00000000;swidST where swid identifies the software product name and version. The syntax of swid could be borrowed from RFC 2616 (HTTP 1.1) Section 3.8. I would prefer that product names are explicitly qualified by (preceded by) the Java-style label-reversed domain name. Hence a company called PCsoft that registers pcsoft.com and sells a product called SuperTerm might send a product name and version of com.pcsoft.SuperTerm/1.0

(DA3 is seemingly not implemented at all in xterm so there is minimal scope for breakage in that regard.)

Product names should be compared without regard to case.
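A minimal sketch of the proposed response format, assuming the 7-bit forms of DCS (ESC P) and ST (ESC \). Nothing here is implemented by any terminal; the function names are hypothetical and the product name is the example from the proposal above.

```python
DCS = "\x1bP"   # 7-bit form of DCS
ST = "\x1b\\"   # 7-bit form of ST


def build_da3_response(swid: str, unit_id: str = "00000000") -> str:
    """Build the proposed reply: DCS ! | unit_id ; swid ST."""
    return f"{DCS}!|{unit_id};{swid}{ST}"


def parse_da3_response(response: str):
    """Return (unit_id, swid); swid is None when the extra parameter is absent."""
    prefix = DCS + "!|"
    if not (response.startswith(prefix) and response.endswith(ST)):
        return None
    body = response[len(prefix):-len(ST)]
    unit_id, _, swid = body.partition(";")
    return unit_id, swid or None


def same_product(swid_a: str, swid_b: str) -> bool:
    """Compare product names (the part before '/') without regard to case."""
    name_a = swid_a.split("/", 1)[0]
    name_b = swid_b.split("/", 1)[0]
    return name_a.casefold() == name_b.casefold()
```

A receiver that only wants the serial number can take the first element and ignore the second, which is the backward-compatibility property the proposal relies on.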

Egmont Koblinger (egmont-gmail) wrote :

> Coming first to \e[?40h ...

I'm really not an expert on the terminal emulation topic (especially in these rarely used areas that you're interested in), and I don't feel comfortable changing anything. (In my personal opinion, no matter how physical terminals worked a couple of decades ago, on a modern windowing environment the only way a terminal's size could be changed should be the user's direct resizing request just as you'd resize your web browser or whatever other graphical app. Applications running inside the terminal should not be able to resize/move/iconize/raise the window. Again, this is my private personal opinion.)

Vte developers seem to take xterm as the primary reference and emulate its most common features, and diverge only if there's a good reason. Could you please ask xterm's author to change the behavior (or maybe he has a good insight on why that ?40h is required)? If xterm fixes this, I'm happy to adjust gnome-terminal.

> DA3 is supposed to elicit a globally unique

I don't understand you coming up with the reversed domain name idea. The standard clearly says it's 4 bytes encoded in hex (8 hex digits): 1 byte identifying the terminal "manufacturing site" (that is, the spec hardcodes that there'll never be more than 256 of them, hardware manufacturers and software emulators altogether; actually if the same company has more firms then they all should have separate IDs according to the spec, how is it any of its business? - moreover, not a single word about how these numbers should be allocated), and the other three the unique serial number (hardcoding that no manufacturer will ever create more than 16M terminals - not an unreasonable limit for hardware units, but unclear whether the serial number should be unique for all instances in case of a software emulator, and then the limit is way too low). In my personal opinion, this is another ancient and crappy standard.

boon (boon-9ft1s) wrote :

>"Applications running inside the terminal should not be able to resize/move/iconize/raise the window."

That is a completely valid choice. However \e[?40h fails to implement that. All it does is force the host to send one extra escape sequence (if it knows that it needs to send it). As such I don't really see the point of \e[?40h. However it would have been harmless if it had been "high" by default i.e. defaulted to allowing resize. As previously stated, no one should change this now. A change now would just cause even more compatibility problems.

The logical and simple way of implementing that choice is as a configuration choice within the terminal emulator. PuTTY does that. Under Terminal/Features there is an option "disable remote-controlled terminal resizing". If the *user* sets that option then it behaves as you want. The host really can't resize the terminal no matter what escape sequences it sends.

However in our environment we would never want that option. Most screens get by with a 24x80 window but some screens *need* 132 columns and some screens *need* more than 24 rows, so when the user goes into one of those screens the host will resize the window to the required size and when the user exits such a screen, the host will resize back. This is somewhat visually distracting but the alternatives - within the limitations of a character-cell terminal interface and of our application - are worse still or not practical for us.

I have no problem if someone wants to add the above configuration option, as long as it is off by default. However I have no use for such a configuration option. So I won't be requesting it.

In fact, if anything, we have the opposite problem i.e. fat-fingered users who accidentally resize their terminal window and then raise support calls because the application doesn't work properly. :-)

Egmont Koblinger (egmont-gmail) wrote :

I was actually wondering about the same... App-controlled resizing kinda only makes sense with another setting that would inhibit a window manager initiated resize.

Gnome-terminal is actually vte (the real terminal emulation) + gnome-terminal (only the UI menus, tabs and such). If you don't care about the menu system that much, you can easily write a terminal emulator in c/python/vala/whatever around vte that doesn't allow WM resizes.

Btw I don't think changing the default of ?40 or removing that feature would break too many things. I don't know.

boon (boon-9ft1s) wrote :

Regarding DA3, as you point out, the spec is woefully inadequate for today's world. It also doesn't seem to "work" for the real terminal that I tested. It is also not currently implemented at all by xterm. That is why I have no qualms about proposing to extend the DA3 response sequence for other related purposes.

Ignoring the prefix and suffix strings (including ignoring the leading and trailing escape sequences), the real terminal that I tested transmitted "00000000" as its serial number. I am proposing that an extra semi-colon separated parameter be transmitted when terminal emulators respond to DA3. So a terminal emulator would transmit "00000000;swid" where "swid" contains the software product name and the version. I proposed a specific syntax and convention for "swid" that would ensure uniqueness across software manufacturers (with syntax borrowed from HTTP).

The proposal to use DA3 in this way would address the objections that you had to the way I am currently using ENQ.

A reasonably-implemented receiver of such a response should be able to handle the extra parameter e.g. ignore it.

This is the way DA1 works. The terminal (or terminal emulator) can send in response to DA1 a long string of semi-colon separated parameters and the host should ignore parameters that it doesn't understand or that it is not interested in. The trailing escape sequence for DA3 / the end of the escape sequence for DA1 ensures that the receiver of the response can find the end of the response even if it does not expect all the parameters. In the unlikely event that any host is actually using DA3 currently with xterm and just looks at the first 8 characters of the response, expecting to find the serial number as hex digits, then that is what it will find (a useless value but no more useless than the real terminal that I tested).

Egmont Koblinger (egmont-gmail) wrote :

The response to \e[>c should contain the version number. Well, in case of xterm it contains that. Now, some emulators (e.g. vte) put their own version number there, while some others (e.g. konsole) put the version of xterm they claim to be compatible with.

Imagine there'd be a brand new escape sequence, to which the response should (according to the specification) be the name of the terminal emulator and the version number (e.g. "xterm-310", "konsole-2.14" etc.). And some pieces of software begin to depend on this (if no software depended on this then what would be the point?).

Now a new terminal emulator (let's call it "asdf") comes along and figures out that certain apps just don't work there because they're aware of xterm and konsole, but not of asdf. So they decide for practical reasons to go against the standard and report "xterm-310" instead of "asdf-0.1" because they're compatible with xterm-310 (who knows, maybe even with newer releases, maybe not), and they want that certain app to work. But "asdf-0.2" also adds a cool new feature that xterm doesn't have. What to do then, how to advertise it? Shall they report "xterm-310 (er, no, wait, actually asdf-0.2)"? Where will this end?

Could this be made any simpler than the complete nightmare with browser User-Agent strings? Is it worth starting at all?

I don't know and I'm not the one to make a decision. If xterm comes up with something that looks promising, I'll port that to gnome-terminal. That's all I can do, apart from speaking up against solutions that I don't see as viable.

Given that so far you're the only one I'm aware of requesting this feature, I'd say that the simplest is if you solve it your way for yourself, by patching vte, or relying on the version number being 3600-ish, and try to solve your problem without relying on answerback - plenty of other people managed to do this.

boon (boon-9ft1s) wrote :

I don't disagree that sending the software product name and version invites some problems. However coming back to what I wrote initially - "Knowing the actual make and model allows us to account for quirks and limitations of individual emulations" - is the crux of it. I don't actually care about cool new features. It is the missing or broken features that cause me a problem.

I will wait until the changes that have been made (thanks) make it into the mainline release and then I will probably try to patch ENQ support in, particularly since the code is all there except that it sends a response of length 0.

Egmont Koblinger (egmont-gmail) wrote :

I've added C1 support to git master. It'll hopefully make it into Ubuntu W.W. (15.10).

boon (boon-9ft1s) wrote :

I have upgraded one Linux computer to Wily Werewolf, so I took the opportunity of retesting this.

I can confirm that this now works. So this bug can be closed.

As suggested above, it is unlikely that any system that expects to be talking to real terminals will use UTF-8. Hence it is most likely necessary to set the encoding in gnome-terminal to another one (e.g. ISO 8859-1). Strangely, that does not appear to be possible to do in a gnome-terminal profile via the GUI or via the gnome-terminal command line. However it does work if you edit it into the profile using something external to gnome-terminal. So, in case it helps someone else, to set the encoding in a gnome-terminal profile, you will probably want to use gconf-editor or gconftool.

Egmont Koblinger (egmont-gmail) wrote :

It is possible to set the encoding, temporarily (for the given tab only) under Terminal, or permanently as the new default for a profile under Profile Preferences -> Compatibility.

Georg Sauthoff (g-sauthoff) wrote :

@boon - I'm curious, what application of yours uses C1 controls in UTF-8 encoded text?

And for which features does it use such sequences?

I'm asking because I couldn't find any application so far that actually uses C1 controls.

boon (boon-9ft1s) wrote :

With the disclaimer that the original report was from 9 years ago(!) ... the application does *not* use C1 controls *with* UTF-8 text. That was just an observation on my part regarding how the character set and encoding would impact on any attempt to change the implementation to recognise C1 controls. The application has no support for Unicode characters in general and hence no need for UTF-8. Everything is effectively ISO-8859-1 i.e. clean traditional vanilla 8 bit data.

gnome-terminal is used here to access a "mainframe" host on the network that runs an in-house application. This host does not run Linux (shocking, I know). Hence I can assure you that it uses C1 controls but you will not be able to verify that. :-) This application would be described as "legacy" but 9 years down the track and it is still lingering on.

*Some* of the relevant escape sequences or other controls that exist in both a two-byte 7 bit form and a one-byte 8 bit (C1) form - as far as traditional 8 bit data goes - are mentioned above.

At a quick look, some of the relevant sequences that are actually used in the application:

RI = Reverse Index - used to move the cursor up, or scroll the scrolling region if the cursor is already at the top line of the scrolling region

CSI = Control Sequence Introducer - used for zillions of terminal escape sequences (and this was the main one that was causing a problem)

OSC = Operating System Command - has several functions but used here to set the window title and icon title

ST = String Terminator - used in several functions but used in setting the window title and the icon title and, more interestingly, used in sixel mode when drawing barcodes :-0

Egmont Koblinger (egmont-gmail) wrote :

I'm quite curious here, because I added C1 (even in UTF-8) support to VTE (GNOME Terminal) because of this very bugreport. Later I think I regretted that decision, but it's likely to stay because the main developer wants to keep it. See https://bugzilla.gnome.org/show_bug.cgi?id=730154 and https://gitlab.gnome.org/GNOME/vte/-/issues/209.

So...

@boon, you have to have your terminal configured to use ISO-8859-1 or so, setting it to UTF-8 cannot work with your application. Is this correct?

And if your terminal removed support for non-UTF-8, which might happen sooner or later with VTE, you'll be screwed at those raw C1 bytes. Right? You'd have to look for a different terminal emulator, or some other workaround.

(Just for curiosity: does your app output any printable 0xA0..0xFF ISO-8859-1 accented letters and other symbols that need to be properly handled, or is it solely the C1 ones?)

So suppose VTE / GNOME Terminal removes support for encodings other than UTF-8, which breaks your app. And suppose that you'd prefer to keep using VTE rather than switching to e.g. Xterm.

---

My first thought: The obvious well-known workaround is to use "luit", a tool that's supposed to convert data between the terminal's UTF-8 and the app's ISO-8859-1 (or similar) encoding.

However, it seems to me that if an application emits a raw C1 byte in ISO-8859-1, luit leaves that as that raw byte (hence producing invalid UTF-8) rather than converting it to the corresponding U+0080..U+009F character. The standard charset conversion method iconv() does this conversion correctly, and in Unicode the characters U+0080..U+009F are defined to be the same as the bytes 0x80..0x9F of ISO-8859-1. That is, it seems to me (without having checked its source) that "luit" explicitly, deliberately breaks these characters, rather than letting them through properly converted to UTF-8 and letting the terminal emulator decide what to do with them.
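A minimal sketch of the conversion described above, using Python's built-in codecs rather than iconv() or luit (so it only illustrates the expected result, not either tool's behaviour). The helper `is_valid_utf8` is a hypothetical name introduced for the illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = b"normal \x9b1m bold"          # ISO-8859-1 stream with a raw C1 CSI byte

# Correct conversion: byte 0x9B becomes U+009B, encoded in UTF-8 as C2 9B.
converted = raw.decode("iso-8859-1").encode("utf-8")

# Passing the raw byte through unchanged leaves a lone continuation byte,
# which is not valid UTF-8.
passed_through = raw
```

Here `converted` is valid UTF-8 that still carries the C1 control as a character, while `passed_through` is rejected by a strict UTF-8 decoder.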

Of course you might patch it to fix it, and if you decide to do so, you might as well add C1 -> C0 conversion to it, I guess it might not be that hard.
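The C1 -> C0 conversion mentioned here could look roughly like the sketch below. It is not a luit patch; `c1_to_7bit` is a hypothetical helper, and it assumes the input is a single-byte (ISO-8859-1 style) stream in which every byte 0x80..0x9F really is a C1 control.

```python
def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite each 8-bit C1 control as its two-byte ESC equivalent."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            # C1 byte 0x80+n corresponds to ESC followed by 0x40+n.
            out += bytes((0x1B, b - 0x40))
        else:
            out.append(b)
    return bytes(out)
```

For example, `c1_to_7bit(b"normal \x9b1m bold")` yields `b"normal \x1b[1m bold"`. Bytes 0xA0..0xFF are left untouched, so a separate charset conversion would still be needed for printable ISO-8859-1 characters.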

---

An alternate approach is to use "screen". Start it up, press ^A and then type ":encoding iso-8859-1". This switches the inside encoding, but leaves the outside encoding (towards the graphical terminal) at UTF-8. It understands C1 controls (e.g. printf 'normal \x9b1m bold' works as expected), and it converts them to C0 towards the graphical emulator.

(This latter claim I can confirm by running a "script" outside of "screen", but it also makes sense: "screen" cannot pass-thru these controls; in order to implement scrolling, detached operation, reattaching etc. it has to imagine what state its charcells are (e.g. bold) and has to implement that look in the outer terminal, there's no room here for remembering and would be no point in remembering what control operation resulted in that bold cell being there in its memory.)

Maybe "tmux" also supports this kind of conversion, I don't know.

---

Currently there's 1 person I'm aware of who uses C1 support of VTE, that's you. You use it with non-UTF-8, with UTF-8 it's useless for you. If VTE dropped support for non-UTF-8, the number of C1 users...


boon (boon-9ft1s) wrote :

>you have to have your terminal configured to use ISO-8859-1 or so, setting it to UTF-8 cannot work with your application. Is this correct?

That is correct. In respect of Preferences / Compatibility / Encoding ...

* set to "Unicode - UTF8", it malfunctions (the C1 controls don't work, hence layout is completely stuffed, but ASCII comes out correctly - both as expected)

* set to "Western - ISO-8859-1", it works.

>if your terminal removed support for non-UTF-8, which might happen sooner or later with VTE, you'll be screwed at those raw C1 bytes

Let me say up front ... if it doesn't support ISO-8859-1 any more then it aint a terminal emulator!! So you would need to update the man page. ;-)

Also, I hope that any such change would be flagged **well in advance**.

As to whether I would be screwed ... there are two answers to that

a) I can change the application either to insert a translation layer to UTF-8 (although this would be mostly pointless) or just to use the equivalent two byte sequences instead of the C1 controls

b) What I can't change is software on which the application depends (for which I don't have source and can't change e.g. the underlying operating system)

So ... yes ... approximately screwed.

>does your app output any printable 0xA0..0xFF ISO-8859-1

With the understanding that we are talking hundreds of thousands of lines of code ... to the best of my knowledge, it does not use any such characters, just the C1 controls as far as bytes with the top bit set go.

Another consideration is ... what terminal model the emulator "advertises" compatibility with. I couldn't see any gnome-terminal setting to downgrade that but (from memory)

* if you advertise compatibility with VT200 series then you "must" support C1 controls
* if you advertise compatibility with VT100 series then you "must not" support C1 controls

In reality, terminal emulators have traditionally taken a more laissez faire approach of just supporting whatever features they want to support. ;-)

(which is why I am in the position that PuTTY works about 80% and gnome-terminal works about 80%, but their defects are not the same, and I have no fully working solution)

Whether changing the compatibility level helps I don't know. That is to say, if it advertised VT100 and actually implemented that spec *rigidly*, I don't know whether I would lose any features on which the application depends and which couldn't feasibly be worked around. I have the feeling that the loss of the VT200-specific keys would be painful, for example.

Egmont Koblinger (egmont-gmail) wrote :

@boon Thanks for your responses! Some go quite beyond the original topic, but I don't mind having an interesting discussion.

> Let me say up front ... if it doesn't support ISO-8859-1 any more then it aint a terminal emulator!! So you would need to update the man page. ;-)

Updating the man page will be the least of our worries... since we don't have one :-) (Some distros ship one, but that's a downstream decision.)

I've recently come across the opinion that Xterm, VTE and the like are _not_ "terminal emulators". A "terminal emulator" would need to set up an infrastructure on which the original DEC software runs, just like, let's say, ZX Spectrum or Commodore emulators, or (sort of) VMware work. Instead, analogously to WINE (Wine Is Not an Emulator), these are "terminals", more specifically "software terminals" rather than those old "hardware terminals". But not "emulators".

Terminology put aside... [well, not quite, (pun intended) there's a "software terminal" called Terminology which the next paragraph happens to apply to.]

There are many, many new players in the world of terminal emul... I mean software terminals. And surprisingly many of them don't support anything other than UTF-8. Which, in my opinion, is a welcome decision.

The world is changing. Maybe some see a beauty in maintaining faithful compatibility with some old hardware (now 30-45-ish year old), I see a beauty in making progress, and switching to Unicode is definitely one. Maybe there's a software terminal that will maintain DEC VT102 or whatever compatibility for hundreds or thousands of years, VTE is not that.

And by dropping non-UTF-8 support, VTE will definitely remain a software terminal. Or as people tend to (perhaps incorrectly) call it, a terminal emulator.

> As to whether I would be screwed ... there are two answers to that
>
> a) I can change the application [...]
>
> b) What I can't change is software on which the application depends [...]

But does (b) emit C1 controls? Because if not then you should change your application; in fact, you should have done it 9.5 years ago when you filed this bugreport. If yes then I see your problem.

> Also, I hope that any such change would be flagged **well in advance**.

I'm not sure how exactly you wish to see such a flagging. The corresponding 'vte_terminal_set_encoding()' method has been marked as deprecated for 5 years now; is this good enough? Most likely the story will continue with the removal of the Preferences UI's Encoding dropdown box, while keeping the under-the-hood setting available in dconf for another few years. Consider that you have been warned well in advance. :)

> Another consideration is ... what terminal model the emulator "advertises" compatibility with

Advertising compatibility is a terrible mess.

Nowadays the primary source is the TERM variable, which is so limited in functionality that VTE has to choose between two bad solutions: piggyback xterm, or go its own route. A good system would allow the terminal itself to define its capabilities, not some 3rd party database (terminfo) maintained independently, with potentially years of difference between the actual software and the description (and ...


boon (boon-9ft1s) wrote :

>Updating the man page will be the least of our worries... since we don't have one

`man gnome-terminal`

`NAME`
` gnome-terminal - A terminal emulator for GNOME`
:
:

So apparently *someone* thinks it's a terminal emulator.

>does (b) emit C1 controls?

Yes.

>I'm not sure how exactly you wish to see such a flagging.

Ideally, a future version of gnome-terminal would issue a warning every time a profile is used whose encoding is slated for removal, and that warning would state the time or version at which the removal will occur.

>I see a beauty in making progress, and switching to Unicode is definitely one.

The crux of it would seem to be

>to make the code simpler, cleaner, smaller, faster

because otherwise my need for ISO-8859-1 in no way prevents someone else using Unicode.

The actual problem is UTF-8 though, not Unicode (since Unicode includes the entirety of ISO-8859-1 as its first 256 code points). I understand that that does not present an obvious or easy solution.
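The distinction between Unicode and UTF-8 can be checked with a short sketch. It is an illustration only, using Python's standard codecs, and the variable names are made up for the example.

```python
all_bytes = bytes(range(256))

# Decoding ISO-8859-1 maps every byte value to the code point of the same number.
code_points = [ord(ch) for ch in all_bytes.decode("iso-8859-1")]

# Only the ASCII half is encoded in UTF-8 as the same single byte.
unchanged_in_utf8 = [b for b in range(256) if chr(b).encode("utf-8") == bytes([b])]

# Everything from 0x80 upwards, C1 controls included, becomes two bytes.
two_byte = [b for b in range(256) if len(chr(b).encode("utf-8")) == 2]
```

So `code_points` is simply 0..255, `unchanged_in_utf8` is 0..127, and `two_byte` covers 128..255: the code points agree with ISO-8859-1, and only the byte-level encoding differs.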

Egmont Koblinger (egmont-gmail) wrote :

> ` gnome-terminal - A terminal emulator for GNOME`

I stand corrected, recent versions of upstream gnome-terminal indeed ship a man page. I thought it didn't (when I used to actively work on it it didn't).

> So apparently *someone* thinks it's a terminal emulator.

It's unclear whether calling it a "terminal emulator" is exact, strictly correct terminology or not. But in practice everyone calls this kind of software that.

I firmly disagree with the claim that whether it is a "terminal emulator" or not would depend on its ability to handle the long-obsolete ISO-8859-1 encoding (or even just the long-obsolete single-byte C1 subset thereof).

> issues a warning [...] That would be ideal.

Please don't count on this to happen.

Issuing a warning every time would be disruptive, so presumably there would need to be a way for the user to silence it; that brings along a UI as well as backing storage. It would also take a decent amount of planning, taking into account the release cycles of popular distributions and how frequently users tend to upgrade their systems, to make sure that users don't skip the versions that present this warning before the feature is gone (e.g. if a distro offers 10 years of support, does that mean we need a window of at least 10 years with that warning, so that all users of supported distros will definitely see it??). The project is heavily understaffed: it has 1 regular part-time volunteer developer and a few occasional contributors. Such careful planning is only feasible for significantly better staffed projects. This amount of work is much better spent elsewhere on VTE or GNOME Terminal, on features that will reach millions of users rather than just a few (let alone those few who have failed to catch up with the world's changes for 15+ years).

GTK4-based GNOME Terminal is likely arriving soon, probably either with the next big release or the subsequent one. I'd like that one to remove the UI setting of the encoding but keep the under-the-hood setting. If this happens, the amount of user requests/complaints will help us decide how long to keep the hidden feature before completely dropping it, and whether to keep it a bit longer in VTE or not. It might also depend on circumstances like how much this obsolete encoding support makes the development of other features harder, or when we need to make an API-incompatible change for other reasons. It won't happen tomorrow; you surely have at least a year or two (but probably much, much more) to prepare.

The time we already spent on this discussion would probably be enough to modify "luit" to perform the proper C1 conversion from ISO-8859-1 to UTF-8, or to convert them to C0, either of which would guarantee that you could keep using VTE even after it drops its non-UTF-8 support.

Egmont Koblinger (egmont-gmail) wrote :

(luit "patch" below)

---

To summarize where we are so far:

- Given is a legacy program that we cannot modify which emits 7-bit ASCII plus the 8-bit C1 control characters (raw 0x80..0x9F bytes). (Alternatively, if you wish: it uses ISO-8859-1, including the C1 controls.)

- We need to access this software from a modern terminal emulator. The bug's reporter would prefer to use GNOME Terminal.

- Not every terminal emulator supports C1 control characters. Some do, some do not, some only recognize them in legacy (non-UTF-8) encodings. (The raw 0x80..0x9F bytes are incompatible with the nature of UTF-8, so when combining C1 with UTF-8, I'm talking about the UTF-8 representation of U+0080..U+009F.)

- Not every terminal supports the legacy ISO-8859-1 and similar encodings; quite a few only support UTF-8 nowadays.

- GNOME Terminal (VTE) supports C1, even in UTF-8. This is going to stay, the main developer is firm on this. He clearly expressed this in https://gitlab.gnome.org/GNOME/vte/-/issues/209, and also in a recent private email to me.

- Right now GNOME Terminal (VTE) supports plenty of legacy encodings, including ISO-8859-1. This might change, maybe one day only UTF-8 will remain.

- Right now GNOME Terminal (VTE) suits OP's needs here. If one day it drops support for non-UTF-8, it won't work out of the box for this use case anymore.

- luit, if an (app side) ISO-8859-1 -> (terminal side) UTF-8 conversion is requested, does not convert the 0x80..0x9F C1 bytes into the UTF-8 representation of U+0080..U+009F. Rather, it keeps them as these raw bytes, thereby producing invalid UTF-8. Therefore, accessing the legacy software from within a "luit -encoding iso-8859-1" session inside a UTF-8 terminal would not solve the issue.

---

Here's the new bit:

I've downloaded luit version 20230201 from https://invisible-island.net/luit/ with the desire to "fix" it to convert ISO-8859-1 0x80..0x9F into UTF-8 U+0080..U+009F. I've only barely tested it, but at first glance it seems I managed to do it.

charset.c line 71 looks like:

    {"ISO 8859-1", T_96, 'A', "iso8859-1", 0x80, 0, 0},

All you need to do is replace that T_96 with T_128.

Compile it the usual way (./configure && make) and off to the races you go. "luit -encoding iso-8859-1" now converts incoming 1-byte ISO-8859-1 C1 into their 2-byte UTF-8 counterpart.

If a terminal only supports the modern UTF-8 encoding, and recognizes UTF-8 C1 escape sequences, then this thin layer should make that legacy application display correctly.
