Comment 35 for bug 1297051

Egmont Koblinger (egmont-gmail) wrote :

(luit "patch" below)

---

To summarize where we are so far:

- We have a legacy program that we cannot modify. It emits 7-bit ASCII plus the 8-bit C1 control characters (raw 0x80..0x9F bytes). (Alternatively, if you wish: it uses ISO-8859-1, including the C1 controls.)

- We need to access this software from a modern terminal emulator. The bug's reporter would prefer to use GNOME Terminal.

- Not every terminal emulator supports C1 control characters. Some do, some do not, and some only recognize them in legacy (non-UTF-8) encodings. (Raw 0x80..0x9F bytes on their own are not valid UTF-8, so when I talk about C1 in UTF-8, I mean the 2-byte UTF-8 representation of U+0080..U+009F.)

- Not every terminal supports the legacy ISO-8859-1 and similar encodings. Quite a few only support UTF-8 nowadays.

- GNOME Terminal (VTE) supports C1, even in UTF-8. This is going to stay; the main developer is firm on this. He clearly expressed this in https://gitlab.gnome.org/GNOME/vte/-/issues/209, and also in a recent private email to me.

- Right now GNOME Terminal (VTE) supports plenty of legacy encodings, including ISO-8859-1. This might change; maybe one day only UTF-8 will remain.

- Right now GNOME Terminal (VTE) suits OP's needs here. If one day it drops support for non-UTF-8, it won't work out of the box for this use case anymore.

- luit, if an (app side) ISO-8859-1 -> (terminal side) UTF-8 conversion is requested, does not convert the 0x80..0x9F C1 bytes into the UTF-8 representation of U+0080..U+009F. Rather, it keeps them as raw bytes, thereby producing invalid UTF-8. Therefore, accessing the legacy software from within a "luit -encoding iso-8859-1" session inside a UTF-8 terminal would not solve the issue.
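
The byte-level difference in the last point can be checked with a few lines of Python (used here only for illustration; luit itself is written in C). The sample sequence, CSI followed by "1m", is my own choice and not taken from OP's program, and is_valid_utf8 is just a helper name made up for this sketch.

```python
# CSI (0x9B) followed by "1m", as an 8-bit legacy application might emit it.
# (Illustrative sample, not captured from the real program.)
legacy = b"\x9b1m"

# What the summary above says luit currently hands to the terminal:
# the C1 byte is kept as-is.
passthrough = legacy

# What a full ISO-8859-1 -> UTF-8 conversion yields: U+009B encoded as C2 9B.
converted = legacy.decode("iso-8859-1").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(passthrough))  # False: a lone 0x9B is a stray continuation byte
print(converted.hex(" "))          # c2 9b 31 6d
```

A terminal that only speaks UTF-8 has to treat the first form as a decoding error, while the second form is a well-formed U+009B that it can act on if it supports C1 in UTF-8.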

---

Here's the new bit:

I've downloaded luit version 20230201 from https://invisible-island.net/luit/ with the desire to "fix" it to convert ISO-8859-1 0x80..0x9F into UTF-8 U+0080..U+009F. I've only barely tested it, but at first glance it seems I managed to do it.

charset.c line 71 looks like:

    {"ISO 8859-1", T_96, 'A', "iso8859-1", 0x80, 0, 0},

All you need to do is replace that T_96 with T_128.

Compile it the usual way (./configure && make) and off to the races you go. "luit -encoding iso-8859-1" now converts incoming 1-byte ISO-8859-1 C1 characters into their 2-byte UTF-8 counterparts.

If a terminal only supports the modern UTF-8 encoding, and recognizes UTF-8 C1 escape sequences, then this thin layer should make that legacy application display correctly.
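
For a feel of what the byte stream should look like after this layer, here is a rough Python stand-in for the application-to-terminal direction. It is not luit, it does not allocate a pty, and to_utf8 is a hypothetical helper name; it only models the mapping the patched table entry is meant to produce.

```python
def to_utf8(chunk: bytes) -> bytes:
    """Re-encode one chunk of ISO-8859-1 application output as UTF-8."""
    # ISO-8859-1 is a single-byte encoding, so each chunk can be converted
    # on its own; no state has to be carried across read boundaries.
    return chunk.decode("iso-8859-1").encode("utf-8")


# Every C1 byte 0x80..0x9F becomes two bytes: lead byte 0xC2, then the
# original byte value unchanged.
c1 = bytes(range(0x80, 0xA0))
out = to_utf8(c1)
pairs = [out[i:i + 2] for i in range(0, len(out), 2)]
assert pairs == [bytes([0xC2, b]) for b in c1]

# 7-bit ASCII passes through untouched.
assert to_utf8(b"hello\r\n") == b"hello\r\n"
```

Capturing the output of the patched luit and comparing it against this mapping would be one way to go beyond my barely-tested state.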