Comment 11 for bug 394570

Colin Watson (cjwatson) wrote: Re: IUTF8 pseudo-terminal mode

On Mon, May 12, 2008 at 10:07:47AM +0200, Vincent Lefevre wrote:
> On 2008-05-11 21:00:55 -0500, Nicolas Williams wrote:
> > On Mon, May 12, 2008 at 02:00:56AM +0200, Vincent Lefevre wrote:
> > > default one at the system level. Perhaps you mean that the SSH client
> > > should propagate the locale (more precisely, the charmap) to the
> >
> > SunSSH 1.1 does (by having the client set per-channel LANG/LC_*
> > environment variables).

OpenSSH, as configured in Debian, does this too.
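
For reference, a sketch of the settings involved. SendEnv and AcceptEnv are
standard OpenSSH directives; that Debian's stock configuration files contain
exactly these lines is an assumption here and worth checking on your own
system:

```
# /etc/ssh/ssh_config (client side): forward the locale variables
Host *
    SendEnv LANG LC_*

# /etc/ssh/sshd_config (server side): accept them from the client
AcceptEnv LANG LC_*
```

Both halves are needed: the server ignores any variable that is not matched
by an AcceptEnv pattern.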

> I meant in a way that always works in practice.
>
> > It's less than perfect: the client has no idea what a client-side
> > locale maps to on the server side.
>
> Yes, that's the problem: it's not always possible to rebuild a correct
> LC_CTYPE on the remote side, e.g. if LC_CTYPE is "en_US", one doesn't
> have information about the charset on the remote side.

Locale names are indeed opaque as far as POSIX is concerned, so there's
no portable way to pick them apart. But even if locale names are in
principle identical (i.e. client and server running the same release of
the same operating system), there's a further problem. With glibc,
locale definitions are quite large (thus inconvenient to distribute in
pre-generated form) and take some time to generate, so it's fairly
common for distributions to set things up so that you only generate the
locale definitions you need. The locale you're using on the client may
simply not exist on the server.
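
To illustrate the "may simply not exist" case, here is a minimal sketch of how
a program on the server could probe which of several locale names are actually
available. The helper name first_available_locale and the candidate list are
hypothetical, chosen only for illustration; the sketch relies on Python's
locale.setlocale() raising locale.Error for a locale that has not been
generated.

```python
import locale


def first_available_locale(candidates, category=locale.LC_CTYPE):
    """Return the first name in `candidates` that setlocale() accepts, or None."""
    # Calling setlocale() with no name queries the current setting.
    saved = locale.setlocale(category)
    try:
        for name in candidates:
            try:
                locale.setlocale(category, name)
            except locale.Error:
                continue  # this locale is not available on this system
            return name
        return None
    finally:
        # Leave the process locale exactly as we found it.
        locale.setlocale(category, saved)


if __name__ == "__main__":
    # Hypothetical preference order; "C" is always available as a last resort.
    requested = ["en_GB.UTF-8", "en_US.UTF-8", "C"]
    print(first_available_locale(requested))
```

Note that this only tests the names as given. It makes no attempt to pick a
locale name apart, which would not be portable anyway.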

Now, in some ways this does end up invoking undefined behaviour; you're
asking for a locale that doesn't exist. Messages will of course end up
being output in the C locale, and so on. But it is *terribly* useful to
at least get the character encoding right, as otherwise you get hopeless
garbage on the screen and it may well not be very obvious what the
problem is.

This is compounded by the fact that there is no equivalent of "C" for
UTF-8 in glibc: that is, there's no way to say "I just want a basically
unlocalised system that happens to use the UTF-8 encoding for
everything". Thus even people who don't care about localisation have to
select something like en_US.UTF-8 or en_GB.UTF-8, and any time they ssh
to a server that doesn't have those locales generated they end up with
screwed-up output from full-screen terminal applications.

> > Also, it's not just character sets that matter, but language for
> > localization of messages, date formats, etc...
>
> Well, localization of messages and date formats are just a user choice.
> If they are different on the remote side, this isn't really a problem.
> Concerning the character set, the remote one must be compatible with
> the local one, at least when a terminal is used (ditto for the IUTF8
> pseudo-terminal mode). Otherwise the user can't view or edit non-ASCII
> characters correctly.

Absolutely.

--
Colin Watson [<email address hidden>]