(In reply to comment #3)
> No, I'm absolutely certain that I'm in a UTF-8 locale, and I'm really very
> sure that I'm not pretending that ISO-8859-1 strings are UTF-8. Try it for yourself.
> $ cat frederic
> Frédéric
> $ file frederic
> frederic: UTF-8 Unicode text
> $ od -tx1 < frederic
> 0000000 46 72 c3 a9 64 c3 a9 72 69 63 0a
> 0000013
> $ iconv -f UTF-8 -t ASCII < frederic
> Friconv: illegal input sequence at position 2
> $ iconv -c -f UTF-8 -t ASCII < frederic
> Frdric
> C3 A9 is the correct UTF-8 encoding of U+00E9, LATIN SMALL LETTER E WITH ACUTE.
> None of my tests with either recode or iconv have been able to get them to
> transcode this into the closest-possible ASCII representation without just
> leaving out the characters whose codepoints lie outside ASCII. Again, if you
> know how to get them to do this, I'd be interested to hear about it.
Something weird is going on:
$ echo è| recode utf-8..ascii
`e
$ echo é |LANG=C recode utf-8..ascii
recode: Invalid input in step `UTF-8..ANSI_X3.4-1968'
$ echo é |LANG=C iconv -f utf-8 -t ascii
iconv: illegal input sequence at position 0
wtf...
How can some UTF-8 characters be more UTF-8 than others, and only for recode? And
how come iconv has a different idea of Unicode than recode?
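As a sanity check that the input itself is not the problem, here is a small Python sketch that encodes both letters and decodes them again in strict mode. The helper name utf8_hex is made up for illustration, and this says nothing about what recode or iconv do internally; it only confirms that both byte sequences are well-formed UTF-8.

```python
def utf8_hex(ch):
    """Return the UTF-8 bytes of ch as space-separated hex (hypothetical helper)."""
    raw = ch.encode("utf-8")
    # Strict decoding raises UnicodeDecodeError on malformed input,
    # so getting the same character back means the bytes are valid UTF-8.
    assert raw.decode("utf-8", errors="strict") == ch
    return " ".join(f"{b:02x}" for b in raw)


if __name__ == "__main__":
    for ch in ("\u00e8", "\u00e9"):  # e with grave, e with acute
        print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
```

Both letters are ordinary two-byte sequences (c3 a8 and c3 a9), so neither is "more UTF-8" than the other as far as the encoding goes.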
The only thing I can suggest now is a:
reportbug iconv; reportbug recode
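In the meantime, if all you need is the closest ASCII spelling, one possible workaround is to bypass both tools. This is only a sketch using Python's unicodedata module, not a fix for recode or iconv: it decomposes each character (NFKD) and then drops whatever is still outside ASCII, so the accent is removed but the base letter stays. The function name to_closest_ascii is my own invention.

```python
import unicodedata


def to_closest_ascii(text):
    """Approximate text in ASCII by stripping accents (hypothetical helper)."""
    # NFKD splits U+00E9 into 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" discards the combining marks,
    # and also any other character that has no ASCII form.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_closest_ascii("Fr\u00e9d\u00e9ric"))
```

Note that letters with no decomposition (for example ø or ß) are still silently dropped by this approach, so it is not a general transliterator.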
I'm sorry I have no other clues at the moment.
Ciao,
Enrico