Comment 3 for bug 7427

Colin Watson (cjwatson) wrote:

No, I'm absolutely certain that I'm in a UTF-8 locale, and I'm really very sure
that I'm not pretending that ISO-8859-1 strings are UTF-8. Try it for yourself.

  $ cat frederic
  Frédéric
  $ file frederic
  frederic: UTF-8 Unicode text
  $ od -tx1 < frederic
  0000000 46 72 c3 a9 64 c3 a9 72 69 63 0a
  0000013
  $ iconv -f UTF-8 -t ASCII < frederic
  Friconv: illegal input sequence at position 2
  $ iconv -c -f UTF-8 -t ASCII < frederic
  Frdric

C3 A9 is the correct UTF-8 encoding of U+00E9, LATIN SMALL LETTER E WITH ACUTE.
In none of my tests have I been able to get either recode or iconv to
transcode this into the closest possible ASCII representation (here,
"Frederic") rather than just leaving out the characters whose code points lie
outside ASCII. Again, if you know how to get them to do this, I'd be
interested to hear about it.
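
To make the distinction concrete, here is a minimal sketch in Python rather than recode or iconv. It re-checks the bytes from the od dump, reproduces the "drop it" behaviour of iconv -c, and then shows one possible way of getting a closest-ASCII result: decompose with NFKD and discard the combining marks. The helper name to_closest_ascii is made up for illustration, and this only handles characters that decompose into an ASCII base letter plus marks. It is an assumption about what "closest" should mean, not a description of what either tool does.

```python
import unicodedata

# Bytes copied from the od dump: "Frédéric\n" encoded as UTF-8.
raw = bytes.fromhex("4672c3a964c3a97269630a")

# Decoding raises UnicodeDecodeError if the input is not valid UTF-8.
text = raw.decode("utf-8")

# C3 A9 decodes to a single code point, U+00E9.
e_acute = text[2]
e_acute_name = unicodedata.name(e_acute)

# Roughly what `iconv -c` does: silently drop anything outside ASCII.
dropped = text.encode("ascii", errors="ignore").decode("ascii")


def to_closest_ascii(s):
    """Hypothetical helper: strip accents via NFKD decomposition.

    U+00E9 decomposes into "e" followed by U+0301 COMBINING ACUTE ACCENT;
    removing the combining mark leaves the plain base letter. Anything
    still outside ASCII after that is dropped.
    """
    decomposed = unicodedata.normalize("NFKD", s)
    base_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base_only.encode("ascii", errors="ignore").decode("ascii")


closest = to_closest_ascii(text)

if __name__ == "__main__":
    print(repr(dropped))
    print(repr(closest))
```

Characters with no such decomposition (ß or ø, say) would still be dropped by this sketch, so a real transliterator needs a lookup table on top.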