Font problems with .tex files and special (danish) characters

Bug #178173 reported by Christian Dalbjerg
This bug affects 2 people
Affects                 Status    Importance   Assigned to   Milestone
texlive-base (Ubuntu)   Expired   Medium       Unassigned

Bug Description

I have created and edited some .tex files in ubuntu 7.10, using kile 1.9.3 and/or texmaker 1.6. They are written in Danish, so the letters 'æ', 'ø' and 'å' occur frequently. When opening the files with kile or texmaker in ubuntu 8.04 alpha 2 (or opensuse), the letters are not recognized, and the files are therefore useless (in texmaker the letters simply don't appear; kile shows Ã¦ instead of 'æ', Ã¸ instead of 'ø' and Ã¥ instead of 'å').
But if I right-click on one of these files and choose 'Open with Text Editor', all the letters are displayed correctly. If I create a new file in either texmaker or kile, I can copy the entire text from gedit into the empty file. The letters will then be displayed correctly in the editors, but the file cannot be compiled. For every special character, LaTeX gives an error like:

./sdf.tex:35:Package inputenc Error: Keyboard character used is undefined(inputenc) in inputencoding `utf8'.
or
./sdf.tex:35:Package inputenc Error: Unicode char \u8:åde not set up for use with LaTeX.

I load the package \usepackage[utf8]{inputenc} in the preamble. The TeX installation is the one which is installed automatically when installing kile.
I am not sure exactly what more information you need, so just ask.

Best regards,
Christian Dalbjerg

Norbert Preining (preining) wrote : Re: [Bug 178173] Font problems with .tex files and special (danish) characters

Hi Christian,

On Sa, 22 Dez 2007, Christian Dalbjerg wrote:
> I am not sure exactly what more information you need, so just ask.

Your problem is all about character sets. It seems that the text editor
you are referring to understands AND writes UTF8, but kile/texmaker
cannot read/work with that.

Best would be that you upload a file that does NOT work to the launchpad
so I can take a look. But this is definitely not a TeX bug but a
problem with your character encoding mixture.

Best wishes

Norbert

-------------------------------------------------------------------------------
Dr. Norbert Preining <email address hidden> Vienna University of Technology
Debian Developer <email address hidden> Debian TeX Group
gpg DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094
-------------------------------------------------------------------------------
HENSTRIDGE (n.)
The dried yellow substance found between the prongs of forks in
restaurants.
   --- Douglas Adams, The Meaning of Liff

Christian Dalbjerg (christiandalbjerg-deactivatedaccount) wrote :

Dear Norbert

I am sorry for any inconvenience I have caused with this misplaced bug report. I am aware that the problem is not a 'real' bug in kile/texmaker/TeXLive. Still, this should not happen, and I suppose that it is due to the general setup of the default ubuntu installation of TeXLive. I have attached a couple of files for you to look at.
By the way, I know almost nothing of character sets and the like. What I know is that I can't use my .tex files on some other distributions or on windows, and seemingly not on the newest ubuntu either! I thought .tex files were just plain ASCII files, and so there should be no problems with compatibility between different systems and platforms. I suppose all this business about input encoding and character sets means I am wrong. How does one ensure that one's .tex files adhere completely to standards and are compatible with different systems, now and in the future? I should like to be able to read and edit the same files 5 years from now, and to take my files back and forth between university and my home system without problems of this sort. Could you perhaps point me in the direction of resources which will be able to answer questions of this sort, and of character encoding in general?

Best regards, and a merry christmas to you,
Christian

Christian Dalbjerg (christiandalbjerg-deactivatedaccount) wrote :

Here is another file

Norbert Preining (preining) wrote : Re: [Bug 178173] Re: Font problems with .tex files and special (danish) characters

Hi Christian,

I checked both files and:
- both files are saved in UTF8 encoding
- both files compile fine without any warning on my system

Can you explain what problems you had with these files?

Is it only that kile and texmaker cannot work with them or do you have
problems compiling the files with latex?

On So, 23 Dez 2007, Christian Dalbjerg wrote:
> By the way, I know almost nothing of character sets and the like.
> What I know is that I can't use my .tex files on some other distributions
> or on windows, and seemingly not on the newest ubuntu either! I thought
> .tex files were just plain ASCII files, and so there should be no
> problems with compatibility between different systems and platforms.

You CAN use plain ascii files, but if you want to key in characters of
your national script (or mine, or anything else which needs more than
ASCII) you have two options:
- use tex commands for your symbols like \ae \"o etc
- use different character encodings
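The first option can be sketched in a few lines of Python. This is only an illustration: the helper name `to_ascii_tex` and the sample word are made up here, and the table covers just the Danish letters discussed in this report, using the standard LaTeX commands \ae, \o and \aa (note that \"o gives 'ö', while the Danish 'ø' is \o).

```python
# Hypothetical helper: rewrite Danish letters as plain-ASCII LaTeX commands,
# so the .tex file no longer depends on any input encoding.
TEX_COMMANDS = {
    "æ": r"{\ae}", "ø": r"{\o}", "å": r"{\aa}",
    "Æ": r"{\AE}", "Ø": r"{\O}", "Å": r"{\AA}",
}

def to_ascii_tex(text: str) -> str:
    """Replace each known letter with its LaTeX command; leave the rest alone."""
    return "".join(TEX_COMMANDS.get(ch, ch) for ch in text)

sample = "blåbærgrød"  # sample word, chosen only for illustration
print(to_ascii_tex(sample))
```

The result is pure ASCII and compiles without inputenc, at the cost of being harder to read in the editor.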

I cannot explain the full details, that would be too long. But your
files are saved in UTF8, which is an international standard and works
quite nicely with latex. So no problems here.

As I said, I also could compile the file on my system, and I am sure
that it will work on Windows, too.

So please again, what are the problems you have with these two files?

All the best and a peaceful christmas

Norbert


Christian Dalbjerg (christiandalbjerg-deactivatedaccount) wrote :

Dear Norbert

Well, my problem is as I described it in the first report: Using ubuntu 8.04 I basically have two choices:
1. Open the files with kile; kile shows Ã¦ instead of 'æ', Ã¸ instead of 'ø' and Ã¥ instead of 'å'. I can compile the files and get the correct output, but the .tex file is messed up with strange symbols. This is of course not acceptable, as I work with these files on a daily basis.
2. Open the .tex file with gedit, and copy the code into kile. Now the letters display correctly in the editor, but I can't compile the files: I am getting errors like the ones posted in the original report.
Of course the problem is not only with the two files I posted here, it applies to all my .tex files. There is no problem using ubuntu 7.10, but I would like to be able to use my .tex files with the new ubuntu, once released. Also, I have experienced the same problem with a stable release of opensuse.
When you open the files in your editor, do you see the letters displayed properly?

I am sorry if this was (is) not formulated clearly enough. Please feel free to ask further questions.

Christian

Norbert Preining (preining) wrote :

Hi Christian,

On So, 23 Dez 2007, Christian Dalbjerg wrote:
> 1. Open the files with kile; kile shows Ã¦ instead of 'æ', Ã¸ instead of 'ø' and Ã¥ instead of 'å'. I can compile the files and get the correct output, but the .tex file is messed up with strange symbols. This is of course not acceptable, as I work with these files on a daily basis.

Ok, I installed kile and see what is going on. Your ENVIRONMENT is not
set up for UTF8 but for some national encoding. If you enter
 locale
on the cmd line of a shell you will see something like
 LANG=xx.YYYYY
where the YYYYY is the encoding. Maybe you have as YYYYY
 ISO-8859-15
which is ok.

BUT: Your tex files are encoded in utf8. Kile seems to have the problem
that it cannot autodetect the encoding of files.

Now kile opens your file as ISO-8859-15, so those strange double
letters appear (because 'æ' is encoded as 2 bytes in utf8).
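The double letters can be reproduced with a small Python sketch. The helper name `misread` is made up for this illustration; the encodings are the standard Python codec names. Each Danish letter is two bytes in UTF-8, and a single-byte encoding shows each byte as its own character.

```python
def misread(text: str, wrong_encoding: str) -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)

for letter in "æøå":
    raw = letter.encode("utf-8")
    # Show the letter, its UTF-8 bytes, and how two single-byte encodings display them.
    print(letter, raw.hex(" "), misread(letter, "iso-8859-1"), misread(letter, "iso-8859-15"))
```

Decoded as ISO-8859-1, 'å' comes out as the 'Ã¥' reported above. ISO-8859-15 differs from ISO-8859-1 in a few positions, so some of the second characters look different there, but every letter still turns into two.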

> 2. Open the .tex file with gedit, and copy the code into kile. Now the letters display correctly in the editor, but I can't compile the files: I am getting errors like the ones posted in the original report.

gedit CAN auto-detect that encoding, so it opens your tex files in utf8 and
shows you the right characters. Now when you copy from gedit to kile you
enter an 'æ' in national encoding into the kile file. If you then save
that and compile it with latex, it breaks because you have
 \usepackage[utf8]{inputenc}
and the 'æ' in your local encoding is NOT utf8!!
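A short Python sketch of why the compile breaks. The function name `check_utf8` and the sample word are only illustrative. A letter saved in a single-byte national encoding is one byte above 0x7F, and that byte on its own is not a valid UTF-8 sequence.

```python
def check_utf8(data: bytes) -> str:
    """Report whether the bytes are valid UTF-8, and where they first go wrong."""
    try:
        data.decode("utf-8")
        return "valid UTF-8"
    except UnicodeDecodeError as err:
        return f"invalid UTF-8 at byte {err.start}: 0x{data[err.start]:02x}"

word = "både"  # sample word, chosen only for illustration
print(check_utf8(word.encode("utf-8")))        # saved by an editor working in UTF-8
print(check_utf8(word.encode("iso-8859-15")))  # saved by an editor working in Latin-9
```

In Latin-9 the 'å' is the single byte 0xE5, which in UTF-8 announces a three-byte sequence. That is presumably why the inputenc message in the original report shows "\u8:åde": the 'å' byte together with the two ordinary letters that follow it.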

So all this is to be expected; the problem is that kile is too stupid
to autodetect encodings. Maybe this could be filed as a bug report
against kile.
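One simple form of autodetection can be sketched as follows. This is not how gedit or kile actually work internally; it is a common heuristic, and the function name `guess_encoding` and the Latin-9 fallback are assumptions made for this example. Text in a national single-byte encoding almost never happens to be valid UTF-8, so a strict UTF-8 decode is a reasonable first test.

```python
def guess_encoding(data: bytes, fallback: str = "iso-8859-15") -> str:
    """Guess a file's encoding: ASCII, then strict UTF-8, then a fallback."""
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the (configurable) national encoding.
        return fallback

print(guess_encoding("\\section{Indledning}".encode("ascii")))
print(guess_encoding("æøå".encode("utf-8")))
print(guess_encoding("æøå".encode("iso-8859-15")))
```

The heuristic cannot tell one single-byte encoding from another (Latin-1 versus Latin-9, say), which is why the fallback is a parameter.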

You have the following options, depending on HOW you want to save your
files:

1) you want to use utf8 as default encoding for your tex files

  tell kile that files should always be treated as utf8:
  Settings -> Configure Kile
     Editor -> Open/Save
       change "Encoding" to "Unicode ( utf8 )"

   from now on all files opened in kile will be treated as utf8.
   So don't forget the \usepackage[utf8]{inputenc} line as above.

2) you switch to iso-8859-15 as default encoding for your tex files

   leave kile alone
   leave gedit alone
   edit your tex files to include
     \usepackage[latin9]{inputenc} % or latin1
   You should recode your tex files to latin9 with
     recode utf8..latin9 file.tex
   so that your 'æ' gets translated from utf8 to latin1/9.
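If the recode tool is not at hand, the same conversion from UTF-8 to Latin-9 can be sketched in Python. The function names are made up for this example, and like recode the file helper overwrites the file in place, so keep a backup.

```python
from pathlib import Path

def recode_bytes(data: bytes, source: str = "utf-8", target: str = "iso-8859-15") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)

def recode_file(path: str, source: str = "utf-8", target: str = "iso-8859-15") -> None:
    """Convert a file in place. Raises an error if a character has no
    equivalent in the target encoding, instead of silently dropping it."""
    p = Path(path)
    p.write_bytes(recode_bytes(p.read_bytes(), source, target))

# 'æ' is two bytes in UTF-8 and a single byte (0xE6) in Latin-9.
print(recode_bytes("æ".encode("utf-8")).hex())
```

After converting, the inputenc option in the preamble has to match the new encoding (latin9 here).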

I hope that all this is a bit clearer now.

Ah yes, why you did have problems on other computers: You copied the 'æ'
from gedit into kile. kile saved it in your national encoding, but the
tex file specifies inputencoding utf8, thus it breaks on other systems,
too.

So to sum it up: The real bug is with kile which cannot autodetect the
encoding of files.

Best wishes

Norbert


Christian Dalbjerg (christiandalbjerg-deactivatedaccount) wrote :

Dear Norbert

Thanks a lot for all your work, it's appreciated! When entering locale in ubuntu 7.10 I get LANG=en_DK.UTF-8, which causes me no problems since kile is set to use encoding KDEDefault, which I'm guessing refers to what YYYY=DK.UTF-8 is. But when entering locale in ubuntu 8.04 I get LANG=C. Is it then correctly understood that the problem arises because kile is trying to open the files as if they were encoded in whatever LANG=C means? And what to do about it? I'm not sure, but I think the bug report should be filed against ubuntu in general? I mean, it would be nice if kile could autodetect the encoding, but it isn't really a bug in kile, more like a feature request.
I have installed ubuntu 7.10 and 8.04 the exact same way, so I don't understand why the LANG settings are different. It would be nice if this was changed back before the final release.
In the meantime, which one of the two options do you recommend? The first one seems the easiest; is there any reason to prefer the second?

Regards
Christian

Norbert Preining (preining) wrote :

Dear Christian,

On Mo, 24 Dez 2007, Christian Dalbjerg wrote:
> Thanks a lot for all your work, it's appreciated! When entering locale
> in ubuntu 7.10 I get LANG=en_DK.UTF-8, which causes me no problems
> since kile is set to use encoding KDEDefault, which I'm guessing
> refers to what YYYY=DK.UTF-8 is.

more or less, YYYY=UTF-8, locales consist of
 aa[_BB].CCCC
aa ... 2 letter language code
BB ... 2(?) letter country code
 the _BB is not necessary
CCCC ... character encoding

So that means that you are working with English language in Denmark,
with UTF-8 encoding.
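The aa[_BB].CCCC structure can be taken apart with a short Python sketch. The names `LocaleName` and `parse_locale` are made up for this illustration, and the pattern is a simplification that also tolerates an optional @modifier suffix.

```python
import re
from typing import NamedTuple, Optional

class LocaleName(NamedTuple):
    language: str
    country: Optional[str]
    charset: Optional[str]

# aa[_BB][.CCCC][@modifier]; the modifier is accepted but ignored.
_PATTERN = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<country>[A-Z]{2}))?"
    r"(?:\.(?P<charset>[^@]+))?"
    r"(?:@.*)?$"
)

def parse_locale(name: str) -> LocaleName:
    """Split a locale name into language, country and character encoding."""
    match = _PATTERN.match(name)
    if match is None:
        # Special names such as "C" or "POSIX" carry no country or charset part.
        return LocaleName(name, None, None)
    return LocaleName(match["language"], match["country"], match["charset"])

print(parse_locale("en_DK.UTF-8"))
print(parse_locale("C"))
```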

> But when entering locale in ubuntu 8.04 I get LANG=C. Is it then

Oops, well, then everything is thought to be in ASCII.

> correctly understood that the problem arises because kile is trying
> to open the files as if they were encoded in whatever LANG=C means?

ASCII

> And what to do about it? I'm not sure, but I think the bug report
> should be filed against ubuntu in general?

See below ...

> I mean, it would be nice if kile could autodetect the encoding, but it
> isn't really a bug in kile, more like a feature request.

Right, feature request.

> I have installed ubuntu 7.10 and 8.04 the exact same way so I don't
> understand why the LANG settings are different. It would be nice if
> this was changed back before final release.

Sorry, I cannot help you here since I am a Debian maintainer and only
helping out on the Ubuntu side a bit. I don't know anything about the
internals of the installer and why the LOCALE settings weren't done
right. But it is definitely worth a bug report.

> In the meanwhile, which one of the two options do you recommend?
> The first one seems the easiest, is there any reason to prefer the second?

I am not sure about the way to fix it on Ubuntu, but I would suggest:
 sudo /usr/sbin/dpkg-reconfigure -plow locales
then select the en_DK.UTF-8 and maybe some others you might have use
for. And AFAIR at the end it should ask you about the default locale for
your system. After that restarting the computer (or restarting the
display manager gdm/kdm/whatever-dm) should give you the right
setting. If not, there might be something saved in your local
configuration files in ~/.?something.
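To check which setting a program will actually see after such a change, the usual precedence of the locale variables can be sketched in Python: LC_ALL overrides LC_CTYPE, which overrides LANG, and with none of them set the "C" locale applies. The function names are made up for this example.

```python
import os

def effective_locale(env) -> str:
    """Return the locale name that governs character handling."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset and empty both count as "not set"
            return value
    return "C"

def assumes_ascii(locale_name: str) -> bool:
    """The C/POSIX locale is the case where everything is taken as ASCII."""
    return locale_name in ("C", "POSIX")

current = effective_locale(os.environ)
print(current, "(ASCII only)" if assumes_ascii(current) else "")
```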

But that is not for me to debug.

I hope that helped a bit

Best wishes

Norbert


xteejx (xteejx) wrote :

Inactive account, just under 2 years since last comment, and no response from submitter.
Closing bug report.

Changed in texlive-base (Ubuntu):
status: New → Invalid
ferrazrafael (ferrazrafael) wrote :

I'm having a similar problem in ubuntu 9.10: when I import some .tex files from windows, also generated in texmaker, some characters like "ã" and "ç" seem to be deleted, and when I type them in again, tex live processes them incorrectly, putting some other characters in their place. I am using ubuntu in Brazilian Portuguese, with locales pt_br_utf8 and pt_pt_utf8.

Changed in texlive-base (Ubuntu):
status: Invalid → Confirmed
xteejx (xteejx) wrote :

Thanks for updating us with this. Can you confirm what version of texlive you are using please? Thank you.

Changed in texlive-base (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
ferrazrafael (ferrazrafael) wrote :

I'm using the latest one in 9.10, which is texlive 2007.dfsg.2-4ubuntu1

xteejx (xteejx) wrote :

Great. Thank you for the info, marking Confirmed.

Changed in texlive-base (Ubuntu):
status: Incomplete → Confirmed
ferrazrafael (ferrazrafael) wrote :

If there is something to do or more info to provide, I will be glad to help

thanks guys

Romain Janvier (romain-janvier) wrote :

Confirmed.

I use Ubuntu 9.10 / LANG=fr_FR.UTF-8 with the correct header (\usepackage[utf8]{inputenc}
\usepackage[francais]{babel}) in my *.tex files and I get the inputenc error (! Package inputenc Error: Unicode char \u8:° not set up for use with LaTeX) using texlive.

richard laurenson (frrichardlaurenson) wrote :

I am using Lyx, which uses texlive, on Ubuntu 9.10. Locality Rome, but US English by default.

I have one file that compiles fine... I typed everything myself in Lyx,
and its language setting is UNICODE (ucs -Extended) (utf8x)

Another file, which I wrote in OO and copy-pasted, gives the error
"Package inputenc Error: Unicode char \u8:  not set up for use with LaTeX."

I exported from OO to .txt, replaced all the "" with plain ASCII in gedit, imported the result, and got this error
"Could not find LaTeX command for character '⁠' (code point 0x2060)"

This is under UNICODE UTF8 or the extended option above

If I paste the offending text into the working file I get this error
Package ucs Error: Unknown Unicode character 8288 = U+2060,
Description
 by step,”2⁠
                      There was no regular shipping service into this part of
Unicode character 8288 = U+2060:
WORD JOINER
Character is not defined in uni-*.def files.
Enter I!<RET> to define the glyph.

Now, for me this is not critical, as I am only playing... but it is curious.
There is a post on this topic that may be worth looking at: http://ubuntuforums.org/showthread.php?t=669228

richard laurenson (frrichardlaurenson) wrote :

Edit:

Hmm... there seem to be a number of non-printing characters, field placings from footnotes or something, that have survived as artifacts when I copied and pasted the text.... I wonder if these are the culprits. ....

YES they are...
Copy-pasted from OO into Lyx, and copy-pasted from Lyx back to OO. Removed all artifacts and it compiles nicely.
I cannot see these artifacts in any of my txt editors nor in Lyx, only in OO as grey shades and in MS Word 07 (wine) as extra-wide spaces when "view all characters" is on; they are not visible at all in Abiword.
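Since these artifacts are invisible in most editors, a small Python sketch can locate them instead. The function names are made up for this example; it relies on the Unicode "Cf" (format) category, which includes U+2060 WORD JOINER from the error message above. The category also covers characters that some texts need on purpose (soft hyphens, zero-width joiners), so review the findings before stripping anything.

```python
import unicodedata

def find_invisible(text: str):
    """List (position, code point, name) for every format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

def strip_invisible(text: str) -> str:
    """Return the text with all format (Cf) characters removed."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

# Fragment modelled on the error message above, with an explicit WORD JOINER.
sample = "by step,\u201d2\u2060 There was no regular shipping service"
print(find_invisible(sample))
print(strip_invisible(sample))
```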

zpon (zpon-dk) wrote :

I have a similar problem. I have been using LaTeX for years, but since my last reinstall of ubuntu I haven't been able to get it working normally; even files that compiled with no problems on my old setup are now giving me the same error.
When the compiler gets to the \section in my file it gets the following error:

----
! Package inputenc Error: Unicode char \u8: not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type H <return> for immediate help.
 ...

l.1 
       \section{Section title}
----

I am on Ubuntu 9.10; any help will be appreciated

zpon (zpon-dk) wrote :

Update:

I just changed utf8 to utf8x in my preamble file, and now everything seems to be working great again

RomainWartel (linux-wartel-net) wrote :

Just to confirm what zpon said, using utf8x fixes the problem (Kile 2.0.83 on Karmic).

\usepackage[utf8x]{inputenc}

xteejx (xteejx) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. We are sorry that we do not always have the capacity to look at all reported bugs in a timely manner.
There have been many changes in Ubuntu since the time of the last comment and your problem may have been fixed with some of the updates. It would help us a lot if you could test the current Ubuntu version (10.10). If you can test it, and it is still an issue, we would appreciate if you could upload updated logs by running apport-collect <bug #>, and any other logs that are relevant for this particular issue.

Changed in texlive-base (Ubuntu):
status: Confirmed → Incomplete
Samium Gromoff (deepfire) wrote :

I think I'm hitting the same issue on an up-to-date installation of 10.10.

What I did, essentially, is:
  1. install emacs, org-mode, texlive-latex-base, texlive-latex-recommended
  2. create an org file with Cyrillic characters, in UTF-8
  3. request org-export to PDF

Here's what I'm getting for the very first cyrillic character, in the Shell interaction emacs buffer:

This is pdfTeX, Version 3.1415926-1.40.10 (TeX Live 2009/Debian)
 restricted \write18 enabled.
entering extended mode
(/home/deepfire/src/pyzzle/doc/documentation.tex
LaTeX2e <2009/09/24>
Babel <v3.8l> and hyphenation patterns for english, usenglishmax, dumylang, noh
yphenation, loaded.
(/usr/share/texmf-texlive/tex/latex/base/article.cls
Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
(/usr/share/texmf-texlive/tex/latex/base/size11.clo))
(/usr/share/texmf-texlive/tex/latex/base/inputenc.sty
(/usr/share/texmf-texlive/tex/latex/base/utf8.def
(/usr/share/texmf-texlive/tex/latex/base/t1enc.dfu)
(/usr/share/texmf-texlive/tex/latex/base/ot1enc.dfu)
(/usr/share/texmf-texlive/tex/latex/base/omsenc.dfu)))
(/usr/share/texmf-texlive/tex/latex/base/fontenc.sty
(/usr/share/texmf-texlive/tex/latex/base/t1enc.def))
(/usr/share/texmf-texlive/tex/latex/graphics/graphicx.sty
(/usr/share/texmf-texlive/tex/latex/graphics/keyval.sty)
(/usr/share/texmf-texlive/tex/latex/graphics/graphics.sty
(/usr/share/texmf-texlive/tex/latex/graphics/trig.sty)
(/etc/texmf/tex/latex/config/graphics.cfg)
(/usr/share/texmf-texlive/tex/latex/pdftex-def/pdftex.def)))
(/usr/share/texmf-texlive/tex/latex/tools/longtable.sty)
(/usr/share/texmf-texlive/tex/latex/hyperref/hyperref.sty
(/usr/share/texmf-texlive/tex/generic/oberdiek/ifpdf.sty)
(/usr/share/texmf-texlive/tex/generic/oberdiek/ifvtex.sty)
(/usr/share/texmf-texlive/tex/generic/ifxetex/ifxetex.sty)
(/usr/share/texmf-texlive/tex/latex/oberdiek/hycolor.sty
(/usr/share/texmf-texlive/tex/latex/oberdiek/xcolor-patch.sty))
(/usr/share/texmf-texlive/tex/latex/hyperref/pd1enc.def)
(/usr/share/texmf-texlive/tex/generic/oberdiek/etexcmds.sty
(/usr/share/texmf-texlive/tex/generic/oberdiek/infwarerr.sty))
(/usr/share/texmf-texlive/tex/latex/latexconfig/hyperref.cfg)
(/usr/share/texmf-texlive/tex/latex/oberdiek/kvoptions.sty
(/usr/share/texmf-texlive/tex/generic/oberdiek/kvsetkeys.sty))
Implicit mode ON; LaTeX internals redefined
(/usr/share/texmf-texlive/tex/latex/ltxmisc/url.sty)
(/usr/share/texmf-texlive/tex/generic/oberdiek/bitset.sty
(/usr/share/texmf-texlive/tex/generic/oberdiek/intcalc.sty)
(/usr/share/texmf-texlive/tex/generic/oberdiek/bigintcalc.sty
(/usr/share/texmf-texlive/tex/generic/oberdiek/pdftexcmds.sty
(/usr/share/texmf-texlive/tex/generic/oberdiek/ifluatex.sty)
(/usr/share/texmf-texlive/tex/generic/oberdiek/ltxcmds.sty))))
(/usr/share/texmf-texlive/tex/generic/oberdiek/atbegshi.sty))
*hyperref using default driver hpdftex*
(/usr/share/texmf-texlive/tex/latex/hyperref/hpdftex.def) (./documentation.aux)
(/usr/share/texmf-texlive/tex/context/base/supp-pdf.mkii
[Loading MPS to PDF converter (version 2006.09.02).]
) (/usr/share/texmf-texlive/tex/latex/hyperref/nameref.sty
(/usr/share/texmf-texlive/tex/latex/oberdiek/refcount.sty))
(./docu...


Launchpad Janitor (janitor) wrote :

[Expired for texlive-base (Ubuntu) because there has been no activity for 60 days.]

Changed in texlive-base (Ubuntu):
status: Incomplete → Expired