kate silently corrupts iso-8859-1 files in utf-8 locale

Bug #60670 reported by JS
This bug affects 1 person
Affects                        Status        Importance  Assigned to  Milestone
KDE Base                       Invalid       Undecided   Unassigned
KDE Software Development Kit   Fix Released  Medium
kdesdk (Ubuntu)                Fix Released  Medium      Unassigned

Bug Description

Binary package hint: kate

If I open a binary file in kate, it gives a warning that saving the file will corrupt it. Fine.

However, if I work in a utf-8 locale, open an iso-8859-1 encoded file in kate, and save it, it will be corrupted. Each non-ASCII character (that does not happen to form a valid utf-8 encoded character) is changed into an "invalid character" marker. Simply opening and saving the file is enough to lose all non-ASCII characters, and no warning is given.
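
To illustrate the failure, here is a minimal Python sketch. It only mimics the reported symptom (ISO-8859-1 bytes decoded as UTF-8, then written back as UTF-8); it is not Kate's code, and the sample string is made up for the example.

```python
# Illustrative only: reproduces the symptom, not Kate's implementation.
original = "café naïve".encode("iso-8859-1")   # bytes as stored on disk

# Open: the bytes are decoded as UTF-8. The lone bytes 0xE9 and 0xEF are
# not valid UTF-8 here, so each one becomes U+FFFD (replacement character).
text = original.decode("utf-8", errors="replace")

# Save: the buffer is written back as UTF-8, markers included.
saved = text.encode("utf-8")

lost = text.count("\ufffd")   # number of characters that were destroyed
```

After this round trip the original bytes 0xE9 and 0xEF are gone from `saved`, so the characters cannot be recovered from the saved file alone.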

Switching to the right encoding after opening the file naturally helps, if only one remembers to do it. If the files are in English, there may be only a few non-ASCII characters in the files, making them easy to miss until it is too late.
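
A warning like the one given for binary files could, in principle, be triggered by a strict decode. The following Python sketch shows the idea; `decode_or_warn` is a hypothetical helper named for this example, not anything Kate provides. Note that it cannot catch the rare case where ISO-8859-1 bytes happen to form valid UTF-8.

```python
def decode_or_warn(data: bytes, encoding: str = "utf-8"):
    """Return (text, None) on a clean decode, or (None, message) if the
    bytes are not valid in the given encoding."""
    try:
        return data.decode(encoding, errors="strict"), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first offending byte.
        bad = data[exc.start]
        return None, f"not valid {encoding}: byte 0x{bad:02x} at offset {exc.start}"


text, warning = decode_or_warn("café".encode("iso-8859-1"))
```

Here `text` is `None` and `warning` names the offending byte, which is the point at which an editor could ask the user to pick another encoding instead of silently substituting characters.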

Revision history for this message
Kenny Duffus (kduffus) wrote :

Can you check which encoding Kate thinks the file is in? Use Save As and look at the top right of the save dialog, where it will say something like "iso 8859-1" or "utf8".

Changed in kdebase:
importance: Untriaged → Medium
status: Unconfirmed → Needs Info
Revision history for this message
JS (j5) wrote :

File / Save As... dialog shows "utf8".

Revision history for this message
kibe (b-kix) wrote :

I want to add that this problem happens with XML files, too.
If you open an XML file which is iso-8859-1 encoded and has a correct XML header stating the encoding of the file, Kate corrupts the file when you simply save it, because Kate saves it in utf-8.

But when you open the file, manually correct the encoding, then edit and save it, there is no problem. The next time you open that file, Kate handles the encoding correctly and opens it in iso-8859-1 each time.

It seems to me that Kate sets some kind of flag on the file?
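
On the XML case: the encoding named in the XML header can be read before the rest of the file is decoded. A rough Python sketch follows; `declared_xml_encoding` is a hypothetical helper, and it assumes an ASCII-compatible encoding with no byte order mark, so it would not handle UTF-16 files.

```python
import re

# Matches the encoding part of an XML declaration such as
# <?xml version="1.0" encoding="ISO-8859-1"?>
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = _XML_DECL.match(data[:200])
    return match.group(1).decode("ascii") if match else default


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<a>\xe9</a>'
encoding = declared_xml_encoding(doc)
text = doc.decode(encoding)   # decoded with the declared encoding
```

Decoding with the declared encoding keeps the non-ASCII character intact, whereas a UTF-8 decode of the same bytes would not.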

Revision history for this message
JS (j5) wrote :

Some further comments regarding the behaviour of the latest version of Kate (2.5.6) in Kubuntu.

Since I have learned that Kate corrupts iso-8859-1 files in a UTF-8 locale, I have tried to make sure that Kate gets the encoding right by adding "kate: encoding ..." directives to some files. However, this leads to yet another problem.

The following happens quite often; I am using a UTF-8 locale:

- I have an iso-8859-1 encoded file which was created somewhere else. The file might be, for example, LaTeX source code; it contains the directive "% kate: encoding iso-8859-1" near the beginning of the file and it also contains some non-ASCII characters encoded in iso-8859-1.

- I open the file in Kate (usually a Kate instance is already running; I open files using "kate -u file.tex" from the command line).

- At this point I _should_ notice that the file is displayed using the wrong encoding; non-ASCII characters are shown as squares. However, this is very easy to miss if there are only a few non-ASCII characters in the file, especially if none of them are near the beginning of the file but hidden somewhere in the middle of a long file.

- Therefore I simply start editing the file. I make changes here and there and finally decide to save the changes. At this point Kate complains that "the selected encoding cannot encode every Unicode character in the file".

- Now I notice that all non-ASCII characters are indeed displayed as squares. However, Tools / Encoding shows that iso-8859-1 is selected!

The trouble is that even though Tools / Encoding shows iso-8859-1, Kate did not properly read the file in this encoding. And now I am in trouble. At this point I cannot reselect Tools / Encoding / iso-8859-1; Kate would discard the changes and re-read the original file. And I cannot even copy & paste the file into a new window and save it with the correct encoding, because the non-ASCII characters are garbage. I have to re-open the file with the correct encoding and manually copy & paste just those parts that I have edited. Or something like that.
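
The error message at save time is consistent with a buffer that was read as UTF-8 but is being written as iso-8859-1. This Python sketch shows that combination under that assumption; it is a guess at the mechanism based on the symptoms, not a description of Kate's internals, and the sample word is made up.

```python
raw = "Gödel".encode("iso-8859-1")               # file contents on disk
buffer = raw.decode("utf-8", errors="replace")   # misread on open

try:
    buffer.encode("iso-8859-1")   # save with the encoding shown in the menu
    save_ok = True
except UnicodeEncodeError:
    # U+FFFD (the "square") has no representation in ISO-8859-1.
    save_ok = False
```

The buffer now holds U+FFFD where the original character was, and that marker is exactly the "Unicode character" the selected encoding cannot encode.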

There is a workaround, but it is hard to remember that this must be done every time:

- Check that non-ASCII characters (if any) are displayed correctly. If not, re-select Tools / Encoding / iso-8859-1 immediately after opening the file.

It seems that Kate remembers the correct setting for each file; therefore if I do this once for a particular file (or if the file is originally created with Kate on this computer), things work fine.
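
For reference, honouring a "kate: encoding ..." directive before decoding the rest of the file could look roughly like the Python sketch below. `modeline_encoding` is a hypothetical helper, the regular expression is a simplification written for this example, and scanning only the first ten lines is an assumption.

```python
import re

# Simplified pattern: "kate:" followed somewhere by "encoding <name>".
_MODELINE = re.compile(r"kate:.*?\bencoding\s+([A-Za-z0-9._-]+)")


def modeline_encoding(data: bytes, max_lines: int = 10):
    """Return the encoding named in a kate directive, or None."""
    for line in data.splitlines()[:max_lines]:
        # Latin-1 maps every byte to a character, so this scan cannot
        # fail; it is used only to find the ASCII directive.
        match = _MODELINE.search(line.decode("iso-8859-1"))
        if match:
            return match.group(1)
    return None


source = b"% kate: encoding iso-8859-1\n\\section{Caf\xe9}\n"
encoding = modeline_encoding(source)
text = source.decode(encoding) if encoding else None
```

The key design point is the order of operations: the directive is located on the raw bytes first, and only then is the whole file decoded with the named encoding.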

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for kdebase (Ubuntu) because there has been no activity for 60 days.]

Revision history for this message
JS (j5) wrote :

The same bug is still present in Kubuntu 7.10.

I just opened an ISO-8859-1 file in a UTF-8 locale. Most of the text was plain ASCII, so I did not notice that anything was wrong. I did some editing and saved the file. There were no warnings, and all non-ASCII characters were replaced with garbage. Some data was lost.

Changed in kdebase:
status: Invalid → Incomplete
Revision history for this message
Martin Böhm (martin.bohm) wrote :

Thank you for your bug report. Is it only Kubuntu's Kate that corrupts the file? Could anyone try another distribution's Kate so we can tell whether it's a Kubuntu-only bug?

Revision history for this message
JS (j5) wrote :

Still present in up-to-date Kubuntu 8.04 (Kate 2.5.9, KDE 3.5.9).

Revision history for this message
Reinhard (rforge) wrote :

I have the same problem with Kile 2.0.0 and Kubuntu 8.04 (KDE 4.1 with Kile using 3.5.9)

I have to select the 8859-1 encoding for every single (sub-)tex-document I want to open. Quite embarrassing with a 200+ page report created with MiKTeX on an XP machine...

I really like the Kile environment and am looking forward to exploring the possibilities, but I am facing some headwind right now in porting my project to Kubuntu/Kile/utf-8....

Revision history for this message
Harald Sitter (apachelogger) wrote :

Moving the bug to kdesdk. Still present in KDE 4.1 from what I can tell.

Changed in kdebase:
status: Incomplete → Confirmed
Changed in kdesdk:
status: Unknown → New
Changed in kdebase:
status: New → Invalid
Changed in kdesdk (Ubuntu):
status: Confirmed → Triaged
Changed in kdesdk:
status: New → Fix Released
Revision history for this message
Jonathan Thomas (echidnaman) wrote :

Fix committed for KDE 4.5. Unfortunately the fix was a rewrite of a major component, so the fix cannot be backported to the stable 4.4 packages.

Changed in kdesdk (Ubuntu):
status: Triaged → Fix Committed
Felix Geyer (debfx)
Changed in kdesdk (Ubuntu):
status: Fix Committed → Fix Released
Changed in kdesdk:
importance: Unknown → Medium