Gedit fails to read UTF-16 encoded file

Bug #1671512 reported by Louis
This bug affects 1 person
Affects: gedit (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Affected version: 3.22.0

I'm trying to open a particular text file. Unsure of its exact encoding, I opened another text file in the same folder (part of the same set), and gedit auto-detected that one as UTF-16. Viewing the problem file in a hex editor suggests it is UTF-16 as well. The other file contains a lot of CJK characters, while this one contains very few (mostly ASCII English, with a few special symbols). The byte structure shows this: almost every second byte is a zero byte (0x00).
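For anyone who wants to repeat the hex-editor check without a hex editor, here is a minimal Python sketch. The function name `describe_utf16_layout` is my own, and the sample bytes are made up for illustration; to check the real file, pass in the result of `open("addon_english.txt", "rb").read()` instead.

```python
import codecs


def describe_utf16_layout(data: bytes) -> dict:
    """Report the BOM (if any) and count zero bytes at even and odd offsets."""
    if data.startswith(codecs.BOM_UTF16_LE):
        bom = "UTF-16LE"
    elif data.startswith(codecs.BOM_UTF16_BE):
        bom = "UTF-16BE"
    else:
        bom = None

    # Skip the 2-byte BOM so offsets below are relative to the text itself.
    body = data[2:] if bom else data

    # Offsets are 0-based here. For mostly-ASCII little-endian text the
    # zero high bytes sit at odd offsets (every second byte).
    even_zero = sum(1 for b in body[0::2] if b == 0)
    odd_zero = sum(1 for b in body[1::2] if b == 0)
    return {
        "bom": bom,
        "even_zero": even_zero,
        "odd_zero": odd_zero,
        "units": len(body) // 2,
    }


# Hypothetical sample: three ASCII characters plus one non-ASCII symbol.
sample = codecs.BOM_UTF16_LE + "Hi \u2713".encode("utf-16-le")
print(describe_utf16_layout(sample))
```

If one of the two zero counts is close to `units` and the other is close to zero, that matches the pattern described above and also hints at the byte order.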

gedit fails to open the file, with the message:
Could not open the file “/hdd/programs/thd2/resource/addon_english.txt”.
Unexpected error: Invalid byte sequence in conversion input

I first figured the problem was with the text file itself, so I tried 'fixing' it by converting it to its own encoding with the iconv tool, using -c to discard any invalid sequences.

$ iconv -c -f 'UTF-16' -t 'UTF-16' addon_english.txt > addon_english_fixed.txt
$ sha1sum addon_english.txt
e0e9f360482f2f234e5aeb09406c10081ebb6e1a addon_english.txt
$ sha1sum addon_english_fixed.txt
e0e9f360482f2f234e5aeb09406c10081ebb6e1a addon_english_fixed.txt

The checksums are identical, so iconv found nothing to discard and the file appears to be valid UTF-16 as far as iconv is concerned. I therefore suspect the problem is in gedit.
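As an independent cross-check of the iconv result, a strict decode in Python can report where a decoder first objects, if it objects at all. This is only a sketch: the helper name `find_invalid_utf16` is my own, Python's decoder is not the code path gedit uses, and the sample bytes below are made up rather than taken from the attached file.

```python
import codecs


def find_invalid_utf16(data: bytes, encoding: str = "utf-16"):
    """Return None if data decodes strictly, else (byte_offset, reason)."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset at which the decoder gave up.
        return exc.start, exc.reason
    return None


# Hypothetical samples: one clean, one with a lone high surrogate (0xD800)
# that is followed by an ordinary character instead of a low surrogate.
clean = codecs.BOM_UTF16_LE + "ok".encode("utf-16-le")
broken = codecs.BOM_UTF16_LE + b"a\x00" + b"\x00\xd8" + b"b\x00"

print(find_invalid_utf16(clean))
print(find_invalid_utf16(broken))
```

For the real file, `find_invalid_utf16(open("addon_english.txt", "rb").read())` returning `None` would agree with the iconv round trip. A reported offset would give a concrete position to inspect in the hex editor.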
As an aside, other editors also don't like this file much:

GNU nano won't open it by default.
vim will open it, but can't display all the characters in it (possibly a Han unification issue).
leafpad will nuke the contents, replacing them with a literal byte-order mark (the BOM bytes rendered as Latin-1 characters).

My locale is set to EN-GB for the language, with UTF-8 as the preferred charset of the OS itself.

The file in question has been attached to this bug report for bug reproduction purposes.
