Gedit fails to read UTF-16 encoded file
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
gedit (Ubuntu) | New | Undecided | Unassigned | | |
Bug Description
Affected version: 3.22.0
I'm trying to open a certain text file. Unsure of the exact encoding used, I viewed another text file in the same folder (part of the same set of files) and had gedit auto-detect its encoding as UTF-16. Viewing the file in a hex editor, this does indeed seem to be the case. The other file contains a lot of CJK characters, while this file contains very few (it is mostly ASCII English, with a few special symbols). You can see this in the file structure: almost every even byte is a zero byte (0x00).
gedit fails to open the file with the message:
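The hex-editor observation can be turned into a rough check. The following is a minimal Python sketch, not what gedit does internally: it looks for a UTF-16 byte-order mark first and otherwise guesses the byte order from where the zero bytes sit. The function name `guess_utf16` and the 50% / 10% thresholds are illustrative assumptions.

```python
import codecs


def guess_utf16(data: bytes):
    """Guess 'utf-16-le' or 'utf-16-be' from a BOM or the zero-byte layout.

    Returns None when neither signal is present. Heuristic only: it does
    not distinguish a UTF-16 LE BOM from a UTF-32 LE BOM, for example.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"

    units = len(data) // 2
    if units == 0:
        return None

    # Count zero bytes at even and odd offsets (0-indexed).
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)

    # Mostly-ASCII text in UTF-16 LE has its zero bytes at odd offsets;
    # in UTF-16 BE they sit at even offsets. Thresholds are arbitrary.
    if odd_zeros > 0.5 * units and even_zeros < 0.1 * units:
        return "utf-16-le"
    if even_zeros > 0.5 * units and odd_zeros < 0.1 * units:
        return "utf-16-be"
    return None
```

Passing it the first few kilobytes of the file, read in binary mode, should be enough to confirm or contradict the UTF-16 guess.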
Could not open the file “/hdd/programs/
Unexpected error: Invalid byte sequence in conversion input
I first figured the problem was with the text file, so I tried 'fixing' the file by converting it to its own encoding, ignoring invalid sequences, using the 'iconv' tool:
$ iconv -c -f 'UTF-16' -t 'UTF-16' addon_english.txt > addon_english_
$ sha1sum addon_english.txt
e0e9f360482f2f2
$ sha1sum addon_english_
e0e9f360482f2f2
As you can clearly see, nothing changed: the checksums are identical, so iconv found no invalid sequences to discard. I therefore suspect something is wrong with gedit here.
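A second way to cross-check the iconv result is to decode the file strictly and report where the first offending sequence is, if any. This is a minimal Python sketch; the function name `first_invalid_utf16` is illustrative, and Python's UTF-16 decoder may not apply exactly the same rules as the converter gedit uses.

```python
def first_invalid_utf16(data: bytes):
    """Strictly decode data as UTF-16 (BOM-aware).

    Returns None if the whole input decodes cleanly, otherwise a tuple
    (byte_offset, reason) describing the first error the decoder hits.
    """
    try:
        data.decode("utf-16", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Example use against a file read in binary mode:
#   with open("addon_english.txt", "rb") as fh:
#       print(first_invalid_utf16(fh.read()))
```

If this returns None for the attached file, that would be consistent with the unchanged checksum and point further towards gedit's handling rather than the file contents.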
As an aside, other editors also don't like this file much:
GNU nano won't open it by default.
vim will open it, but can't display all the characters in it (probably Han unification issues).
leafpad will nuke the contents, replacing them with a literal byte-order mark shown as text (a BOM as rendered in Latin-1).
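For reference, this is what a UTF-16 byte-order mark looks like when its two bytes are misread as Latin-1 text. The sketch assumes the file starts with a BOM, which the report does not state explicitly; the helper name `bom_as_latin1` is illustrative.

```python
import codecs


def bom_as_latin1(bom: bytes) -> str:
    """Decode raw BOM bytes as Latin-1, one character per byte."""
    return bom.decode("latin-1")


# Little-endian BOM (FF FE) and big-endian BOM (FE FF) as Latin-1 text.
le_text = bom_as_latin1(codecs.BOM_UTF16_LE)
be_text = bom_as_latin1(codecs.BOM_UTF16_BE)
print(repr(le_text), repr(be_text))
```

Seeing the characters U+00FF and U+00FE at the start of the buffer would suggest the editor treated the file as a single-byte encoding instead of UTF-16.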
My locale settings are EN-GB for language and UTF-8 for preferred charset used by the OS itself.
The file in question has been attached to this bug report for bug reproduction purposes.