Can not display GB2312/GB18030 encoded chinese files
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
gedit | New | Undecided | Unassigned |
gedit (Ubuntu) | | High | Unassigned |
Bug Description
GB2312/GB18030 encoding is the national standard in China, gedit should support them.
An Yang (euroford) wrote : | #1 |
An Yang (euroford) wrote : | #2 |
An Yang (euroford) wrote : | #3 |
The content of gb18030.txt should look like this picture.
Kyle Nitzsche (knitzsche) wrote : | #4 |
Hi,
Is it possible that the characters do not display simply because there is no font installed that provides a glyph for the code point?
(I notice when looking in the character map application that many Chinese characters appear to have no glyph; for example, <UFACE> has a glyph but <UFACF> does not (natty).)
An Yang (euroford) wrote : Re: [Bug 819714] Re: Can not display GB2312/GB18030 encoded chinese files | #5 |
On Tue, 2011-08-02 at 19:41 +0000, Kyle Nitzsche wrote:
> Hi,
>
> Is it possible that the characters do not display simply because there
> is no font installed that provides a glyph for the code point?
Sure, it's possible.
But the wqy fonts include all glyphs in CJK Unified Ideographs and Extension
B.
That is enough according to the Chinese national standards.
In this bug, gedit displays none of them if the file is GB18030
encoded.
GB18030 is the encoding standard in China, so I think it's a fatal bug.
>
> (I notice when looking in the character map application, there are many
> Chinese characters that appear to have no glyph, for example <UFACE> has
> a glyph but <UFACF> does not (natty).
>
An Yang (euroford) wrote : | #6 |
Sorry, a correction: the wqy fonts include all glyphs in CJK Unified Ideographs and Extension A.
CJK Unified Ideographs Extensions B/C are optional.
An Yang (euroford) wrote : | #7 |
Gedit can indeed display all GB18030-encoded files; it supports the GB18030 encoding very well.
But gedit in Ubuntu cannot do this :-(
Sebastien Bacher (seb128) wrote : | #8 |
The file opens fine on my Oneiric installation if I do this:
- run gedit
- click open
- in the encoding combo, select "add" and add GB18030
- select that encoding in the combo
- select the file
It then renders as in the screenshot.
Is there a way to detect programmatically that a file is GB18030? How do other editors deal with that example?
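One common way to approach the programmatic-detection question is trial decoding: attempt each candidate encoding in order and accept the first that decodes without error. The sketch below illustrates only that idea and is not how gedit implements it; the function name `detect_encoding` and the candidate order are assumptions made for this example. Many byte sequences are valid in more than one encoding, so the order of the candidate list affects the result.

```python
from typing import Optional, Sequence


def detect_encoding(data: bytes, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            # Strict decoding raises UnicodeDecodeError on any invalid byte sequence.
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical candidate order, chosen for illustration only.
CANDIDATES = ["utf-8", "gb18030"]

gb_bytes = "中文".encode("gb18030")
utf8_bytes = "中文".encode("utf-8")

print(detect_encoding(gb_bytes, CANDIDATES))
print(detect_encoding(utf8_bytes, CANDIDATES))
```

Trying UTF-8 first works for this sample because the GB18030 bytes of the text are not valid UTF-8. A real detector would need a fallback for data that no candidate accepts.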
An Yang (euroford) wrote : | #9 |
Sebastien,
Yes, you are right: if users know the encoding of the file, they can open it with gedit.
But not all of them know the encoding, so automatic detection is the most common use case.
Gedit has an auto-detect sequence recorded in gconf; the correct value is [CURRENT,
An Yang (euroford) wrote : | #10 |
I'm sorry, that was a typing mistake; the correct value is
[CURRENT,
Sebastien Bacher (seb128) wrote : | #11 |
Can the encoding be detected automatically, though? The gconf key you list is per locale and suggests that a Chinese install should try the GB encodings before UTF-8, so the example should open fine?
An Yang (euroford) wrote : | #12 |
In /usr/share/
For example, when the LANG=zh_CN, the following will be set:
<locale name="zh_CN">
</locale>
postinst scripts of gedit:
if [ "$1" = "configure" ]; then
fi
An Yang (euroford) wrote : | #13 |
So I guess that if the LANG environment variable were set to zh_CN, you could create a release CD with the right gedit settings.
Sebastien Bacher (seb128) wrote : | #14 |
Right, but that's getting confusing. What issue are you trying to solve, or what are you asking for here? gedit should already do the right thing when using a zh_CN locale and open files in the GB encodings, which are ranked before UTF-8 for that locale.
An Yang (euroford) wrote : | #15 |
Sebastien,
I just want to contribute to the Qin-ubuntu project (a Chinese locale edition of Ubuntu), but I do not know who the right person to notify about this problem is: Martin Pitti or somebody else?
And of course, this bug should affect any other localized editions of Ubuntu; I hope the people there will notice the problem.
An Yang (euroford) wrote : | #16 |
The default value of auto_detected is [UTF-8,
I think something is wrong in Ubuntu, but I don't know who should be involved.
Sebastien Bacher (seb128) wrote : | #17 |
is the issue specific to the liveCD or also on the installed system? What version of Ubuntu do you use?
An Yang (euroford) wrote : | #18 |
I tested every release from lucid to natty. CD or DVD edition, x86 or x86_64, all of them have this bug.
Sebastien Bacher (seb128) wrote : | #19 |
is the issue specific to liveCD sessions or also on the installed system?
An Yang (euroford) wrote : | #20 |
Both.
ZhengPeng Hou (zhengpeng-hou) wrote : | #21 |
An Yang (euroford) wrote : | #22 |
Hi Hou,
I just tested the default setting in the gedit package, [CURRENT,
All of GB18030,
[UTF-8,
Your config may have a problem. Did you test the case where the file contains GB18030 characters that are not in GB2312?
I'm not sure.
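To build a test file for that case, one needs text containing characters that GB18030 can encode but GB2312 cannot. A minimal sketch, assuming Python's standard `gb2312` and `gb18030` codecs; the helper name `outside_gb2312` and the sample string are hypothetical and chosen only for illustration:

```python
from typing import List


def outside_gb2312(text: str) -> List[str]:
    """Return the characters of `text` that the GB2312 codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("gb2312")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# U+3400 lies in CJK Unified Ideographs Extension A, which GB2312 does not cover.
sample = "中文\u3400"

print(outside_gb2312(sample))

# Bytes suitable for writing to a GB18030 test file.
test_bytes = sample.encode("gb18030")
```

A file written from `test_bytes` exercises the code path that a GB2312-only sample would miss.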
Kyle Nitzsche (knitzsche) wrote : | #23 |
I just tested opening the gb18030.txt file in oneiric alpha3. Here are my findings:
For auto-detection of a new encoding to work in gedit, one must do two things:
* Add the encoding in gedit (Open > Character Encodings > Add/Remove > add desired encoding)
- Note that after this step the value of Character Encoding is still "Automatically Detected". Opening the file will not work yet.
* Set Character Encoding specifically to the encoding you added, and open the file
After this, a Character Encoding of "Automatically Detected" works.
So, perhaps the fix is to change the Character Encoding widget to select the encoding that one just added (instead of remaining at "Automatically Detected") for this ONE open and then, perhaps, revert to "Automatically Detected".
Sebastien Bacher (seb128) wrote : | #24 |
Not sure how gedit3 is supposed to work; it seems the old gconf key that holds the encoding order got deprecated.
Changed in gedit (Ubuntu): | |
importance: | Undecided → High |
Changed in gedit (Ubuntu Oneiric): | |
importance: | Undecided → High |
Sebastien Bacher (seb128) wrote : | #25 |
OK, in fact they are still there. Could you run this on Oneiric with a Chinese installation:
gsettings get org.gnome.
Changed in gedit (Ubuntu Oneiric): | |
status: | New → Incomplete |
Sebastien Bacher (seb128) wrote : | #26 |
The key should be set to something similar to what was pointed out before, i.e. "[CURRENT,
tags: | added: qin |
Changed in gedit (Ubuntu): | |
status: | New → Incomplete |
Eric Miao (eric.y.miao) wrote : | #27 |
Well, I'd say an ideal solution would be for gedit to detect the encoding by itself, and thus avoid all these tricky configurations. I've experimented a bit with universalchardet, which comes from the Mozilla project, and its standalone library libuchardet. I found it to be smart enough in most cases. Attached is a preliminary patch that adds uchardet support to gedit, for early preview.
I'll come up with a testing package a bit later.
Eric Miao (eric.y.miao) wrote : | #28 |
I've uploaded testing packages to http://
Note that it's for precise, and one needs to install libuchardet0 first.
$> sudo apt-get install libuchardet0
$> sudo dpkg --install gedit*~
The attachment "uchardet.diff" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-reviewers team please also unsubscribe the team from this bug report.
[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]
tags: | added: patch |
Ma Hsiao-chun (mahsiaochun) wrote : | #30 |
The link http://
Changed in gedit (Ubuntu): | |
status: | Incomplete → Confirmed |
Changed in gedit (Ubuntu Oneiric): | |
status: | Incomplete → Confirmed |
tags: | added: precise quantal raring |
Ma Hsiao-chun (mahsiaochun) wrote : | #31 |
This problem is partially worked around by a not-so-recent translation change upstream.
https:/
That translation is already included in the 3.6.1 tarball, but unfortunately Ubuntu 12.10, even though it claims to ship gedit 3.6.1, doesn't seem to have picked it up from upstream.
Since this bug:
- Is valid.
- Is well described.
- Is reported in the upstream project.
- Is ready to be worked on by a developer.
It's already triaged.
Changed in gedit (Ubuntu): | |
status: | Confirmed → Triaged |
Oneiric reached EOL.
Changed in gedit (Ubuntu Oneiric): | |
status: | Confirmed → Won't Fix |
no longer affects: | gedit (Ubuntu Oneiric) |
My locale settings:
LANG=zh_CN.UTF-8
LANGUAGE=zh_CN:en_US:en
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES=zh_CN.UTF-8
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=