Evince cannot display Kanji characters in some Japanese PDF files

Reported by Fumihito YOSHIDA on 2011-04-13
This bug affects 3 people
Affects: language-selector (Ubuntu)
Importance: Undecided
Assigned to: Gunnar Hjalmarsson

Bug Description

Binary package hint: language-selector

Some Japanese PDF files (those whose fonts are not embedded) cannot be read with evince.
As a result, Japanese users cannot read many PDF files out of the box.
------------------------------------------------------------------
How to reproduce:
 1) Install Natty.
 2) Log in and set "Japanese":
  System Settings -> Language Settings -> Install/Remove Language -> add "Japanese"
 3) Install poppler-data (just in case; it is installed automatically in Natty)
 # apt-get install poppler-data
 4) Open the attached PDF file with evince.
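
As a quick diagnostic alongside the steps above, the following sketch lists which fonts fontconfig ranks first for a generic family when Japanese is requested. It assumes the fontconfig command-line tool fc-match is installed; the helper names and the chosen generic families are illustrative and not part of the original report.

```python
import shutil
import subprocess


def parse_families(fc_match_output):
    """Return the primary family name from each line of fc-match output.

    Expects one font per line, as produced with the format string
    %{family} followed by a newline. A line may hold several
    comma-separated names for the same font; only the first is kept.
    """
    families = []
    for line in fc_match_output.splitlines():
        line = line.strip()
        if line:
            families.append(line.split(",")[0])
    return families


def top_matches(generic="sans", lang="ja", limit=5):
    """Ask fontconfig for its sorted match list (fc-match -s)."""
    result = subprocess.run(
        ["fc-match", "-s", "-f", "%{family}\n", f"{generic}:lang={lang}"],
        capture_output=True, text=True, check=True,
    )
    return parse_families(result.stdout)[:limit]


# Only query fontconfig when run as a script and fc-match is available.
if __name__ == "__main__" and shutil.which("fc-match"):
    for generic in ("serif", "sans"):
        print(generic, top_matches(generic))
```

If a DejaVu family appears ahead of the Takao fonts in the printed lists, the system is in the configuration this report describes.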

Actual result:
- PDF characters are not readable; Kanji characters are rendered as squares.
- see https://launchpadlibrarian.net/69251554/evince-actual.png

Expected result (patched):
- PDF characters are readable.
- see https://launchpadlibrarian.net/69251615/evince-expected.png

------------------------------------------------------------------
Solution:

We (Ikuya AWASHIRO, Mitsuya Shibata, Takashi Sakamoto and I) tested this behavior.
Our conclusion is that it comes from /etc/fonts/conf.avail/69-language-selector-ja-jp.conf .

That file binds "DejaVu Serif" / "DejaVu Sans" at the top of the font list, and evince cannot handle these designations properly.

However, this bug is *not* filed against evince. IMHO, strictly speaking, the cause is evince's bad behavior: evince cannot handle "font-linked" settings,
which many other applications can use.

Ideally, evince would handle the "DejaVu Serif" / "DejaVu Sans" bindings correctly, but fixing that is not easy.

I am attaching a patch. It is a workaround for that bad behavior.
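
The suspected mechanism can be illustrated with a toy model. This is only a sketch under assumptions: the glyph coverage sets are made up, "TakaoPGothic" is used as an assumed family name for the Takao fonts, and "first font only" is a simplification of the behavior attributed to evince in this report, not a description of its actual code.

```python
# Toy glyph coverage: which characters each font can draw (made-up data).
COVERAGE = {
    "DejaVu Sans": set("abcABC0123"),
    "TakaoPGothic": set("abcABC0123") | set("漢字日本語"),
}


def render_with_fallback(text, font_list):
    """Pick, per character, the first font in the list that covers it."""
    chosen = []
    for ch in text:
        font = next((f for f in font_list if ch in COVERAGE[f]), None)
        chosen.append((ch, font))
    return chosen


def render_first_font_only(text, font_list):
    """Use only the top font; characters it lacks get None (a 'square')."""
    top = font_list[0]
    return [(ch, top if ch in COVERAGE[top] else None) for ch in text]


current = ["DejaVu Sans", "TakaoPGothic"]  # DejaVu bound at the top
patched = ["TakaoPGothic"]                 # DejaVu dropped (the workaround)

print(render_with_fallback("a漢", current))
print(render_first_font_only("a漢", current))
print(render_first_font_only("a漢", patched))
```

In this model an application with working fallback is unaffected by the ordering, while one that consults only the top entry loses the Kanji until DejaVu is removed, at the price of Latin characters being drawn with the Japanese font.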

Fumihito YOSHIDA (hito) on 2011-04-13
description: updated
tags: added: patch
Gunnar Hjalmarsson (gunnarhj) wrote :

Thanks for your effort to make Ubuntu better by reporting this issue and proposing a solution.

Could you please let us know how the attached patch affects the rendering of English and other Latin characters? Have the four of you discussed possible regressions and, if you have, what are your views on the pros and cons?

I know basically nothing about displaying CJK characters, but I have noted that the equivalent issue for traditional Chinese is being discussed in considerable detail, e.g. in bug 713950 and bug 659280. Some argue well against the kind of workaround you suggest, meaning that it's poppler-data and buggy applications that ought to be fixed. I recommend that you review those discussions, unless you already have.

I'm not saying that this aspect of Japanese and traditional Chinese necessarily should be handled in the same way, but it's an apt comparison, don't you think? https://launchpad.net/~ubuntu-cjk-testers might be an appropriate forum for coordinating issues such as the one in this bug report, but I don't know how well that team works.

Looking forward to further comments; setting the status "Incomplete" in the meantime.

Changed in language-selector (Ubuntu):
assignee: nobody → Gunnar Hjalmarsson (gunnarhj)
status: New → Incomplete
Fumihito YOSHIDA (hito) wrote :

@Gunnar ,

Thank you for pointing that out. Okay, we will sort out these points (information
 about regressions, pros and cons, and so on) in a few days.

Fumihito YOSHIDA (hito) wrote :

I'm sorry for the delay. I have put together our view.

First, we analyzed evince's source code, but it is hugely complex,
and a fix there would need major surgery.

If we remove DejaVu from the fontconfig lists, the consequences are:

- There will be some change: Latin-1 characters will be rendered
  using the "Takao" fonts.
- "Takao" includes Latin-1 glyphs that are sufficiently legible.
- DejaVu is *very good* for Latin-1, but Takao is *good* too.
- We have not encountered any regression, but we cannot rule out
  unknown side effects.

Our pros/cons lists are below:

fontconfig approach:
(pros) : Fixes the evince problem.
(pros) : Also fixes the same problem in other applications (such as old Adobe Flash[1]).
(pros) : Also fixes fontconfig's "mixed" behavior, see attached "gedit-mixed-sample.png" [2]
(cons) : Latin-1 characters are rendered with "Takao" fonts; this is a trivial change.

Evince fixing approach:
(pros) : Solves the problem directly.
(pros) : Fixes the evince problem.
(cons) : Needs major surgery.

[1] Adobe Flash ( < 10.2 ) has the same bug: it cannot display Kanji characters with the current fontconfig settings.

[2] When I input "a0" in gedit, the characters are shown with DejaVu.
However, when I input "00", they are shown with Takao. It follows that DejaVu
Latin characters and Takao Latin characters are mixed in one document.
Yes, this is another bug, but I don't know whether it has been filed (maybe not yet).

------------------------------------------------------------
And we consulted jkbys (he is the Japanese LoCo leader and a cjk-testers admin);
he has the same opinion.
# cjk-testers is not a very active forum, but he is an expert on such problems.

In conclusion,
- The workaround (via fontconfig) has many pros and only small cons.
- Natty will be released soon; it is now in its final phase.
- We cannot predict the impact of this change (we could not test
  *all* Ubuntu applications), and we must prevent unpredictable
  side effects at release engineering.
- If we provide a statement in the release notes (in Japanese), users
  can avoid this problem.

------------------------------------------------------------
So, in our opinion, it is too late for Natty.
Natty is in the Release Quality Iteration, and that patch is too risky.
We should provide only a "how to" in the release notes; that provides
a sufficient environment for Japanese PDFs.

On the other hand, for Oneiric, we believe that patching fontconfig
is acceptable, since it is appropriate for the development phase, isn't it?

Changed in language-selector (Ubuntu):
status: Incomplete → New
Gunnar Hjalmarsson (gunnarhj) wrote :

Thanks for the additional input, Fumihito. Four days ago you said "in a few days", so you aren't late. :)

I will include your suggestion in a merge proposal soon after the Natty release. Provided that it's approved, we'll then get possible reactions from a number of developers, at least, before the release of Oneiric.

I have one more question: If I understand the discussion at bug 713950 correctly, they suggest that DejaVu be moved down in the priority order rather than dropped completely. Would there be any advantage to doing so in the Japanese fontconfig file?

Changed in language-selector (Ubuntu):
status: New → In Progress
Nobuto MURATA (nobuto) wrote :

Indeed, Evince (or maybe poppler?) had a regression for a period of time that included Natty Alpha 2.
The current daily-live 20110418 with poppler-data and ttf-takao-{mincho,gothic} installed properly shows non-embedded characters specified as Ryumin (Japanese serif: Mincho font).
However, my upgraded system still does not show these characters. I can't find where the difference comes from. Any ideas?

BTW, non-embedded characters specified as GothicBBB (Japanese sans-serif: Gothic font) are not shown even in the current daily-live. But this is not a regression: Evince in Maverick does not show these characters either. It seems to be a bug in parsing the font name. The attached patch is a workaround for this issue.

Although I think it's a severe issue not to show Ryumin and GothicBBB, I don't think it's necessary to deal with "Times New Roman", which does not have Japanese glyphs, as in the file attached to #8. I think removing the DejaVu lines is an overreaching workaround, because I think it's a problem with the PDF itself.

Fumihito YOSHIDA (hito) wrote :

@Nobuto

> It seems to be a bug of parsing the font name.

I want to clarify this point. Could you please check your analysis?

(1)
Evince's properties dialog (File -> Properties) shows "Times New Roman" (like https://wiki.ubuntulinux.jp/Develop/Natty/Evince?action=AttachFile&do=get&target=evince2-mocchi.png); that is *only* a properties-dialog issue. In rendering, evince has the same logical bug (see 2), but it is not in the same code.

(2)
Some PDF files (e.g. the one attached to #8) have SHIFT_JIS-encoded font names, but that is correct per the PDF format. Mitsuya Shibata and I think that is a bug on the PDF reader side, not in the PDF itself.

# So, let's cry with us, "PDF specification is too complex for me!":)

Fumihito YOSHIDA (hito) wrote :

@Gunnar

> Four days ago you said "in a few days", so you aren't late. :)

Oops, it had gone right out of my head :)

> I have one more question: If I understand the discussion at bug 713950 correctly, they suggest that DejaVu be moved down in the priority order rather than dropped completely.

Yes, for Natty they decided to move it down instead of dropping it.

> Would there be any advantage to doing so in the Japanese fontconfig file?

I cannot see a clear difference between these settings. Our decision has a simple reason: "monospace" (in the Ja configs) has no "DejaVu" entries, so we decided to drop them from "serif" and "sans-serif" too.

Gunnar Hjalmarsson (gunnarhj) wrote :

Ok, then let's apply your patch as is (after the Natty release).

Thanks!

Nobuto MURATA (nobuto) wrote :

> @Nobuto
>
>> It seems to be a bug of parsing the font name.
>
> I want to clarify this point. Could you please check your analysis?
>
> (1)
> Evince's properties dialog (File -> Properties) shows "Times New Roman" (like https://wiki.ubuntulinux.jp/Develop/Natty/Evince?action=AttachFile&do=get&target=evince2-mocchi.png); that is *only* a properties-dialog issue. In rendering, evince has the same logical bug (see 2), but it is not in the same code.
>
> (2)
> Some PDF files (e.g. the one attached to #8) have SHIFT_JIS-encoded font names, but that is correct per the PDF format. Mitsuya Shibata and I think that is a bug on the PDF reader side, not in the PDF itself.

Ah, I might have misunderstood this report. I thought this report handled an early-Natty regression in which characters such as those in Ryumin, which can be shown on Maverick, are shown as squares on Natty.

And now my question is:
Does this report handle the problem that evince (poppler) cannot show the characters when SHIFT_JIS-encoded font names are specified?
You had not mentioned SHIFT_JIS until comment #11, so I'm confused.

BTW, the parsing problem I mentioned is that even when GothicBBB is shown properly in the properties window, the characters specified as GothicBBB are shown as squares. I have reported this issue as Bug #769827.

> # So, let's cry with us, "PDF specification is too complex for me!":)

Just out of curiosity, supposing that SHIFT_JIS-encoded font names comply with the PDF specification, what software uses SHIFT_JIS-encoded font names, as far as you know?
I'd like to know how many PDF files are affected.

Just for your information, the "poppler should fall back to a font properly" issue seems to have been reported upstream to poppler.
https://bugs.freedesktop.org/show_bug.cgi?id=36474

Fumihito YOSHIDA (hito) wrote :

@Nobuto

> Does this report handle the problem that evince (poppler) cannot show the characters when SHIFT_JIS-encoded font names are specified?

No, that seems to be a misunderstanding too. This bug covers the following "trouble":

- A workaround/mitigation/fix for Evince's "square character bug" (yes, strictly speaking, it is libpoppler's bug).
- It is caused by fontconfig settings. Evince (libpoppler) cannot properly handle "secondary" fonts.
- It has the same root problem as https://bugs.freedesktop.org/show_bug.cgi?id=36474 , but takes another approach (a workaround).

This "trouble" is caused by these situations (in the PDF properties):

a) The PDF designates fonts that do not exist on your system.
b) The PDF designates a FontName encoded in Shift_JIS (or other encodings). That corrupts the FontName string, which leads to a) as a result (a broken FontName cannot match the system's font list).
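
Situation b) can be sketched as follows. This is an illustration only: the font name "明朝" and the installed-font list are hypothetical, and decoding the bytes as Latin-1 stands in for whatever a reader does when it does not expect Shift_JIS. It is not taken from poppler's code.

```python
# Hypothetical Japanese font name, stored in the PDF as Shift_JIS bytes.
font_name = "明朝"
raw_bytes = font_name.encode("shift_jis")

# A reader that assumes a single-byte encoding (Latin-1 as a stand-in)
# gets a different, meaningless string from the same bytes.
misread_name = raw_bytes.decode("latin-1")

# Hypothetical set of font names known to the system.
installed_fonts = {"明朝", "DejaVu Sans"}

# Compare how the intended name and the misread name fare in the lookup.
print(font_name in installed_fonts)
print(misread_name in installed_fonts)

# Decoding with the matching codec recovers the intended name.
print(raw_bytes.decode("shift_jis") == font_name)
```

Because the misread string matches nothing in the lookup, the request ends up in the same state as situation a): a font that does not exist on the system.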

And if you reproduce this "trouble", you can see these two behaviors:

1) Evince cannot show the characters ("square character bug"); this is an evince/libpoppler bug.
2) Evince shows the FontName as "Times New Roman". This is an evince bug.

The key point is that these are siblings, but not the same bug.
- "Situation a" causes 1.
- "Situation b" causes 1 and 2.

So, I'll report 2) as yet another bug.

Fumihito YOSHIDA (hito) wrote :

@Nobuto

> Just out of curiosity, supposing that SHIFT_JIS-encoded font names comply with the PDF specification, what software uses SHIFT_JIS-encoded font names, as far as you know?
> I'd like to know how many PDF files are affected.

At least,
- Macromedia(Adobe) FlashPaper
- PDF Edit ( http://japan.nuance.com/pdf/edit2/ )
- JUST PDF ( http://www.justsystems.com/jp/products/justpdf/ )

But I don't know how many such PDFs exist in the world. Probably *a great many*.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package language-selector - 0.36

---------------
language-selector (0.36) oneiric; urgency=low

  [ Gunnar Hjalmarsson ]
  * LanguageSelector/gtk/GtkLanguageSelector.py:
    Hack for Vietnamese users: Make the language names in the list over
    installable languages be displayed in respective native language when
    the current language is Vietnamese (LP: #783090).
  * fontconfig/69-language-selector-ja-jp.conf:
    Drop DejaVu (LP: #759882).

  [ Martin Pitt ]
  * data/pkg_depends: Install firefox-locale-*.
 -- Gunnar Hjalmarsson <email address hidden> Mon, 23 May 2011 14:45:22 +0200

Changed in language-selector (Ubuntu):
status: In Progress → Fix Released