Asian characters don't render in PDF reports

Bug #830831 reported by technick
This bug affects 5 people
Affects              Status     Importance  Assigned to    Milestone
SchoolTool           Triaged    High        Douglas Cerna
schooltool (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

PDF export of Asian characters doesn't work correctly. This has been verified with Hangul (Korean), Japanese, and Chinese text.

The PDF seems to contain the correct text data, but all Asian characters render as square boxes.
** You can copy the text out of the PDF and paste it into a text editor, so this looks more like a font issue.

See the attached screen-shot.

To reproduce:
1) Add three students with names 이름 (Korean), 名前 (Japanese), 名称 (Chinese).
2) Save a PDF report that includes these students (e.g. create some absences in a section for these students and view the section's absences report).

technick (nickfolse-gmail) wrote :

Douglas Cerna (replaceafill) wrote :

Correct, this is a font issue. We only embed Liberation* fonts in the PDF. To display other languages correctly, we would need to embed the appropriate fonts, perhaps based on the request locale or the language setting. I couldn't find a dynamic way to do this while working with Khmer.

Tom Hoffman (tom-hoffman) wrote :

We'll have to come up with some kind of strategy for this, even if it is just creating separate language packs for reports with different character sets.

technick (nickfolse-gmail) wrote :

As a temporary workaround, I added the following mapping to my app/pdf.py file:

# pdf.py
# Korean font files for each font style name.
# The italic styles reuse the upright files.
font_map_kr = {
    'Arial_Normal': 'UnDotum.ttf',
    'Arial_Bold': 'UnDotumBold.ttf',
    'Arial_Italic': 'UnDotum.ttf',
    'Arial_Bold_Italic': 'UnDotumBold.ttf',
    'Times_New_Roman': 'UnBatang.ttf',
    'Times_New_Roman_Bold': 'UnBatangBold.ttf',
    'Times_New_Roman_Italic': 'UnBatang.ttf',
    'Times_New_Roman_Bold_Italic': 'UnBatangBold.ttf',
}

# Use the Korean map in place of the default one.
font_map = font_map_kr

Then I updated schooltool.conf to point to the unfonts directory:
reportlab_fontdir /usr/share/fonts/truetype/unfonts

I believe these are the Korean fonts installed in Ubuntu when you enable Korean language input.
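
Note: the workaround above replaces the font map for every report. A minimal sketch of picking a map per language instead, along the lines of the locale-based idea mentioned earlier in the thread, might look like the following. The names `FONT_MAPS` and `select_font_map`, the shortened maps, and the Liberation file names used as defaults are assumptions for illustration, not SchoolTool's actual code.

```python
# Hypothetical sketch: choose a font map from a language code.
# Nothing here is SchoolTool API. The Korean file names come from the
# workaround above; the Liberation file names are assumed defaults.

font_map_default = {
    'Arial_Normal': 'LiberationSans-Regular.ttf',
    'Arial_Bold': 'LiberationSans-Bold.ttf',
    'Times_New_Roman': 'LiberationSerif-Regular.ttf',
    'Times_New_Roman_Bold': 'LiberationSerif-Bold.ttf',
}

font_map_kr = {
    'Arial_Normal': 'UnDotum.ttf',
    'Arial_Bold': 'UnDotumBold.ttf',
    'Times_New_Roman': 'UnBatang.ttf',
    'Times_New_Roman_Bold': 'UnBatangBold.ttf',
}

# Language code -> font map; other languages would be added here.
FONT_MAPS = {'ko': font_map_kr}


def select_font_map(locale_code):
    """Return the font map for a locale such as 'ko', 'ko_KR' or 'ko-KR'.

    Falls back to the default map for unknown or missing locales.
    """
    language = (locale_code or '').replace('-', '_').split('_')[0].lower()
    return FONT_MAPS.get(language, font_map_default)
```

With this shape, `font_map = select_font_map('ko_KR')` would yield the Korean map, while an unknown or empty locale falls back to the default.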

Changed in schooltool:
status: New → Confirmed
importance: Undecided → High
Tom Hoffman (tom-hoffman) wrote :

We need a strategy for this. It is ok if it takes the user a few steps -- that's much better than just throwing up our hands.

Gediminas Paulauskas (menesis) wrote :

The font map (see Comment #4) could be moved to schooltool.conf

Also, make PDFs use only one font family, either serif or sans-serif, not both. That will make it simpler.
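
Note: as a rough illustration of moving the font map into the configuration file, the sketch below reads font overrides from configuration text. The `reportlab_font <style> <file>` directive is hypothetical (only `reportlab_fontdir` appears in this thread), and `parse_font_overrides` is an illustrative helper, not an existing SchoolTool function.

```python
# Hypothetical sketch: read font overrides from configuration text.
# The 'reportlab_font' directive is an assumption made for illustration;
# only 'reportlab_fontdir' is mentioned in this bug report.

def parse_font_overrides(conf_text):
    """Collect 'reportlab_font <style> <file>' lines into a dict."""
    overrides = {}
    for raw_line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split('#', 1)[0].strip()
        parts = line.split()
        if len(parts) == 3 and parts[0] == 'reportlab_font':
            _, style, filename = parts
            overrides[style] = filename
    return overrides


example_conf = """
reportlab_fontdir /usr/share/fonts/truetype/unfonts
reportlab_font Arial_Normal UnDotum.ttf
reportlab_font Arial_Bold UnDotumBold.ttf   # bold face
"""

font_overrides = parse_font_overrides(example_conf)
```

Keeping the overrides in configuration rather than in pdf.py would also mean a code upgrade does not overwrite them, provided the configuration file itself is preserved.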

Changed in schooltool:
assignee: nobody → Gediminas Paulauskas (menesis)
status: Confirmed → Triaged
milestone: none → 2.1
Tom Hoffman (tom-hoffman) wrote :

OK... is that the plan? Objections?

Justas Sadzevičius (justas.sadzevicius) wrote :

There's still a minor problem with word-wrap. It needs to be set to CJK when using these fonts. Mixed case (CJK + non-CJK text) is not handled by default in reportlab, IIRC.
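
Note: ReportLab's `ParagraphStyle` has a `wordWrap` attribute that can be set to `'CJK'`. A minimal sketch of deciding when to use it, assuming a simple check of Unicode ranges is good enough, is shown below. The helpers `contains_cjk` and `word_wrap_mode` are hypothetical, the range list is not exhaustive, and the sketch does not solve the mixed CJK plus non-CJK case; it simply switches to CJK wrapping whenever any CJK character is present.

```python
# Hypothetical sketch: choose a word-wrap mode from the text content.
# The helper names and the (non-exhaustive) range list are assumptions.

CJK_RANGES = (
    (0x1100, 0x11FF),   # Hangul Jamo
    (0x3040, 0x30FF),   # Hiragana and Katakana
    (0x3400, 0x4DBF),   # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),   # CJK Unified Ideographs
    (0xAC00, 0xD7AF),   # Hangul Syllables
    (0xF900, 0xFAFF),   # CJK Compatibility Ideographs
)


def contains_cjk(text):
    """Return True if any character falls in one of CJK_RANGES."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)


def word_wrap_mode(text):
    """Return 'CJK' for text with CJK characters, else None (default wrapping)."""
    return 'CJK' if contains_cjk(text) else None
```

The returned value could then be passed as the `wordWrap` setting of the paragraph style used for that piece of text.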

Tom Hoffman (tom-hoffman) wrote :

Set... where?


Justas Sadzevičius (justas.sadzevicius) wrote :

Somewhere in our code, via some setting we have yet to implement. Or maybe there's a workaround.

Changed in schooltool:
milestone: 2.1.0 → next
Lyle Kozloff (lkozloff) wrote :

Uninformed opinion follows - I haven't looked at the code at all.

Rather than exporting to PDF where we need to worry a lot about fonts, what about styling the reports with print-specific CSS? It's reasonable to assume that if someone is using the site in an Asian language (or other complex script) that it's already rendering properly in their browser.

It seems like it would be easier for a layman to style the reports and take care of the font-embedding problem.
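
Note: to make the print-CSS suggestion concrete, here is a minimal sketch that builds a printable HTML page for a tabular report and leaves font selection to the browser. The function `render_printable_report`, the column titles, and the CSS rules are illustrative assumptions, not part of SchoolTool.

```python
# Hypothetical sketch: a tabular report as HTML with print-specific CSS.
# Function name, columns, and CSS rules are assumptions for illustration.
from html import escape

PRINT_CSS = """
@page { size: A4; margin: 2cm; }
@media print {
  nav, .no-print { display: none; }
  table { border-collapse: collapse; width: 100%; }
  th, td { border: 1px solid #000; padding: 4px; }
  tr { page-break-inside: avoid; }
}
"""


def render_printable_report(title, rows):
    """Return an HTML document listing (student name, absence count) rows."""
    body_rows = '\n'.join(
        '<tr><td>{}</td><td>{}</td></tr>'.format(escape(name), escape(str(count)))
        for name, count in rows
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        '<title>{title}</title><style>{css}</style></head>\n'
        '<body><h1>{title}</h1>\n<table>\n'
        '<tr><th>Student</th><th>Absences</th></tr>\n{rows}\n</table></body></html>'
    ).format(title=escape(title), css=PRINT_CSS, rows=body_rows)


report_html = render_printable_report('Absences', [('이름', 2), ('名前', 1)])
```

Because the browser renders the page, whatever fonts already display the names on screen are also used when printing, so no fonts need to be embedded by the application.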

Tom Hoffman (tom-hoffman) wrote :

Yes, that is certainly possible. In general we haven't emphasized the printable web page route because for more formal reports (like ones that are sent home) you really need to be able to do proper page layout, but there is also a place for easier/less formal tabular reports which might be good enough just with print CSS.

Changed in schooltool:
assignee: Gediminas Paulauskas (menesis) → Douglas Cerna (replaceafill)
Tom Hoffman (tom-hoffman) wrote :

Thanks for the suggestion, we'll take a look at WeasyPrint, especially since it is written in Python. The good thing is we wouldn't need to make an either/or decision. Perhaps we'll start making some new things with it; we wouldn't literally have to ditch ReportLab.

In practice it might be less of a clear win than you'd think, because for formal reports we'd also have to learn a lot about, say, the CSS Paged Media Module, which may end up being as complicated as ReportLab.

But yes, we'll check it out.

Tom Hoffman (tom-hoffman) wrote :

Actually, the biggest problem might be the lack of an Ubuntu package, which we'd have to make ourselves.

nedosa (nedosa) wrote :

Could it not be installed as a Python dependency in schooltool's setup.py?

As for its usage, you're absolutely right, paging can be a pain, but I find working with CSS, a declarative language designed for layout, more amenable than the manual manipulation of page blocks as in ReportLab.

Tom Hoffman (tom-hoffman) wrote :

*Python* packaging isn't a problem (eggs, setup.py), but we'd have to have Ubuntu packaging ourselves.

Tom Hoffman (tom-hoffman) wrote :

That is, we'd have to do the Ubuntu packaging ourselves.

Changed in schooltool:
status: Triaged → Opinion
description: updated
Changed in schooltool:
status: Opinion → Triaged
Changed in schooltool:
milestone: next → none
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in schooltool (Ubuntu):
status: New → Confirmed
Ross Gammon (rosco2)
affects: ubuntu → schooltool (Ubuntu)
Daniel Owens (dh-owens) wrote :

This problem also affects Vietnamese. I solved it by following the above workaround and using DejaVu Sans. But upgrading SchoolTool erases such modifications. Could there be a way to add substitute fonts that render correctly in PDF export through the web interface?
