Incorrect Chinese Characters in PDF TOC converted from EPUB

Bug #1433848 reported by jenjou hung on 2015-03-19
This bug affects 1 person

Bug Description

calibre version: 2.21.0
operating system: tested on windows 7

When converting a Chinese EPUB file to PDF, some Chinese characters in the TOC of the resulting PDF are converted incorrectly.

We took two screenshots (Problem1.png and Problem2.png) to demonstrate the problem. In both pictures, the left-hand side shows the TOC tree of the PDF file and the right-hand side shows the original EPUB file. The parts marked with red boxes are the problematic output.

In the first case (T49n2035.epub->T49n2035.pdf, 八教對會五時圖), 「對」 (U+5C0D) is incorrectly converted to 「尊」 (U+5C0A).
In the second case (C077n1710.epub->C077n1710.pdf, 冬不人事頌一首示眾云), 「不」 (U+4E0D) is incorrectly converted to 「上」 (U+4E0A).

It happens only in the TOC section.

The problems seem to share the same pattern: the original characters end in "0D" and the resulting characters end in "0A". We suspect this is related to the line-break characters, which are 0D 0A (CR LF) on Windows and 0A (LF) on Linux. Perhaps the program replaces 0D with 0A to deal with this cross-platform difference.
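To check that this guess fits both cases, here is a minimal Python sketch that simulates it. It assumes (not confirmed by the report) that the TOC titles end up as UTF-16BE bytes, which is a common encoding for PDF bookmark text strings, and that every 0x0D byte is rewritten to 0x0A. The function name `simulate_cr_to_lf` is hypothetical. One plausible source of such a rewrite: the PDF specification says an unescaped end-of-line marker inside a literal string is read as a single 0x0A byte, so a raw 0x0D written into a bookmark title would come back as 0x0A.

```python
# Hypothetical simulation of the suspected bug: encode a TOC title as
# UTF-16BE, rewrite every 0x0D byte to 0x0A, then decode it again.


def simulate_cr_to_lf(title: str) -> str:
    """Return `title` after a hypothetical 0x0D -> 0x0A byte substitution."""
    raw = title.encode("utf-16-be")          # two bytes per BMP character
    mangled = raw.replace(b"\x0d", b"\x0a")  # the suspected CR -> LF rewrite
    return mangled.decode("utf-16-be")


if __name__ == "__main__":
    for title in ("八教對會五時圖", "冬不人事頌一首示眾云"):
        result = simulate_cr_to_lf(title)
        # Report every character that changed, with code points before/after.
        for before, after in zip(title, result):
            if before != after:
                print(f"{before} U+{ord(before):04X} -> {after} U+{ord(after):04X}")
```

Under this assumption, 「對」 (bytes 5C 0D) becomes 「尊」 (5C 0A) and 「不」 (4E 0D) becomes 「上」 (4E 0A), matching both screenshots. Characters without a 0x0D byte pass through unchanged, which fits the observation that only some characters are affected.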

We also observe that the same PDF displays *correctly* on Ubuntu.

jenjou hung (jenjou-hung) wrote :

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released