Convert epub to pdf, pdf appearance looks correct, but some of the copied text is incorrect
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
calibre | | Undecided | Unassigned |
Bug Description
* The calibre version (get this by looking at the bottom of the main calibre screen)
I've tried both 4.6.0 and 4.7.0
* The operating system you are running calibre on (Windows, OS X, Linux)
I tried Windows 10, but I think other Windows versions have the same problem.
* My issue
When an EPUB file with Chinese text is converted to PDF, some of the text comes out garbled when copied. I suspect a problem in how the PDF output handles CMap / CID data. I would like several parameters added to the PDF output options so that the output problems can be debugged. I would rather accept a larger document than incorrect text.
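To illustrate the CMap suspicion: in a PDF, copied text normally comes from a font's ToUnicode CMap, which maps glyph codes to Unicode, while the page appearance only depends on the glyph outlines. The sketch below is a minimal illustration under that assumption, not calibre's code. `SAMPLE_CMAP` is a hand-written fragment (not extracted from the attached PDF), and `parse_tounicode` and `decode` are hypothetical helper names. It handles only the simple `bfchar` and `bfrange` forms.

```python
import re

# Hand-written ToUnicode fragment for illustration only (an assumption,
# not data taken from the attached PDF).
SAMPLE_CMAP = """
2 beginbfchar
<0001> <4E00>
<0002> <6708>
endbfchar
1 beginbfrange
<0010> <0012> <0041>
endbfrange
"""

HEX = r"<([0-9A-Fa-f]+)>"


def parse_tounicode(cmap_text):
    """Return {glyph code: Unicode string} from bfchar/bfrange sections."""
    mapping = {}
    # bfchar: one source code mapped to one UTF-16BE encoded string.
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    # bfrange (simple form only): consecutive codes map to consecutive
    # code points starting at the destination value.
    for section in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, section):
            start = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(start + offset)
    return mapping


def decode(glyph_codes, mapping):
    """Mimic text extraction: codes without a mapping become '?'."""
    return "".join(mapping.get(code, "?") for code in glyph_codes)


if __name__ == "__main__":
    table = parse_tounicode(SAMPLE_CMAP)
    # 0x0099 has no entry, so it is extracted as a placeholder.
    print(decode([0x0001, 0x0002, 0x0010, 0x0099], table))
```

If an entry in such a table is missing or wrong, the page still renders correctly but the copied character is wrong, which matches the symptom described here.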
The following code is taken from "src\calibre\
if opts.pdf_
merge_
if opts.pdf_
num_removed = dedup_type3_
if num_removed:
if opts.pdf_
num_removed = remove_
if num_removed:
if opts.pdf_
num_removed = pdf_doc.
if num_removed:
* If you are reporting a conversion problem, attach the input file and the output file and describe exactly what the problem is.
In the attached screenshot, the left side shows the PDF reader, where the appearance is fine; the right side shows the text selected on page 4 and copied to Notepad. All the text marked in red in the figure is wrong.
Below is my command line and output:
D:\software\
Conversion options changed from defaults:
base_font_size: 14.0
pdf_serif_family: u'\u5fae\
pdf_sans_family: u'\u5fae\
1% Converting input to HTML...
InputFormatPlugin: EPUB Input running
on D:\software\
Found HTML cover titlepage.xhtml
Parsing all content...
34% Running transforms on e-book...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 13.20000pt
Removing fake margins...
Cleaning up manifest...
Trimming unused files from manifest...
Creating PDF Output...
67% Running PDF Output plugin
D:\software\
The cover image has an id != "cover". Renaming to work around bug in Nook Color
68% Parsed all content for markup transformation
70% Completed markup transformation
90% Rendered all HTML as PDF
91% Added links to PDF content
100% Updated metadata in PDF
PDF output written to D:\software\
Output saved to D:\software\
moka (mokacao) wrote : | #1 |
moka (mokacao) wrote : | #2 |
moka (mokacao) wrote : | #3 |
moka (mokacao) wrote : | #5 |
The attachment is the Microsoft YaHei font I used.
Kovid Goyal (kovid) wrote : | #6 |
That looks like a bug in whatever PDF viewing software you are using. The PDF you attached renders fine in both Acrobat Reader and Okular on my system.
Changed in calibre: | |
status: | Incomplete → Invalid |
Kovid Goyal (kovid) wrote : | #7 |
This is the text as copied using okular on my system: 6月26日星期日 大风天 亲眼看 我男朋友 着他新欢的
手,在新光天地里 喷香水的那 刻,
Kovid Goyal (kovid) wrote : | #8 |
Never mind, I think I see the issue
Changed in calibre: | |
status: | Invalid → New |
Kovid Goyal (kovid) wrote : Fixed in master | #9 |
Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.
status fixreleased
Changed in calibre: | |
status: | New → Fix Released |
moka (mokacao) wrote : | #10 |
Thank you for your quick response to my question.
I've verified it on branch master. The copied text is completely correct.
I will now run more tests on EPUB documents. If there are any problems, I will report back here.
Thank you again.
Xavier Berger (xsiberger) wrote : | #11 |
I have a similar problem. I convert my Kindle book to PDF to use it on my iPad with MarginNote for studying. The highlighted text is converted automatically into flash cards and a mind map, but some letters are replaced with "?" (question marks). The same happens when I copy and paste text from the converted PDF in the SumatraPDF viewer into Notepad++; in my case all "P" and "Q" are replaced with "?". I am not sure whether it is a viewer problem or something in the PDF. Acrobat Reader does not have this problem when I copy and paste text.
I am not sure if I am missing a setting; I tried every option for embedding the fonts, without any luck so far. It also seems a little random, depending on which font I embedded: sometimes it is the letter "P", "J", or "V" that gets replaced with "?".
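One way to pin down exactly which characters a viewer gets wrong is to diff the copied text against the known source text. A minimal sketch using Python's standard `difflib`; `find_mismatches` is a hypothetical helper name, and the sample strings are made up for illustration rather than taken from the affected book.

```python
from difflib import SequenceMatcher


def find_mismatches(expected, copied):
    """Return (tag, expected_part, copied_part) for every differing span."""
    # autojunk=False keeps frequent characters from being ignored
    # when longer passages are compared.
    matcher = SequenceMatcher(None, expected, copied, autojunk=False)
    return [
        (tag, expected[i1:i2], copied[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Made-up sample: every "P" and "Q" was copied as "?".
    for tag, want, got in find_mismatches("PDF Queue", "?DF ?ueue"):
        print(tag, repr(want), "->", repr(got))
```

Running the same comparison on text copied from different viewers would show whether the substitutions follow the PDF or the viewer.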
Embed the fonts you are using in the EPUB file and attach that, so I can reproduce.
status incomplete