PDF (from EPUB) has text layer different to what's displayed

Bug #1915485 reported by Sam Wilson
This bug affects 1 person
Affects: calibre
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

When converting an EPUB to PDF, the text that is displayed in the PDF differs from what can be selected and copied.

The EPUB contains `<h2>हिन्द स्वराज — हिन्द स्वराज</h2>` and when converted to PDF this is displayed correctly — but when selected and copied, it ends up as `ह द वराज — ह द वराज`.
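To make the difference concrete, here is a minimal diagnostic sketch (not part of the original report) that lists which Unicode code points are present in the EPUB heading but absent from the copied PDF text. The helper names `dropped_code_points` and `describe` are hypothetical, and the two strings are one half of the heading quoted above.

```python
import unicodedata
from collections import Counter


def dropped_code_points(expected, extracted):
    """Return code points that occur more often in `expected` than in `extracted`."""
    missing = Counter(expected) - Counter(extracted)
    return sorted(missing.elements())


def describe(chars):
    """Format each character as 'U+XXXX NAME' for easier reading."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in chars]


source_heading = "हिन्द स्वराज"  # text as written in the EPUB's <h2>
copied_heading = "ह द वराज"      # text as copied from the generated PDF

for line in describe(dropped_code_points(source_heading, copied_heading)):
    print(line)
```

For these strings the missing code points are the dependent vowel sign I, the letters NA and SA, and the virama (twice), which suggests the text layer loses the characters that take part in conjuncts and reordered vowel signs.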

Using ebook-convert (calibre 5.10.1) on Ubuntu.

Downstream bug (in Wikisource Export) is https://phabricator.wikimedia.org/T274560

Tags: pdf-output
Sam Wilson (samwilson.id.au) wrote :
  • EPUB (2.8 KiB, application/octet-stream)
Sam Wilson (samwilson.id.au) wrote :
  • PDF (20.4 KiB, application/pdf)
Kovid Goyal (kovid) wrote : Re: calibre bug 1915485

Sadly, PDF generation is not in calibre's control. It is done by Qt
WebEngine (aka Chromium). Chromium recently switched to using HarfBuzz
for font shaping instead of sfntly; that might be the cause. You
can test this by using a version of calibre from before the change,
possibly 5.6 or 5.7.

 status invalid

Changed in calibre:
status: New → Invalid
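The suggested check against an older calibre release could be scripted roughly as follows. This is a sketch under assumptions, not something from the thread: it assumes `ebook-convert` on the PATH belongs to the calibre version under test, that `pdftotext` (from poppler-utils) is installed for extracting the text layer, and the function names and file paths are hypothetical.

```python
import subprocess


def reproduction_commands(epub_path, pdf_path):
    """Build the two commands: convert EPUB to PDF, then dump the PDF text layer."""
    return [
        ["ebook-convert", epub_path, pdf_path],  # calibre's command-line converter
        ["pdftotext", pdf_path, "-"],            # "-" writes the text to stdout
    ]


def text_layer_contains(epub_path, pdf_path, expected):
    """Convert the EPUB and report whether `expected` survives in the PDF text layer."""
    convert, extract = reproduction_commands(epub_path, pdf_path)
    subprocess.run(convert, check=True)
    result = subprocess.run(
        extract, check=True, capture_output=True, encoding="utf-8"
    )
    return expected in result.stdout


# Example call (file names are placeholders):
# text_layer_contains("test.epub", "test.pdf", "हिन्द स्वराज")
```

Running the same call once per installed calibre version would show whether the text layer changed between releases.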
Sam Wilson (samwilson.id.au) wrote :

Thanks for the quick reply.

That's unfortunate. I guess we can just wait until it (maybe) gets fixed upstream, since there are other improvements in recent calibre releases that we want to keep.

Sam Wilson (samwilson.id.au) wrote :

I've lodged an issue with Qt WebEngine: https://bugreports.qt.io/browse/QTBUG-91126
