Search does not work on PDFs with Russian font

Bug #1328946 reported by Benjamin Eltzner
This bug affects 1 person
Affects: qpdfview
Status: Incomplete
Importance: Medium
Assigned to: Adam Reichold
Milestone: none

Bug Description

This is a copy of a bug reported in Debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750648

As of now it is unclear whether the problem is generic. I have requested additional information from the reporter in Debian's BTS and will post anything I receive here.

Adam Reichold (adamreichold) wrote:

Sorry for being too eager to triage this, I confused it with the other Debian bug report.

Changed in qpdfview:
status: New → Fix Committed
importance: Undecided → Medium
assignee: nobody → Adam Reichold (adamreichold)
milestone: none → 0.4.11
status: Fix Committed → New
milestone: 0.4.11 → none
Adam Reichold (adamreichold) wrote:

Hello again,

thanks for forwarding. Until further information is given, I suspect problems with handling the encoding of the text in the document, maybe related to [1].

Best regards, Adam.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=36111

Changed in qpdfview:
status: New → Incomplete
Adam Reichold (adamreichold) wrote:

Hello again,

after having a look at the files "100144927890_20140531.pdf" and "ig-jupiter2.djvu", it is clear why one cannot search them: Using "djvutext", one finds that the DjVu file does not contain any text layer and hence its text can neither be extracted nor searched. The PDF document does contain a text layer, however the encoding used seems to be problematic. Using "pdftotext" to extract the text yields garbage output without any intelligible Russian characters. As the fonts are all embedded, this might be due to a non-standard encoding which Poppler currently can't identify or process.

From this, I'd say that the DjVu issue is related to the document itself and the PDF issue should go upstream to the Poppler project?

Best regards, Adam.

P.S.: I also tried MuPDF (directly and via our Fitz plug-in) and both yielded the same garbled output.
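
For anyone who wants to repeat the "pdftotext" check on their own files, here is a minimal sketch in Python. It assumes Poppler's "pdftotext" is installed and on the PATH; the helper names `extract_text` and `cyrillic_ratio` are made up for illustration and are not part of qpdfview or Poppler. The idea is simply to extract the text layer and measure how many of the extracted letters are actually Cyrillic. A value near zero for a document that visibly contains Russian text would point to the kind of encoding problem described above.

```python
import subprocess
import unicodedata


def extract_text(pdf_path: str) -> str:
    """Run `pdftotext <file> -` and return what it writes to stdout.

    Assumes Poppler's pdftotext is installed; "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def cyrillic_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all, e.g. a document without a usable text layer.
        return 0.0
    cyrillic = sum(
        1 for ch in letters if "CYRILLIC" in unicodedata.name(ch, "")
    )
    return cyrillic / len(letters)
```

Calling `cyrillic_ratio(extract_text("100144927890_20140531.pdf"))` would then give a rough indication of whether the extracted text is usable for searching. What counts as "too low" is a judgment call and depends on how much non-Russian text the document contains.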
