okular word search in Hebrew (and other RTL languages) is opposite

Bug #524517 reported by Uri Shabtay on 2010-02-19
This bug affects 6 people
Affects: KDE Graphics | Status: Unknown | Importance: Wishlist
Affects: kdegraphics (Ubuntu) | Importance: Low | Assigned to: Unassigned

Bug Description

Binary package hint: kdegraphics

Hey all,

Using GNOME and Ubuntu 9.10, I really prefer KDE's Okular above all other PDF readers. The only issue that really bothers me is the word search (which I find to be the best among all the PDF readers on Linux, IMO): it works fine, but when searching in an RTL language (such as Hebrew) you must type the word in the opposite letter direction for it to be found.

For example, if I want to look for the word 'lesson', which in Hebrew is שיעור, I need to type it the other way around, like this: רועיש
Now I know Hebrew isn't your strong side :P so in English, it is as if 'lesson' had to be typed as 'nossel'.

Cheers

ProblemType: Bug
Architecture: i386
Date: Fri Feb 19 18:14:58 2010
DistroRelease: Ubuntu 9.10
ExecutablePath: /usr/bin/okular
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release i386 (20091028.5)
NonfreeKernelModules: wl
Package: okular 4:4.3.2-0ubuntu1
ProcEnviron:
 LANGUAGE=en_US.UTF-8
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-19.56-generic
SourcePackage: kdegraphics
Uname: Linux 2.6.31-19-generic i686

Version: (using KDE 4.3.1)
Installed from: Ubuntu Packages

When searching through Hebrew text, the search term has to be entered backwards (probably because Hebrew is stored in visual order in PDF documents). If the search text is all Hebrew, Okular could reverse the order of the letters before searching.
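A minimal sketch of that idea, in Python for brevity rather than Okular's C++. The helper names `is_all_hebrew` and `to_visual_query` are made up for illustration and are not part of Okular or Poppler; the sketch also assumes an unpointed query, since a plain reversal would put niqqud marks before their base letters.

```python
def is_all_hebrew(text: str) -> bool:
    """Return True if every non-space character lies in the Hebrew block U+0590..U+05FF."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all("\u0590" <= c <= "\u05ff" for c in chars)


def to_visual_query(query: str) -> str:
    """Reverse an all-Hebrew query so it matches visually ordered PDF text.

    Anything that is not purely Hebrew (Latin text, mixed text, empty input)
    is returned unchanged.
    """
    return query[::-1] if is_all_hebrew(query) else query


if __name__ == "__main__":
    # The 'lesson' example from the report: typed in logical order,
    # searched in visual (reversed) order.
    print(to_visual_query("\u05e9\u05d9\u05e2\u05d5\u05e8"))
    print(to_visual_query("lesson"))
```

Mixed queries (Hebrew plus digits or Latin words) are deliberately left alone here, because a plain reversal would also flip the parts that should stay left to right.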

Can you attach a sample document showing the issue?

Created attachment 38661
Hebrew-language PDF document.

All Hebrew PDF documents display the issue. Here is one.

I can confirm this.

I don't think the fix is that complicated either.

I'm not really familiar with the libraries (and I'm a GNOME user), but a quick search reveals that KDE has had reliable BiDi support since 3.0.1. In particular, I found this function:

QCString QHebrewCodec::fromUnicode ( const QString & uc, int & lenInOut ) const [virtual]

(from http://doc.trolltech.com/3.3/qhebrewcodec.html)

My guess is that if the search string were piped through it before the search takes place, it would fix the problem.

(Although there might be a more suitable Qt bidi function which fixes Hebrew, Arabic and all other RTL languages in one go.)

Gadi

It seems that bug #128609 is for the same issue, but for KPDF instead of Okular. One of these bugs should be marked as a duplicate of the other. I will leave it to the devs to decide which. Thanks.

PDF's objective is to reflect the exact appearance of text. For Hebrew, it means that the glyphs are stored in visual order. If your PDF viewer accepts user input in logical order (which is the case in Windows and Linux), it should transform search arguments (captured from a user dialog) from logical to visual order before performing the search.
For Arabic, there is the additional issue that the glyphs represent letter shapes, and you must perform "shaping", in addition to reordering, on the search arguments to choose the proper glyphs for each Arabic letter.
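The reordering step described above can be sketched roughly as follows. This is a simplification, not the full Unicode Bidirectional Algorithm: it assumes a single-line query with a right-to-left base direction, keeps runs of Latin letters and digits in their typed order, ignores bracket mirroring and explicit embedding controls, and does no Arabic shaping at all. The function name `logical_to_visual` is hypothetical; only Python's standard `unicodedata` module is used.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}        # Hebrew letters, Arabic letters
LTR_CLASSES = {"L", "EN", "AN"}  # Latin letters, European and Arabic-Indic digits


def logical_to_visual(text: str) -> str:
    """Reorder a logically ordered query into left-to-right visual order (simplified)."""
    classes = [unicodedata.bidirectional(c) for c in text]
    if not any(c in RTL_CLASSES for c in classes):
        return text  # no RTL characters, nothing to reorder

    n = len(text)
    ltr = [c in LTR_CLASSES for c in classes]

    # Neutrals (spaces, punctuation) enclosed by LTR characters join the LTR run,
    # so an embedded phrase such as "abc def" keeps its word order.
    i = 0
    while i < n:
        if not ltr[i] and classes[i] not in RTL_CLASSES:
            j = i
            while j < n and not ltr[j] and classes[j] not in RTL_CLASSES:
                j += 1
            if i > 0 and j < n and ltr[i - 1] and ltr[j]:
                for k in range(i, j):
                    ltr[k] = True
            i = j
        else:
            i += 1

    # Split into runs of equal direction.
    runs = []
    for ch, flag in zip(text, ltr):
        if runs and runs[-1][0] == flag:
            runs[-1][1].append(ch)
        else:
            runs.append((flag, [ch]))

    # Visual order for an RTL line: runs appear in reverse order,
    # RTL runs are reversed character by character, LTR runs are kept as typed.
    return "".join(
        "".join(chars) if flag else "".join(reversed(chars))
        for flag, chars in reversed(runs)
    )
```

For a pure Hebrew word this reduces to a plain reversal, while a query containing a number keeps the digits readable. A real implementation would more likely rely on an existing bidi library than on a hand-rolled loop like this.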

Uri Shabtay (uri.shabtay) wrote :
tags: added: hebrew okular opposite rtl search text
affects: kdegraphics (Ubuntu) → okular (Ubuntu)
affects: okular (Ubuntu) → kdegraphics (Ubuntu)
Changed in kdegraphics (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Changed in kdegraphics:
status: Unknown → Confirmed
Changed in kdegraphics:
importance: Unknown → Wishlist
Changed in kdegraphics:
status: Confirmed → Unknown

*** Bug 282849 has been marked as a duplicate of this bug. ***

*** Bug 331785 has been marked as a duplicate of this bug. ***

Here is a quick patch to fix this problem.
https://git.reviewboard.kde.org/r/125442/

Thanks

This bug needs to be retested against Poppler >= 0.40 because of this:
https://bugs.freedesktop.org/show_bug.cgi?id=55977

Using Poppler 0.42.0, typing Hebrew puts the search box in right-to-left mode, but I still have to type the word left to right (i.e. backwards) for it to match.

Created attachment 100228
Arabic text

You can search for this word: "بسم" in the attached Arabic text PDF.
If you find it, the issue is fixed upstream; otherwise the problem is in Okular.

See my comment about Hebrew: it didn't work, for the reasons given there.

Elad Hen (eladhen2) wrote :

This bug seems to be fixed in Poppler, as it no longer occurs in Evince. It's still present in Okular 0.25 from the Kubuntu Backports PPA.
