okular word search in Hebrew (and other RTL languages) is opposite

Bug #524517 reported by Uri Shabtay
This bug affects 6 people
Affects               Status   Importance  Assigned to  Milestone
KDE Graphics          Unknown  Wishlist
kdegraphics (Ubuntu)  Triaged  Low         Unassigned

Bug Description

Binary package hint: kdegraphics

hey all,

Using GNOME and Ubuntu 9.10, I really prefer KDE's Okular over all other PDF readers. The only issue that really bothers me is the word search (which I find to be the best among all the PDF readers on Linux, IMO): it works fine, but when searching in an RTL language (such as Hebrew) you must type the word with its letters in reverse order for it to be found.

For example, if I want to look for the word 'lesson', which in Hebrew is שיעור, I need to type it the other way around, like this: רועיש
Now, I know Hebrew isn't your strong side :P so in English, 'lesson' would have to be typed like this: 'nossel'

cheers

ProblemType: Bug
Architecture: i386
Date: Fri Feb 19 18:14:58 2010
DistroRelease: Ubuntu 9.10
ExecutablePath: /usr/bin/okular
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release i386 (20091028.5)
NonfreeKernelModules: wl
Package: okular 4:4.3.2-0ubuntu1
ProcEnviron:
 LANGUAGE=en_US.UTF-8
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-19.56-generic
SourcePackage: kdegraphics
Uname: Linux 2.6.31-19-generic i686

Revision history for this message
In , Kde-2011-08 (kde-2011-08) wrote :

Version: (using KDE 4.3.1)
Installed from: Ubuntu Packages

When searching through Hebrew text, text is searched for backwards (probably due to the fact that visual Hebrew is used in PDF documents). If the text searched for is all Hebrew, then Okular could reverse the order of the letters when searching.
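
The reversal suggested above could look roughly like the following. This is a minimal Python sketch, not Okular code: `is_all_rtl` and `to_visual_query` are hypothetical names, and it assumes the PDF text is stored in visual order and that the query contains no combining marks (such as niqqud), which a plain reversal would misplace.

```python
import unicodedata

# Bidi classes of strongly right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def is_all_rtl(text: str) -> bool:
    """Return True if text has letters and all of them are strongly RTL."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all(
        unicodedata.bidirectional(ch) in RTL_CLASSES for ch in letters
    )


def to_visual_query(query: str) -> str:
    """Reverse an all-RTL query so it can match text stored in visual order.

    Hypothetical helper. Mixed-direction queries are returned unchanged,
    because handling them needs the full Unicode Bidirectional Algorithm.
    """
    return query[::-1] if is_all_rtl(query) else query
```

With this, a query typed as שיעור would be searched for as its reversed letter sequence, while a query such as `lesson` would be left alone.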

Revision history for this message
In , Pino Toscano (pinotree) wrote :

Can you attach a sample document showing the issue?

Revision history for this message
In , Kde-2011-08 (kde-2011-08) wrote :

Created attachment 38661
Hebrew-language PDF document.

All Hebrew PDF documents display the issue. Here is one.

Revision history for this message
In , Gadi Cohen (gadicc) wrote :

I can confirm this.

I don't think the fix is that complicated either.

I'm not really familiar with the libraries (and I'm a GNOME user), but a quick search reveals that KDE has reliable BiDi support since 3.0.1. In particular, I found this function:

QCString QHebrewCodec::fromUnicode ( const QString & uc, int & lenInOut ) const [virtual]

(from http://doc.trolltech.com/3.3/qhebrewcodec.html)

My guess is that if the search string were piped through it before the search takes place, it would fix the problem.

(Although there might be a more suitable Qt bidi function which fixes Hebrew, Arabic and all other RTL languages in one go.)

Gadi

Revision history for this message
In , Kde-2011-08 (kde-2011-08) wrote :

It seems that bug #128609 is for the same issue, but for KPDF instead of Okular. One of these bugs should be marked as a duplicate of the other. I will leave it to the devs to decide which. Thanks.

Revision history for this message
In , Matitiahu-allouche (matitiahu-allouche) wrote :

PDF's objective is to reflect the exact appearance of text. For Hebrew, it means that the glyphs are stored in visual order. If your PDF viewer accepts user input in logical order (which is the case in Windows and Linux), it should transform search arguments (captured from a user dialog) from logical to visual order before performing the search.
For Arabic, there is the additional issue that the glyphs represent letter shapes, and you must perform "shaping", in addition to reordering, on the search arguments to choose the proper glyphs for each Arabic letter.
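
To illustrate the shaping issue, here is a small Python sketch that goes the other way round from what is proposed above: instead of shaping the search argument, it folds the extracted text back to plain letters and then compares. It assumes the extracted text contains code points from the Arabic Presentation Forms blocks, which nothing in this report confirms for any particular PDF, and `visual_to_logical` is a hypothetical helper that only handles a pure RTL run.

```python
import unicodedata


def fold_presentation_forms(text: str) -> str:
    """Map Arabic presentation-form code points to their base letters.

    NFKC normalization applies Unicode compatibility decompositions, which
    turn positional glyph forms (initial, medial, final, isolated) back into
    the plain letters that a user types.
    """
    return unicodedata.normalize("NFKC", text)


def visual_to_logical(visual_text: str) -> str:
    """Hypothetical helper for a pure RTL run: reverse it, then fold forms."""
    return fold_presentation_forms(visual_text[::-1])
```

Note that NFKC also rewrites other compatibility characters (Latin ligatures, full-width forms and so on), so a real implementation would probably want something narrower.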

Revision history for this message
Uri Shabtay (uri.shabtay) wrote :
tags: added: hebrew okular opposite rtl search text
affects: kdegraphics (Ubuntu) → okular (Ubuntu)
affects: okular (Ubuntu) → kdegraphics (Ubuntu)
Changed in kdegraphics (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Changed in kdegraphics:
status: Unknown → Confirmed
Changed in kdegraphics:
importance: Unknown → Wishlist
Revision history for this message
In , Ohadcn (ohadcn) wrote :
Changed in kdegraphics:
status: Confirmed → Unknown
Revision history for this message
In , Albert Astals Cid (aacid) wrote :

*** Bug 282849 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Albert Astals Cid (aacid) wrote :

*** Bug 331785 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Fahad (fahad-alsaidi) wrote :

Here is a quick patch to fix this problem.
https://git.reviewboard.kde.org/r/125442/

Thanks

Revision history for this message
In , Fahad (fahad-alsaidi) wrote :

This bug needs to be retested against Poppler >= 0.40 because of this:
https://bugs.freedesktop.org/show_bug.cgi?id=55977

Revision history for this message
In , Olivier-w (olivier-w) wrote :

Using Poppler 0.42.0, typing Hebrew switches the search box to right-to-left, but I still have to type the word left to right (that is, backwards) for it to match.

Revision history for this message
In , Fahad (fahad-alsaidi) wrote :

Created attachment 100228
arabic text

Revision history for this message
In , Fahad (fahad-alsaidi) wrote :

You can search for this word: "بسم" in the attached Arabic text PDF.
If it is found, the bug is fixed upstream; otherwise the problem is in Okular.

Revision history for this message
In , Olivier-w (olivier-w) wrote :

See my comment about Hebrew: it didn't work, for the reasons given there.

Revision history for this message
In , Elad Hen (eladhen2) wrote :

This bug is still present in Mint Cinnamon 18 (and presumably in all of the Ubuntu 16.04 family). It should be noted that the similar bugs in Evince, Atril and some others, which stemmed from Poppler, are fixed as of Ubuntu 16.04 / Mint 18.

Revision history for this message
Elad Hen (eladhen2) wrote :

This bug seems to be fixed in poppler, as it doesn't exist in evince anymore. It's still present in Okular 0.25 from the Kubuntu Backports PPA

Revision history for this message
In , Cfeck (cfeck) wrote :

*** Bug 386468 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Fahad (fahad-alsaidi) wrote :

This bug also affects copying RTL text: the copied text is reversed.

Revision history for this message
In , Nate-b (nate-b) wrote :

Fahad submitted a patch for this, which I've migrated to Phabricator:

https://phabricator.kde.org/D10298

Revision history for this message
In , Fahad (fahad-alsaidi) wrote :

I think the problem comes from the Qt interface for Poppler. Please see this bug:

https://bugs.freedesktop.org/show_bug.cgi?id=105015

Revision history for this message
In , Fahad (fahad-alsaidi) wrote :

I think I've found where the problem is. It comes from TextPagePrivate::correctTextOrder(), which sorts words and characters into LTR order using compareTinyTextEntityY and compareTinyTextEntityX.

This approach doesn't fit with RTL text.
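
A toy model of why a left-to-right sort breaks RTL text, as a Python sketch. This is not the Okular implementation: `Glyph` and `line_text` are made-up names, and the model only covers a single line that is entirely RTL.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float  # left edge of the glyph on the page


def line_text(glyphs, rtl=False):
    """Join the glyphs of one line into a string, ordered by x position.

    With rtl=False this mimics a plain left-to-right sort. With rtl=True the
    x order is reversed, giving the logical (typed) order for a pure RTL line.
    """
    ordered = sorted(glyphs, key=lambda g: g.x, reverse=rtl)
    return "".join(g.char for g in ordered)


# The Hebrew word for "lesson": its first letter (shin) is drawn rightmost.
word = "\u05e9\u05d9\u05e2\u05d5\u05e8"
glyphs = [Glyph(ch, x=100.0 - 10.0 * i) for i, ch in enumerate(word)]
```

Here `line_text(glyphs)` gives the letters in reverse of the typed order, which is the string a user currently has to type into the search box, while `line_text(glyphs, rtl=True)` gives the word as typed.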

Revision history for this message
In , Fahad (fahad-alsaidi) wrote :

I've proposed another patch to fix this bug here:

https://phabricator.kde.org/D10455

Revision history for this message
In , David-hurka (david-hurka) wrote :

*** Bug 429869 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mh-firouzjah (mh-firouzjah) wrote :

Same problem with another RTL language, Persian.

Linux/KDE Plasma: 5.15.38-1-Manjaro(64-bit)
(available in About System)
KDE Plasma Version: 5.24.5
KDE Frameworks Version: 5.93
Qt Version: 5.15.3

Revision history for this message
In , David-hurka (david-hurka) wrote :

*** Bug 442046 has been marked as a duplicate of this bug. ***

Revision history for this message
In , David-hurka (david-hurka) wrote :

*** Bug 457448 has been marked as a duplicate of this bug. ***
