Searching Hebrew text works in reverse order

Bug #127732 reported by Haggai Eran on 2007-07-23
This bug affects 10 people
Affects            Status     Importance   Assigned to           Milestone
Poppler            Unknown    High
poppler (Ubuntu)              Medium       Ubuntu Desktop Bugs

Bug Description

Hi,

There is a problem searching Hebrew text in PDF files in evince. The text I type in the find box is matched in reverse (perhaps in visual order), and not in the correct order of characters. The search works in Adobe Acrobat, so I'm guessing it's not a problem in the PDF file itself.

I'll attach an example pdf.

Steps to reproduce:
1. Open the attached pdf.
2. Search for the word "ידיעון" (the letter "י" is first in logical order). The text is not found in the file.
3. Search for the reversed word "ןועידי" (the letter "ן" is first in logical order). The text is found, in reverse, in the file.
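
The reproduction steps above can be modelled in a few lines of Python. This is only a sketch under the reporter's guess that the text extracted from the PDF is held in visual order; `page_text` and `naive_find` are hypothetical names for illustration, not poppler or evince code.

```python
# The word from the report, written with escapes so the logical order is unambiguous.
# Logical order: yod, dalet, yod, ayin, vav, final nun  ("ידיעון")
word_logical = "\u05D9\u05D3\u05D9\u05E2\u05D5\u05DF"

# Assumption: the viewer keeps the page text in visual (left-to-right glyph) order,
# which for a pure Hebrew word is simply the reversed logical order.
page_text = word_logical[::-1]


def naive_find(haystack: str, needle: str) -> bool:
    """Plain substring search with no bidi handling (hypothetical helper)."""
    return needle in haystack


# Step 2: typing the word normally finds nothing.
found_as_typed = naive_find(page_text, word_logical)

# Step 3: typing the word backwards matches the visual-order text.
found_reversed = naive_find(page_text, word_logical[::-1])

print(found_as_typed, found_reversed)  # False True
```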

Regards,
Haggai Eran

= Additional Comments from http://bugzilla.gnome.org/show_bug.cgi?id=300536 =

Reporter: <email address hidden> (Roee)

Please describe the problem:
When using the search function in a Hebrew document, the search text has to
be typed backwards in order to find anything in the document.

Steps to reproduce:
1. Open a document containing hebrew text.
2. Try to search for a word

Actual results:
The only possible way to search is to type the text backwards

Expected results:
The search should find words in their correct typing

Does this happen every time?
yes

attachment: http://bugzilla.gnome.org/attachment.cgi?id=45225&action=view

GNOME bug http://bugzilla.gnome.org/show_bug.cgi?id=313230 is a duplicate of
this one.

I just wish to add that reversing the word when an RTL script is entered is not
enough! Poppler should implement the Unicode BiDi algorithm to support search
strings which contain both LTR and RTL scripts. There is an implementation named
fribidi you can use.

Haggai Eran (haggai-eran) wrote :

One more thing:
I think this bug was also reported at:
https://bugs.freedesktop.org/show_bug.cgi?id=2981

Haggai

Sebastien Bacher (seb128) wrote :

Thank you for your bug report. This looks like a poppler issue; I have opened an upstream task for it.

Changed in evince:
assignee: nobody → desktop-bugs
importance: Undecided → Medium
status: New → Triaged
Changed in poppler:
status: Unknown → Confirmed

> As far as I know, no viewer supports RTL scripts yet.

Adobe Acrobat Reader supports searching and copying Arabic text perfectly.

> There is an implementation named fribidi you can use.

FriBidi works very well, and it's a Freedesktop project:

http://fribidi.freedesktop.org/wiki/

Why not use it in Poppler?

Note that FriBidi converts from logical (input text) order to visual (glyph) order. The problem in poppler is reverse-bidi. That is, going back from the visual order as found in the PDF to the logical text order. Poppler does an ok job at that. It sure can be improved, but fribidi is no magic bullet here. I wrote about this a bit here:

  http://lists.cairographics.org/archives/cairo/2007-September/011427.html

search for reverse-bidi.

An alternative would be to make poppler use fribidi to find the visual order of the search text, then match that against the visual order of the extracted text. But that goes against the current code and does not yield much immediate benefit.
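
To make the two directions concrete, here is a toy comparison of both strategies. It is only a sketch: uppercase Latin letters stand in for RTL letters, and both conversions are plain reversals, which is valid only for a run that is entirely RTL. The function names are hypothetical and do not correspond to poppler or fribidi APIs.

```python
def visual_to_logical(run: str) -> str:
    """Toy reverse-bidi: only correct for a run made purely of RTL letters."""
    return run[::-1]


def logical_to_visual(run: str) -> str:
    """Toy forward bidi: only correct for a run made purely of RTL letters."""
    return run[::-1]


# Glyphs as laid out left to right on the page (uppercase = RTL stand-ins).
page_visual = "DLROW OLLEH"
# What the user types, in logical order.
query_logical = "HELLO"

# Strategy 1: recover logical text from the page, then match the query as typed.
found_reverse_bidi = query_logical in visual_to_logical(page_visual)

# Strategy 2: convert the query to visual order, then match against the page as stored.
found_forward_bidi = logical_to_visual(query_logical) in page_visual

print(found_reverse_bidi, found_forward_bidi)  # True True
```

For pure RTL runs the two strategies agree. They start to differ once numbers or LTR words are mixed in, where a real bidi implementation is needed.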

The only thing that I can think of that can improve poppler's behavior by using fribidi is mirroring characters like brackets when found around RTL text. That's all for now.

> The problem in poppler is reverse-bidi. That is, going back from
> the visual order as found in the PDF to the logical text order.
> Poppler does an ok job at that.

If I want to search for the string
لقد
in a PDF using evince, I have to type it backwards in the search field:
دقل

This is completely broken behaviour. How can it be considered doing "an OK job"?

Ah, I thought it had changed in the meantime.

Poppler hackers, how does the search work? I thought the search word was matched against the text extracted using the text device. If it's done that way, it should work fairly OK.

Asaf Schoen (sap) wrote :

confirmed on evince 2.24.0 on ubuntu 8.10 beta

https://bugs.launchpad.net/ubuntu/+source/evince/+bug/240398

tags: added: arabic backward evince hebrew pdf rtl search

Is there any plan for a fix? It's really a big problem for RTL language users with evince.

Please help with that. Someone is offering a solution here, I think:
http://<email address hidden>/msg01819.html

Changed in poppler:
importance: Unknown → Medium

Have you tried Foxit for Linux?
I confirm that the search works in Google Docs.

Oren_B (oren.barnea) wrote :

Foxit is not free software.
I can't see how Google Docs is relevant here...

Neither is Adobe Reader...

Just setting an example, apparently Okular suffers from the exact same
issue...

Changed in poppler:
importance: Medium → Unknown
Changed in poppler:
importance: Unknown → Medium
Uri Shabtay (uri.shabtay) wrote :

i can confirm (surprisingly) that this issue is SOLVED. I tried searching Hebrew text in the PDF above, and it worked flawlessly.

Uri Shabtay (uri.shabtay) wrote :

after a TRUE test, i realized the following:

Adobe Reader version 9.4.2 (Feb. 11, 2011) can recognize the Hebrew text within the search as is, without typing it backwards.
Evince, however, cannot. Therefore we're back to square one.

Changed in poppler:
importance: Medium → High

hello friends,

i've just posted a patch to implement visual to logical text conversion, that might become a step towards solving this problem.

see https://bugs.freedesktop.org/show_bug.cgi?id=55977

best regards,
alex

Created attachment 68861
find bidirectional text

a small workaround for searching rtl text. limited for mixed directional text.

say ABC 123 will be rendered by fribidi as 123 CBA.
to search for this text in poppler, you'd need to search literally for 123 CBA before this patch.
with this patch, you search for ABC 123 as entered. nice.
but if you only search for ABC 12, nothing will be found. that's because this patch transforms the searched text from logical to visual order before the actual search in the visual text inside poppler, so ABC 12 is rendered as 12 CBA, which is not there.
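
The limitation described above can be reproduced with a toy model. This is a sketch, not the patch itself: uppercase letters stand in for RTL letters, and `toy_logical_to_visual` is a hypothetical helper that only handles RTL letters, spaces, and digit runs (it reverses the whole string, then restores each digit run to left-to-right order). A real implementation would use fribidi or ICU.

```python
import re


def toy_logical_to_visual(text: str) -> str:
    """Toy bidi reordering for an RTL paragraph containing only
    RTL letters (uppercase stand-ins), spaces, and digit runs."""
    reversed_text = text[::-1]
    # Digits keep their left-to-right order, so undo the reversal inside each digit run.
    return re.sub(r"\d+", lambda m: m.group(0)[::-1], reversed_text)


# Visual text as it would sit inside the viewer for the logical text "ABC 123".
page_visual = toy_logical_to_visual("ABC 123")

# Searching for the full text works after converting the query to visual order.
full_query_visual = toy_logical_to_visual("ABC 123")
full_match = full_query_visual in page_visual

# Searching for a logical prefix fails: its visual form is not a substring of the page.
partial_query_visual = toy_logical_to_visual("ABC 12")
partial_match = partial_query_visual in page_visual

print(page_visual)            # 123 CBA
print(partial_query_visual)   # 12 CBA
print(full_match, partial_match)  # True False
```

The partial query fails because the digit run is reordered as a unit, so a prefix in logical order is no longer contiguous in visual order.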

there's a better way to go, which i'll implement later. this would also help with bidi text select and copy.

this patch will only work if you first apply my last patch to bug 55977. you also need fribidi or preferably icu.

please enjoy.
alex

Adding the "depends on" link for the current patch's dependency. The bug itself is not dependent, but the current solution by alex is.

the last fix for #55977 will be enough to fix this bug too.
again, it's a partial solution for mixed direction text.

Uri Shabtay (uri.shabtay) wrote :

whether you're using Firefox or Chrome - opening this .pdf file within the browser shows that searching Hebrew works flawlessly:
https://launchpadlibrarian.net/15351076/PDF%20for%20Bug%20240398.pdf

why can't the solution be patched to evince? we're in 2014

seriously, people. we're in 2014.

open this .pdf file in your browser - such as Firefox or Chrome -

https://launchpadlibrarian.net/15351076/PDF%20for%20Bug%20240398.pdf

and behold - the search function works flawlessly.

why can't these be patched to Evince/Poppler?

Because there's no patch for it. alex has proven he can't provide a valid patch.

I attached two patches to bug 55977 that should handle searching RTL text to a reasonable level. I wonder whether this bug is a more appropriate place for those two patches?

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/poppler/poppler/issues/274.

Changed in poppler:
status: Confirmed → Unknown