qpdfview

Offer a pure Find Next/Prev option

Bug #2065501 reported by Brian Ejike on 2024-05-11

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	qpdfview	Opinion	Undecided	Unassigned

Bug Description

I'm using qpdfview 0.4.18 via Ubuntu 22.04 Snap.

I often work with large PDFs (thousands of pages) and searching for keywords in qpdfview feels slow, e.g. over 20s for a 3000+ page document.

It seems qpdfview tries to search for ALL keyword occurrences and cache their locations, while simultaneously servicing my "Find Next" clicks. That should normally be an okay strategy, but it doesn't feel like one in practice.

When starting a new search, it even takes a few seconds to begin -- i.e. to forget the previous keyword and begin searching for the new one. Only after this delay does the very long progress bar begin to advance, and that is before counting the time taken to search for the keyword across the document. The overall process just feels slow and unresponsive as a result.

You may want to contrast with Sumatra PDF, which simply searches for the next occurrence with lightning speed, even on Wine on Ubuntu. As the search progresses with each Find Next click, Sumatra caches the location of each result, until the search wraps. In this way, it never does more work than absolutely needed, and feels much faster as a result.

So, it would be great to have some config/setting so a user can specify the sort of search behaviour they want:

- A simple Find Next/Prev that just searches for the next occurrence and caches its location
- Or the default/current behaviour, where qpdfview searches for all occurrences
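To make the first option concrete, here is a minimal sketch of a lazy "Find Next" that scans only as far as the next hit and caches locations until the search wraps. It is not qpdfview or Sumatra code: `LazyFinder`, `Hit` and `pagesVisited` are hypothetical names, and a "page" is modelled as a plain string of already-extracted text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a "page" is just its already-extracted text.
struct Hit {
    std::size_t page;
    std::size_t offset;
};

class LazyFinder {
public:
    LazyFinder(std::vector<std::string> pages, std::string term)
        : pages_(std::move(pages)), term_(std::move(term)) {
        assert(!term_.empty());
    }

    // Scans forward only until the next hit and caches its location.
    // Once the scan has reached the end of the document, further calls
    // cycle through the cached hits without searching again.
    std::optional<Hit> findNext() {
        while (!exhausted_ && page_ < pages_.size()) {
            const std::size_t pos = pages_[page_].find(term_, offset_);
            if (pos != std::string::npos) {
                offset_ = pos + 1;
                cache_.push_back(Hit{page_, pos});
                return cache_.back();
            }
            ++page_;
            offset_ = 0;
        }
        exhausted_ = true;
        if (cache_.empty()) return std::nullopt;
        const Hit hit = cache_[replay_];
        replay_ = (replay_ + 1) % cache_.size();
        return hit;
    }

    // Number of pages the scan has had to look at so far.
    std::size_t pagesVisited() const { return std::min(page_ + 1, pages_.size()); }
    std::size_t cachedHits() const { return cache_.size(); }

private:
    std::vector<std::string> pages_;
    std::string term_;
    std::vector<Hit> cache_;
    std::size_t page_ = 0;
    std::size_t offset_ = 0;
    std::size_t replay_ = 0;
    bool exhausted_ = false;
};
```

With this shape, the first "Find Next" on a large document only touches pages up to the first hit, at the cost of not knowing the total number of results until the scan has wrapped.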


Adam Reichold (adamreichold) wrote on 2024-05-12:

Did you try disabling the "Advanced search dock" on the "Interface" tab of the settings dialog? That should already remove some of that overhead (I am not sure whether the option already existed in 0.4.18, though).

Other than that, searching for all occurrences on a page is significantly faster in almost all documents due to the relatively high overhead the underlying libraries have for turning a given page into text. In current versions, all of this happens in the background and optionally in parallel, which further reduces the time until results are ready. We even started out with the simpler implementation which tracks only a single current result, and it was significantly more janky, as searching for the next occurrence on a page is basically as fast/slow as searching for all occurrences on a page with Poppler.
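The cost argument can be illustrated with a toy model in which text extraction is the only expensive step. This is a sketch under the assumption that every search call has to extract the page text again; `FakePage`, `findAllOnPage` and `findNextOnPage` are hypothetical names and do not correspond to Poppler's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a PDF page. extractText() models the expensive
// step and counts how often it runs.
class FakePage {
public:
    explicit FakePage(std::string text) : text_(std::move(text)) {}
    const std::string& extractText() const {
        ++extractions_;
        return text_;
    }
    std::size_t extractions() const { return extractions_; }

private:
    std::string text_;
    mutable std::size_t extractions_ = 0;
};

// One extraction yields every match on the page.
std::vector<std::size_t> findAllOnPage(const FakePage& page, const std::string& term) {
    assert(!term.empty());
    std::vector<std::size_t> hits;
    const std::string& text = page.extractText();
    for (std::size_t pos = text.find(term); pos != std::string::npos;
         pos = text.find(term, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}

// One extraction per call yields a single match at or after `from`
// (std::string::npos if there is none).
std::size_t findNextOnPage(const FakePage& page, const std::string& term, std::size_t from) {
    assert(!term.empty());
    return page.extractText().find(term, from);
}
```

In this model, stepping through a page with three hits via `findNextOnPage` costs four extractions (three hits plus the final miss), whereas `findAllOnPage` costs one.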

So in summary, while it may not be ideal for 3000+ page documents, I think this is still the best compromise given the libraries this program depends on.

Changed in qpdfview:
status:	New → Opinion

Brian Ejike (bejike) wrote on 2024-05-12:

> "Did you try disabling the "Advanced search dock" on the "Interface" tab of the settings dialog "

What I found is "Extended search dock", which was already disabled by default. I briefly enabled it to see if it would change anything, but it didn't.

> "So in summary, while it may not be ideal for 3000+ page documents, I think this is still the best compromise given the libraries this program depends on."

I understand this, but it seems that in the process, responsiveness has been sacrificed. I'm hoping this compromise can become unnecessary if a config option can be made available.

> "optionally in parallel which further reduces the time"

I tried enabling the "Parallel search execution" (if that's what you mean) and this did improve the overall speed considerably, more than halving the search duration. However, the initial latency for new searches is still unchanged (at least 2s).

Also, when this option is enabled, the screen goes almost entirely blank sometime during the search, except for the moving blocks of yellow highlighting each "Find Next" result (as I click) -- so I basically have to wait till the search completes, before I can usefully traverse the results.

Brian Ejike (bejike) wrote on 2024-05-12:

Attachment: qpdfv_search_latency.webm (5.6 MiB, video/webm)

Doing some more testing now, I see that my main problem is probably that small but significant delay/latency between requesting a new search and qpdfview actually beginning that search (or seeming to, anyways).

During those few seconds, it searches for the *previous* string whenever I click Find Next, before it belatedly realizes there's a new string that needs finding. I've attached a screen recording, if it helps. Also, here's my test document, if you'd like to try it out: https://www.bluetooth.org/DocMan/handlers/DownloadDoc.ashx?doc_id=521059

This is mostly speculation, but I suspect the latency has to do with the default SearchAsYouType behaviour. Given how expensive you've said the search procedure is, the app has to wait for the user to stop typing first to avoid wasting effort on unwanted searches, i.e. it waits for a reasonable gap between key presses, which seems to be more than 2s.

The problem is, this means it also ignores *explicit* "Enter" requests to begin searching immediately, stubbornly deciding to wait for that timeout to elapse before it begins the new search. You can see this happen each time in the video, as I search for each string: "encrypt", "indication", "isoc".

So if this is at all correct, I guess the request would be for qpdfview to prioritize "Find Next" inputs to begin a new search immediately, and only rely on the timeout as a fallback. This should hopefully eliminate the latency, perceptibly at least.

Adam Reichold (adamreichold) wrote on 2024-05-12:

This is hard to discuss against a somewhat old version running against a somewhat old version of Poppler.

Trying it out here, hitting Enter always immediately cancels the old search and starts searching for the current term (as it should). The timer will eventually trigger a new search if the term has changed, but you can always override it by pressing Enter, and the program will not wait until the timer has elapsed.
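The interplay described here (a search-as-you-type timer that an explicit Enter can override) can be sketched as a small state machine driven by explicit timestamps. This is an illustration only, not qpdfview's implementation: `SearchTrigger`, its method names and the delay value are assumptions.

```cpp
#include <cassert>
#include <optional>
#include <string>

class SearchTrigger {
public:
    explicit SearchTrigger(int delayMs) : delayMs_(delayMs) { assert(delayMs >= 0); }

    // The user edited the search field at time nowMs: (re)arm the timer.
    void textChanged(const std::string& text, int nowMs) {
        pending_ = text;
        deadlineMs_ = nowMs + delayMs_;
    }

    // The user pressed Enter: disarm the timer and start at once if the
    // term changed. Returns the term to search for, or nothing if the
    // term is unchanged (the caller would then treat it as "Find Next").
    std::optional<std::string> enterPressed() {
        deadlineMs_.reset();
        return startIfChanged();
    }

    // Called periodically: fires only once the armed deadline has passed.
    std::optional<std::string> tick(int nowMs) {
        if (!deadlineMs_ || nowMs < *deadlineMs_) return std::nullopt;
        deadlineMs_.reset();
        return startIfChanged();
    }

private:
    std::optional<std::string> startIfChanged() {
        if (pending_ == active_) return std::nullopt;
        active_ = pending_;
        return active_;
    }

    int delayMs_;
    std::string pending_;            // what is currently in the search field
    std::string active_;             // term of the search in progress
    std::optional<int> deadlineMs_;  // empty while the timer is disarmed
};
```

With a hypothetical 2000 ms delay, typing "encrypt" and pressing Enter starts the search immediately, and the disarmed timer does not fire a second time for the same term.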

There is one unavoidable cause of delay, and that is the search happening on one (or more) background threads: before the search for the new term can begin, the old one must be cancelled, i.e. there can never be two searches in progress at the same time. (There could be in principle, but they would just fight over locks inside Poppler.) This cancellation can only happen at page granularity due to the high overhead of setting up the search for a given page.
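A minimal sketch of cancellation at page granularity, assuming a shared flag that the search loop checks only between pages. The names `searchAllPages`, `SearchOutcome` and `onPageDone` are hypothetical, and pages are again modelled as plain strings rather than Poppler pages.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct SearchOutcome {
    std::size_t pagesProcessed = 0;
    std::size_t matches = 0;
    bool canceled = false;
};

// Searches page by page. The cancel flag is checked only before a page is
// started, so a page that is already in flight always runs to completion.
// onPageDone is an optional hook called after each finished page.
SearchOutcome searchAllPages(const std::vector<std::string>& pages,
                             const std::string& term,
                             const std::atomic<bool>& cancelRequested,
                             const std::function<void(std::size_t)>& onPageDone = {}) {
    assert(!term.empty());
    SearchOutcome out;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        if (cancelRequested.load()) {
            out.canceled = true;
            break;
        }
        for (std::size_t pos = pages[i].find(term); pos != std::string::npos;
             pos = pages[i].find(term, pos + 1)) {
            ++out.matches;
        }
        ++out.pagesProcessed;
        if (onPageDone) onPageDone(i);
    }
    return out;
}
```

The worst-case wait before a new search can start is therefore the time needed to finish the page currently being processed, which is why a slow text-extraction step shows up directly as latency.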

But testing this here using the current version of qpdfview and Poppler version 24.03 with your document, this delay is barely noticeable, definitely subsecond, more likely around 100ms to 500ms.

Adam Reichold (adamreichold) wrote on 2024-05-12:

By the way, it will not make Poppler any faster, but you can at least try the current version of qpdfview via the PPA at https://code.launchpad.net/~adamreichold/+archive/ubuntu/qpdfview-dailydeb

For a newer Poppler, you probably need to upgrade to Ubuntu 24.04 though.

Brian Ejike (bejike) wrote on 2024-05-12 (last edit on 2024-05-12):

Sorry, yes, awkwardly looked through the code afterwards, and it seems to already have logic to begin a new search as soon as one hits Enter.

Then I remembered I'd changed the Find Next/Prev shortcuts to "Enter" and "Shift+Enter" shortly after installation. I reverted them to the original shortcuts (Ctrl+G etc.) and that latency is no longer noticeable.

I guess qpdfview is currently unable to disambiguate between "Enter" to begin a new search and "Enter" as "Find Next" -- therefore, choosing the latter every time? Do you think this is something that can be resolved, considering how Enter is a very common Find Next shortcut?

Brian Ejike (bejike) wrote on 2024-05-12 (last edit on 2024-05-12):

Attachment: qpdfv_parallel_default_shortcuts.webm (1.5 MiB, video/webm)

Separately, this is what I have now, with Parallel search enabled, and using the default Find Next/Prev shortcuts. You can see the screen go blank, as I Find Next, while the search is in progress.

It's a relatively minor UX thing, but perhaps Find Next can ignore inputs until it's really able to show the next result and its surrounding context? Depends on how much of this was intended ...

(For some reason, the progress bar's advancement is jankier in the recording than it really was during my test. Can probably be ignored ...)

Adam Reichold (adamreichold) wrote on 2024-05-12:

I fear this is a classic throughput versus latency trade-off: Parallel search will occupy all thread-pool threads with text extraction, and hence rendering jobs will be enqueued behind that. If you want better latency, parallel search is probably not helpful. (Theoretically, Qt's thread pools have a priority and we already set it to 2 for foreground rendering and 1 for prefetch rendering, while QtConcurrent tasks like the search should use the normal value of 0, but this does not seem to be able to preempt an already running QtConcurrent job like the search.)
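The effect can be reproduced with a toy scheduler: priorities decide which waiting job starts next, but a job that is already running is never interrupted. The sketch below simulates a single worker with made-up durations; `Job`, `Started` and `runNonPreemptive` are hypothetical names and this is not how Qt's thread pool is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Job {
    std::string name;
    int priority;    // higher value is picked first among waiting jobs
    int arrivalMs;   // time at which the job is enqueued
    int durationMs;  // how long the job occupies the worker
};

struct Started {
    std::string name;
    int startMs;
};

// Single-worker, non-preemptive priority scheduling: whenever the worker is
// free, it picks the highest-priority job that has already arrived.
std::vector<Started> runNonPreemptive(std::vector<Job> jobs) {
    for (const Job& j : jobs) assert(j.durationMs >= 0 && j.arrivalMs >= 0);
    std::vector<Started> order;
    int now = 0;
    while (!jobs.empty()) {
        auto best = jobs.end();
        for (auto it = jobs.begin(); it != jobs.end(); ++it) {
            if (it->arrivalMs > now) continue;  // not enqueued yet
            if (best == jobs.end() || it->priority > best->priority ||
                (it->priority == best->priority && it->arrivalMs < best->arrivalMs)) {
                best = it;
            }
        }
        if (best == jobs.end()) {
            // Worker is idle: jump ahead to the next arrival.
            now = std::min_element(jobs.begin(), jobs.end(),
                                   [](const Job& a, const Job& b) {
                                       return a.arrivalMs < b.arrivalMs;
                                   })->arrivalMs;
            continue;
        }
        order.push_back({best->name, now});
        now += best->durationMs;  // runs to completion, no preemption
        jobs.erase(best);
    }
    return order;
}
```

With a 600 ms search job of priority 0 enqueued at 0 ms, a prefetch job of priority 1 at 5 ms and a render job of priority 2 at 10 ms, the render job only starts at 600 ms. It overtakes the prefetch job, but it cannot cut in front of the search that is already running.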

Brian Ejike (bejike) wrote on 2024-05-14:

> "and hence rendering jobs will be enqueued behind that"

Ah, I see. Is there no (easy) way that "Find Next/Prev" could realize the next result's page hasn't been rendered yet, and so wait until that happens before visiting that page?

> "I guess qpdfview is currently unable to disambiguate between "Enter" to begin a new search and "Enter" as "Find Next""

Also, what do you think of this other issue? Is there a way to set Enter as Find Next shortcut, and still have it initiate a new search immediately when pressed?

In fact, it seems the code is inclined to do this already, without needing any change in shortcuts (introduced in 2017: https://bazaar.launchpad.net/~adamreichold/qpdfview/trunk/revision/2034):

```
void MainWindow::on_searchInitiated(const QString& text, bool modified)
...
        if(tab->searchText() != text || tab->searchMatchCase() != matchCase || tab->searchWholeWords() != wholeWords || tab->searchWasCanceled())
        {
            tab->startSearch(text, matchCase, wholeWords);
        }
        else
        {
            tab->findNext();
        }
```

Yet, when I test with pressing Enter a bunch of times after the search is finished, it does nothing. If I click inside the text box again and press Enter (without changing the text), it starts a new search for the same old text.

(Do you have the same behaviour, or is my version somehow too old and missing some fix?)

Brian Ejike (bejike) wrote on 2024-05-14:

#10

Ah, I think it was falling into the first "forAllTabs" block in that function, because I had the extended search dock open.

When I disable said dock, I'm now able to use Enter to start searches AND cycle through results, *almost* perfectly (aside from the rendering issue) --

Even after I've just begun or completed a new search, I have to always manually restore focus to the search bar by clicking inside it, before my Enter presses are finally interpreted as Find Next. Not sure what causes that behaviour.
