Offer a pure Find Next/Prev option
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
qpdfview | Opinion | Undecided | Unassigned |
Bug Description
I'm using qpdfview 0.4.18 via Ubuntu 22.04 Snap.
I often work with large PDFs (thousands of pages) and searching for keywords in qpdfview feels slow, e.g. over 20s for a 3000+ page document.
It seems qpdfview tries to search for ALL keyword occurrences and cache their locations, while simultaneously servicing my "Find Next" clicks. That should normally be an okay strategy, but it doesn't feel like one in practice.
When starting a new search, it even takes a few seconds to begin, i.e. to forget the previous keyword and start searching for the new one. Only after this delay does the very long progress bar start to move, and that is before counting the time taken to search for the keyword across the document. The overall process feels slow and unresponsive as a result.
You may want to contrast with Sumatra PDF, which simply searches for the next occurrence with lightning speed, even on Wine on Ubuntu. As the search progresses with each Find Next click, Sumatra caches the location of each result, until the search wraps. In this way, it never does more work than absolutely needed, and feels much faster as a result.
So, it would be great to have some config/setting so a user can specify the sort of search behaviour they want:
- A simple Find Next/Prev that just searches for the next occurrence and caches its location
- Or the default/current behaviour, where qpdfview searches for all occurrences
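To make the first option concrete, here is a minimal Python sketch of a lazy Find Next that caches each hit as it is found and replays the cache once the search has wrapped. It is only an illustration of the idea, not Sumatra's or qpdfview's actual code: `LazyFinder`, `page_text` and the `(page, offset)` hit format are hypothetical names, and the scan always starts at the first page for simplicity.

```python
from typing import Callable, List, Optional, Tuple

Hit = Tuple[int, int]  # (page index, character offset within that page's text)


class LazyFinder:
    """Finds one occurrence per call and remembers the hits found so far."""

    def __init__(self, page_count: int, page_text: Callable[[int], str], keyword: str):
        if not keyword:
            raise ValueError("keyword must not be empty")
        self.page_count = page_count
        self.page_text = page_text   # hypothetical callback: page index -> text
        self.keyword = keyword
        self.hits: List[Hit] = []    # cached hit locations, in document order
        self.cursor = -1             # index into self.hits of the current hit
        self.complete = False        # True once the scan has reached the last page
        self._page = 0               # where the next scan resumes
        self._offset = 0

    def _scan_one(self) -> Optional[Hit]:
        """Scan forward from the resume position until one new hit is found."""
        while self._page < self.page_count:
            # A real viewer would keep the extracted text of the current page
            # around instead of asking for it again on every call.
            text = self.page_text(self._page)
            pos = text.find(self.keyword, self._offset)
            if pos >= 0:
                self._offset = pos + 1
                return (self._page, pos)
            self._page += 1
            self._offset = 0
        self.complete = True
        return None

    def find_next(self) -> Optional[Hit]:
        # Serve from the cache when the next hit is already known.
        if self.cursor + 1 < len(self.hits):
            self.cursor += 1
            return self.hits[self.cursor]
        # Otherwise do just enough work to find one more hit.
        if not self.complete:
            hit = self._scan_one()
            if hit is not None:
                self.hits.append(hit)
                self.cursor = len(self.hits) - 1
                return hit
        # End of document reached: wrap around to the first cached hit.
        if not self.hits:
            return None
        self.cursor = 0
        return self.hits[0]
```

With this shape, the first click only touches pages up to the first hit, and once the search has wrapped every further click is answered from `hits` without reading any page. Find Prev is left out of the sketch.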
Did you try disabling the "Advanced search dock" on the "Interface" tab of the settings dialog? That should already remove some of the overhead (I am not sure whether the option already existed in 0.4.18, though).
Other than that, searching for all occurrences on a page is significantly faster in almost all documents, due to the relatively high overhead the underlying libraries have for turning a given page into text. In current versions, all of this happens in the background and optionally in parallel, which further reduces the time until results are ready. We even started out with the simpler implementation, which tracks only a single current result, and it was significantly more janky, as searching for the next occurrence on a page is basically as fast/slow as searching for all occurrences on a page with Poppler.
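The trade-off described above can be sketched roughly as follows in Python. This is not qpdfview's implementation and does not use Poppler's API: `extract_text` is a hypothetical stand-in for the expensive page-to-text step, and the thread pool only illustrates the "in the background and optionally in parallel" part. The point is that each page is converted to text once, after which collecting every match on that page is cheap.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def search_page(text: str, keyword: str) -> List[int]:
    """Return every offset of keyword in an already-extracted page text."""
    offsets = []
    pos = text.find(keyword)
    while pos >= 0:
        offsets.append(pos)
        pos = text.find(keyword, pos + 1)
    return offsets


def search_document(
    page_count: int,
    extract_text: Callable[[int], str],  # hypothetical, assumed to be the slow step
    keyword: str,
    workers: int = 4,
) -> Dict[int, List[int]]:
    """Search all pages, extracting each page's text exactly once."""
    if not keyword:
        return {}

    def work(index: int) -> Tuple[int, List[int]]:
        # One extraction per page, then all matches on that page in one pass.
        return index, search_page(extract_text(index), keyword)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(work, range(page_count)))

    # Keep only the pages that actually contain the keyword.
    return {page: hits for page, hits in results.items() if hits}
```

If extraction dominates the cost, stopping after the first match on a page saves almost nothing compared with `search_page`, which is the argument made above for searching whole pages at a time.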
So in summary, while it may not be ideal for 3000+ page documents, I think this is still the best compromise given the libraries this program depends on.