Comment 5 for bug 1286771

Kurt Bigler (kkbshop) wrote :

Thanks, I see it is basically working now. (I will send a contribution.)

Your fix brings up a question. If the "internal implementation" now does what the /usr/bin/djvutxt implementation used to do, is there a reason the djvutxt implementation is still preferred, e.g. maybe for better potential future functionality?

Otherwise, having two implementations seems to add complexity. If the fallback implementation is the less preferred one, then I should point out that, assuming djvutxt is easy to install (I never found out), an error message stating that /usr/bin/djvutxt is required would have served my purpose just as well.
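To make the error-message alternative concrete, here is a minimal sketch in Python. It is purely illustrative and is not calibre's actual code: the function name `locate_djvutxt` and the wording of the message are assumptions; only `shutil.which` is a real standard-library call.

```python
import shutil


def locate_djvutxt(which=shutil.which):
    """Return the path to the djvutxt executable.

    Hypothetical helper (not part of calibre): instead of silently
    falling back to another implementation, fail with a message that
    tells the user what to install.
    """
    path = which("djvutxt")  # searches PATH, returns None if not found
    if path is None:
        raise RuntimeError(
            "djvutxt is required to convert DJVU files "
            "(expected at e.g. /usr/bin/djvutxt); please install it and retry."
        )
    return path
```

The `which` parameter exists only so the lookup can be swapped out when testing; a real caller would simply use `locate_djvutxt()` and show the error text to the user.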

Secondly (and this might belong in a separate bug report if pursued): search results highlight individual words at their correct locations, which suggests that calibre has a "complete picture" of the original, any OCR inaccuracies aside, so heuristic processing should be fully functional. Yet I am not seeing section headings recognized; they are wrapped into the paragraph text. In fact, paragraph breaks are not recognized either. (Paragraphs have an indent of approximately 2 "n" widths in the original.) Page headings are also not recognized, but being new to calibre I'm not sure what to expect. In my test case I increased the unwrap factor from 0.4 to 0.9, and a heading line that is about 55% of the page width gets flowed in with the rest of the text, with the paragraphs above and below it combined. For due diligence I searched for a word in that heading, and it is found and highlighted at the correct location.

Should this go to the forum or to another bug report? Or are my expectations just not realistic? For a quick look I am attaching screenshots of sections of the input and the resulting output that include the heading "Taking a Speculative Philosophy Seriously?".