Comment 11 for bug 1286771

Kurt Bigler (kkbshop) wrote :

Thanks for the discussion.

I suspect there is something that can be done, but I may still be missing some pieces of the picture.

I could go find where djvu development lives and make some initial inquiries there.

However, I'm still naive about how your heuristics work and, to some degree, about how you handle un-marked-up text (i.e. plain text, but with more helpful line breaks). In particular, I don't know whether there is some example of calibre handling such text well, or whether "handling well" is reserved for formats that are marked up in some way.

I don't really know what the expected scope (in terms of input formats) of the "un-wrap factor" is, for example. I should probably get a better sense of those sorts of things before marching off to djvu-land with any expectations. The calibre docs on the Line-unwrap factor at least suggest to me that the heuristic is expected to be usable with raw text containing hard line breaks, where it has to decide whether each break is intentional or merely the result of the text having already been flowed. So I am thinking it would help if line breaks could be detected from glyph positions and emitted as newlines in what may still be raw text. Likewise, a blank line could indicate a paragraph break, as I think used to be the case in nroff/troff, although it has been a couple of decades since I used them. (OTOH, I might also do better by having a way to extract djvu text with some mark-up added to it.)
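To make the idea concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions: the Word record, the words_to_text function and the two thresholds are all hypothetical names I made up, not anything from djvu or calibre, and I am assuming each word comes with a left edge, a baseline position (increasing down the page) and a height.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    # Hypothetical record; a real djvu text layer would need mapping onto it.
    text: str
    x: float       # left edge of the word
    y: float       # baseline position, increasing down the page
    height: float  # glyph height, used to scale the tolerance


def words_to_text(words, line_tol=0.5, para_gap=1.5):
    """Emit one newline per visual line and a blank line per paragraph gap.

    line_tol and para_gap are guessed thresholds, not tuned values.
    """
    if not words:
        return ""
    typical_height = median(w.height for w in words)

    # Group words into visual lines: a word joins the current line when its
    # baseline is within line_tol * typical_height of the line's first word.
    lines = []
    for w in sorted(words, key=lambda w: w.y):
        if lines and abs(w.y - lines[-1][0].y) <= line_tol * typical_height:
            lines[-1].append(w)
        else:
            lines.append([w])

    # Vertical distance between consecutive lines; the median is taken as
    # the normal line spacing within a paragraph.
    pitches = [b[0].y - a[0].y for a, b in zip(lines, lines[1:])]
    typical_pitch = median(pitches) if pitches else 0.0

    out = []
    for i, line in enumerate(lines):
        # A noticeably larger gap than usual is treated as a paragraph break.
        if i > 0 and pitches[i - 1] > para_gap * typical_pitch:
            out.append("")
        out.append(" ".join(w.text for w in sorted(line, key=lambda w: w.x)))
    return "\n".join(out)
```

The output would still be raw text, just with newlines where the page had line ends and blank lines where the vertical gap was unusually large, which is the kind of input I am guessing the unwrap heuristic could then work on.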

Does this seem reasonable? Is there anywhere I can read up to get a better sense of what calibre likes to see in its input to be most effective? It may be I need something a little beyond the user documentation.

What I would most like is to know that I have some basic sense of it, which maybe you can confirm in some way, and then just go do something with djvu without a long, belabored investigation. That way I would have time to actually contribute something. Otherwise there is a risk I'd never get to anything useful.

Any comments appreciated. Maybe you know how it is to be an outsider on something, and how hard it can be to get to the very basics of a thing because it is all so implicit to those who know about it. I can hardly tell from out here whether I am making good guesses or not, or whether I would need to immerse myself for a full month (which might never happen).