Comment 18 for bug 1286771

Kovid Goyal (kovid) wrote : Re: calibre bug 1286771

I suggest either of the two following approaches:

1) Add some code to calibre to directly read the position data from the
djvu file (from what little I have seen of djvu, it seems to be a fairly
simple format)

2) Use the djvu xml producing tool to output XML and use the lxml
package (part of calibre) to parse it.
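
As a rough illustration of (2), here is a minimal sketch of pulling word
positions out of the XML with lxml. It assumes the tool emits WORD
elements carrying a "coords" attribute of four comma-separated integers,
as DjVuLibre's djvutoxml appears to; the sample document, the
iter_words name and the fallback import are made up for the example,
and the coordinate order should be checked against real output.

```python
try:
    from lxml import etree  # the package bundled with calibre
except ImportError:
    # Fallback so the sketch still runs where lxml is not installed.
    import xml.etree.ElementTree as etree

# Hand-written sample, NOT real tool output: element names are an assumption.
SAMPLE = (
    b'<DjVuXML><BODY><OBJECT><HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH>'
    b'<LINE><WORD coords="10,40,60,20">Hello</WORD> '
    b'<WORD coords="70,40,130,20">world</WORD></LINE>'
    b'</PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT></OBJECT></BODY></DjVuXML>'
)


def iter_words(xml_bytes):
    """Yield (text, (c0, c1, c2, c3)) for every WORD element found."""
    root = etree.fromstring(xml_bytes)
    for word in root.iter('WORD'):
        parts = [p for p in word.get('coords', '').split(',') if p.strip()]
        if len(parts) != 4:
            continue  # skip words without a usable bounding box
        yield (word.text or '').strip(), tuple(int(p) for p in parts)


words = list(iter_words(SAMPLE))
for text, box in words:
    print(text, box)
```

The four numbers are passed through uninterpreted on purpose, since
which one is top and which is bottom depends on the tool.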

The only things you will have to check are that the djvu XML tool is not
too large, is at least nominally compilable on all three platforms, and
has a compatible license. I can take care of compiling it on the other
platforms as part of the calibre build process, provided that it is
nominally buildable and comes with some build scripts for the different
platforms.

1) is a bit more work, but is preferable, since I suspect that using
the djvu XML tool will mean adding a very large block of code into
calibre for what is a fairly simple job.

If I am wrong about the djvu format being relatively simple, then (2)
becomes preferable.

You can also start out with (2) and switch to (1) at the end, once you
have finished the layout detection work. That way you don't need to put
in much upfront work before implementing the most difficult part of the
process.

You can use either a C or Python implementation for (1). Most likely
Python will be fine, since I doubt this bit will be performance critical
(the current DJVU plugin has a C implementation for the decompressor,
since implementing that in Python is very slow, but everything else is
in Python).
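
To give a feel for how much Python (1) might need, here is a minimal
sketch that walks the chunk structure of a djvu file. It assumes the
usual IFF-style layout: an "AT&T" magic, then chunks made of a 4-byte
id, a 4-byte big-endian length and a payload padded to an even offset,
with FORM chunks nesting further chunks. As far as I know the hidden
text lives in TXTa (plain) or TXTz (compressed) chunks, so a TXTz
payload would still have to go through the decompressor before the
position data can be read. The names iter_chunks, _walk and _chunk are
invented for this example, and the test file is synthetic.

```python
import struct


def iter_chunks(data):
    """Yield (chunk_id, payload_offset, payload_size) for each leaf chunk."""
    if data[:4] != b'AT&T':
        raise ValueError('missing AT&T magic, probably not a djvu file')
    yield from _walk(data, 4, len(data))


def _walk(data, pos, end):
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack('>I', data[pos + 4:pos + 8])
        body = pos + 8
        if chunk_id == b'FORM':
            # A FORM payload is a 4-byte form type followed by nested chunks.
            yield from _walk(data, body + 4, min(body + size, end))
        else:
            yield chunk_id, body, size
        pos = body + size + (size & 1)  # payloads are padded to even length


def _chunk(chunk_id, payload):
    """Build one chunk; helper used only for the synthetic demo below."""
    pad = b'\x00' if len(payload) % 2 else b''
    return chunk_id + struct.pack('>I', len(payload)) + payload + pad


# Synthetic single-page file: the payloads are placeholders, not real data.
form_body = b'DJVU' + _chunk(b'INFO', b'\x00' * 10) + _chunk(b'TXTz', b'abc')
demo = b'AT&T' + b'FORM' + struct.pack('>I', len(form_body)) + form_body

found = list(iter_chunks(demo))
for chunk_id, offset, size in found:
    print(chunk_id.decode('ascii'), offset, size)
```

Only the container is handled here; decoding the text zones inside the
decompressed payload is the part that would need the format spec.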

If you wish to move this discussion to email, you will find my email
address all over the calibre source code.