Comment 15 for bug 1286771

Kurt Bigler (kkbshop) wrote :

Regarding the HTML required for the calibre pipeline: is any particular flavor needed, or will plain old HTML do?

And do you have any representation for page breaks, in case page headers/footers are eventually detected by heuristics? Or, if I detect them myself, should I mark headers/footers with any particular styles, etc.?

***

Incidentally, I see that in the example I've been playing with, the geometry info is present at the word level via djvutxt -detail. I got that tip from an initial inquiry I made at the DjVuLibre project (discussion) on SourceForge.

Kurt-Biglers-iMac:~ kurt$ djvutxt -detail \[Isabelle_Stengers\]_thinking\ with\ whitehead.djvu | head -15
()
(page 0 0 2864 4937
  (line 132 4358 2724 4492 (word 132 4358 990 4492 "THINKING")
    (word 1118 4362 1556 4490 "WITH")
    (word 1684 4362 2724 4490 "WHITEHEAD") )
  (line 324 3936 2534 4052 (word 324 3962 410 4050 "A")
    (word 468 3960 702 4048 "Free")
    (word 740 3960 964 4052 "and")
    (word 1016 3958 1276 4052 "Wild")
    (word 1326 3958 1808 4050 "Creation")
    (word 1858 3936 2010 4050 "of")
    (word 2024 3936 2534 4050 "Concepts") )
  (line 342 3094 1630 3186 (word 342 3094 904 3186 "ISABELLE")
    (word 972 3094 1630 3186 "STENGERS") )
  (line 336 2814 1576 2906 (word 336 2834 786 2906 "Translated")
Kurt-Biglers-iMac:~ kurt$
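
For what it's worth, here is a rough sketch of how that output could be read into word boxes, in case it helps with header/footer heuristics. It assumes the output is a plain S-expression of the form shown above (symbols, integers, and double-quoted strings), that the four numbers are xmin ymin xmax ymax, and that y grows upward, which the sample suggests since the title line has the largest y values. The names parse_sexpr and iter_words are my own, not part of DjVuLibre or calibre, and escape sequences inside strings are left undecoded.

```python
import re

# Tokens: "(", ")", a double-quoted string (backslash escapes are kept raw),
# or a bare atom such as a symbol or an integer.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')

def parse_sexpr(text):
    """Parse S-expression text into nested Python lists."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) > 1:          # ignore a stray closing paren
                node = stack.pop()
                stack[-1].append(node)
        elif tok.startswith('"'):
            stack[-1].append(tok[1:-1])  # quoted text, quotes stripped
        elif tok.lstrip("-").isdigit():
            stack[-1].append(int(tok))
        else:
            stack[-1].append(tok)        # bare symbol: page, line, word, ...
    # Close anything left open, e.g. when the output was cut off by head.
    while len(stack) > 1:
        node = stack.pop()
        stack[-1].append(node)
    return stack[0]

def iter_words(node):
    """Yield ((xmin, ymin, xmax, ymax), text) for every word node."""
    if isinstance(node, list):
        if len(node) == 6 and node[0] == "word":
            yield tuple(node[1:5]), node[5]
        else:
            for child in node:
                yield from iter_words(child)

# First line of the sample above, with the page node closed by hand.
SAMPLE = '''()
(page 0 0 2864 4937
  (line 132 4358 2724 4492 (word 132 4358 990 4492 "THINKING")
    (word 1118 4362 1556 4490 "WITH")
    (word 1684 4362 2724 4490 "WHITEHEAD") ) )
'''

if __name__ == "__main__":
    for box, text in iter_words(parse_sexpr(SAMPLE)):
        print(box, text)
```

With boxes in hand, a header/footer heuristic could compare each line's ymin/ymax against the page height from the page node, but where to put the thresholds is a separate question.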