Comment 20 for bug 158435

danh (danh-archive) wrote :

I'm summarizing the current issues so that everybody has a picture of what's going on.

The biggest problem is that the jp2s we currently create with DevelopMekel cause problems in the pdfs that we build with itext and read with xpdf and acroread.

There are actually two separate problems: progressive jp2s are bad for xpdf (and, I think, evince), while embedded xml is bad for acroread. The effects seem to be independent: acroread can handle progressive jp2s as long as there is no xml, and xpdf can handle xml as long as the jp2 is not progressive.
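To make the independence of the two effects concrete, here is a minimal sketch that just encodes the observations above as a lookup. The function name is hypothetical, and evince is left out because its behaviour is only suspected, not confirmed.

```python
def failing_viewers(progressive, has_xml):
    """Return the viewers expected to mishandle an embedded jp2.

    Encodes only what has been observed so far: the two effects
    appear to be independent of each other.
    """
    failing = set()
    if progressive:
        failing.add("xpdf")      # evince is suspected too, but unconfirmed
    if has_xml:
        failing.add("acroread")
    return failing
```

Read this way, a jp2 that is neither progressive nor carries xml should be fine in both viewers, which is what a fix at either end would aim for.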

(The xml and the progressive encoding are there in the first place to bring the jp2s closer to NDNP conformance.)

We could try to solve this at the DevelopMekel end (by producing different or more products), or at the itext end.

Raj says that there might be some way to do a lossless jp2-to-jp2 conversion that drops the progressive encoding. If we did that, we'd of course arrange to drop the xml at the same time: there should be no need for embedded xml inside an embedded jp2, because we'd just try to embed the xml directly. (Raj says that this kind of lossless conversion can be done with jpg, for example.) Note also that I've tried to arrange for the backend to deposit the xml before the codestream, so hopefully it would be easy to remove.
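For the xml half of that, a rough sketch of what removal could look like is below. It relies on the JP2 file layout being a sequence of boxes, each with a 4-byte big-endian length and a 4-byte type, with embedded xml carried in boxes of type 'xml '. The function name is hypothetical, it only looks at top-level boxes, and it assumes a plain JP2 in which nothing refers to absolute file offsets. It does not touch the progressive encoding, which lives inside the codestream and would need a real transcoding step.

```python
import struct

XML_BOX = b"xml "  # JP2 box type used for embedded XML


def strip_xml_boxes(data):
    """Return the JP2 bytes with every top-level 'xml ' box removed."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated box header")
        lbox, tbox = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if lbox == 0:
            # A length of 0 means the box runs to the end of the file.
            length = len(data) - pos
        elif lbox == 1:
            # A length of 1 means a 64-bit length follows the type field.
            if pos + 16 > len(data):
                raise ValueError("truncated extended box header")
            (length,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        else:
            length = lbox
        if length < header or pos + length > len(data):
            raise ValueError("bad box length")
        if tbox != XML_BOX:
            out += data[pos:pos + length]
        pos += length
    return bytes(out)
```

Something like `open(dst, "wb").write(strip_xml_boxes(open(src, "rb").read()))` would apply it to a file. Since the xml is deposited before the codestream, the codestream box itself is copied through untouched.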

On another matter:

For the words for which we get bad font size information, it is very easy to just drop them (we simply don't pass them along to itext). That's what I'm doing now in my working copy of Todd's software. (But I haven't really gotten to the bottom of that problem: I wanted to see the bad characters in context, so I forced out the pdf, but when it came out it was so awful that I had to deal with that first.)
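The dropping step amounts to a filter in front of the pdf writer. The sketch below is only an illustration, not the code in Todd's software: the word representation (a dict with a "font_size" key), the function names, and the point-size limits are all assumptions made for the example.

```python
def has_usable_font_size(word, min_pt=4.0, max_pt=72.0):
    """True if the word carries a numeric font size in a plausible range.

    The limits are placeholder values; what counts as "bad" in the real
    data has not been pinned down yet.
    """
    size = word.get("font_size")
    return isinstance(size, (int, float)) and min_pt <= size <= max_pt


def drop_bad_words(words):
    """Keep only the words that are safe to pass along to the pdf writer."""
    return [w for w in words if has_usable_font_size(w)]
```

Words with a missing, zero, or implausibly large size are simply left out of the text layer; the page image itself is unaffected.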

So, my plan for now is to:
(1) fix up the single page generation and get the mechanics of that worked out, and then
(2) deal with the xml/progressive problems. (That way I can accumulate any xml/progressive feedback while I deal with the other problem.)

Thanks in advance for any feedback from anybody.