Internet Archive - Tech Support

Bug #824878
Comment #2

Hank Bromley (hank-archive) wrote on 2011-08-16:

On item #1, see my email of July 13 with the subject "Re: auto_submit for eng newspaper reels," sent to Jesse, Jude, Paul, Venus and Dan. It includes the following (note especially the part about "just putting items into the 'newspaper' collection"):

= = = = = =

whether we make the single-page PDFs instead of the whole-item PDF is determined by whether the item is in the "newspapers" collection

whether the item is in "newspaper" also affects the DevelopMekel in-derive book-op (this is dan's code, so he may remember more about it), which runs near the beginning of the derive *but only for items that don't yet have jp2s*. previous newspaper scans were uploaded as jpgs and converted to jp2 by this book-op; because we now upload jp2s, it doesn't run. when it does run on a newspaper, it sets pagination=true for the item (causing the "View PDFs" widget to be displayed on the details page) and inserts metadata into each jp2 it makes, so that the jp2s comply with the NDNP standard. we currently have no path for inserting that metadata into pre-existing jp2s.

just putting items into the "newspaper" collection, while still uploading them as jp2s and skipping RePublisher, yields the result Jesse mentioned (http://www.archive.org/details/december20188702dulu). we make the single-page PDFs:

http://www.archive.org/download/december20188702dulu/december20188702dulu_pdf.zip/

but there's no information available on the issue dates, and initially the "View PDFs" widget doesn't appear either. I manually added pagination=true to this item to make it appear, but you can see the date info isn't filled in, and the individual jp2s don't have the metadata that DevelopMekel would have inserted.

= = = = = =

This item is in the newspapers collection. If you check the derive log, you'll see that, exactly as described above, we made a _pdf.zip instead of a .pdf (and aren't displaying the "View PDFs" widget, because the item was uploaded with jp2s rather than jpgs). The derive log also shows that we skipped making DjVu because the item is in the newspapers collection. Without DjVu, we don't make "full text."
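
To make the chain of consequences easier to follow, here is a minimal Python sketch of the derive decisions described in this comment. It is a hypothetical model, not the Archive's derive code: the names Item, DerivePlan and plan_derive are invented for illustration, the collection identifier "newspapers" is taken from the log discussion above, and full text is modeled as depending only on DjVu.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model of the derive decisions described in this comment.
# Item, DerivePlan and plan_derive are illustrative names, not the
# Archive's actual derive code.

NEWSPAPER_COLLECTION = "newspapers"  # assumed collection identifier


@dataclass
class Item:
    identifier: str
    collections: set[str]
    has_jp2s: bool  # True if jp2s were uploaded directly (no jpg-to-jp2 step)


@dataclass
class DerivePlan:
    pdf_output: str              # filename of the PDF derivative
    run_develop_mekel: bool      # whether the DevelopMekel book-op runs
    pagination: bool             # pagination=true shows the "View PDFs" widget
    insert_ndnp_metadata: bool   # NDNP metadata written into each new jp2
    make_djvu: bool
    make_full_text: bool


def plan_derive(item: Item) -> DerivePlan:
    is_newspaper = NEWSPAPER_COLLECTION in item.collections

    # Newspapers get zipped single-page PDFs instead of one whole-item PDF.
    if is_newspaper:
        pdf_output = f"{item.identifier}_pdf.zip"
    else:
        pdf_output = f"{item.identifier}.pdf"

    # DevelopMekel converts uploaded jpgs to jp2s, so it only runs
    # when the item has no jp2s yet.
    run_develop_mekel = not item.has_jp2s

    # Only when DevelopMekel runs on a newspaper does it set pagination=true
    # and insert NDNP metadata; nothing does this for pre-existing jp2s.
    newspaper_op = is_newspaper and run_develop_mekel

    # DjVu is skipped for newspapers; in this model, no DjVu means no full text.
    make_djvu = not is_newspaper

    return DerivePlan(
        pdf_output=pdf_output,
        run_develop_mekel=run_develop_mekel,
        pagination=newspaper_op,
        insert_ndnp_metadata=newspaper_op,
        make_djvu=make_djvu,
        make_full_text=make_djvu,
    )


if __name__ == "__main__":
    plan = plan_derive(Item("december20188702dulu", {"newspapers"}, has_jp2s=True))
    print(plan)
```

Fed the december20188702dulu case (newspapers collection, uploaded as jp2s), the sketch yields a _pdf.zip and leaves every other flag false: no DevelopMekel, no pagination, no NDNP metadata, no DjVu, no full text. That matches what the derive log shows.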

This is getting a little surreal. I keep saying that we're not set up to process newspapers, and that becoming able to do so with our current microfilm workflow would require a serious engineering effort. Yet people keep trying, and keep being surprised when it doesn't work.